如何使用 itext7 从 vb.net 中的 pdf 中提取矩形?

How can I extract rectangles using itext7 from pdf in vb.net?

我想从 pdf 中提取矩形及其位置和填充颜色。如果这里有人对在 vb.net

中使用 itext7 从 pdf 中提取矩形形状有一些想法,请帮助我

正如评论中所说,OP 感兴趣的形式是矢量图形,但不仅是矩形,而且基本上是任意形状。因此,这个答案演示了如何使用 vb.net.

提取矢量图形路径及其使用(stroke/fill/...)

为了从 PDF 中提取数据,iText 7 提供了一个框架,它遵循 PDF 内容流中的指令并相应地触发事件。因此,要提取路径,您必须首先实现一个事件侦听器(实现 iText IEventListener 接口)。此实现然后需要 select 仅需要的事件(使用 EventType.RENDER_PATH)并从给定的 PathRenderInfo 事件数据对象中提取所需的信息。

以下示例事件侦听器 class 只是将路径信息打印到控制台:

Public Class PathListener
    Implements IEventListener

    Public Sub EventOccurred(data As IEventData, type As EventType) Implements IEventListener.EventOccurred
        If type = EventType.RENDER_PATH Then
            Dim PathRenderInfo As PathRenderInfo = CType(data, PathRenderInfo)
            Dim OperationData = GetOperationData(PathRenderInfo)
            Dim PathData = GetPathData(PathRenderInfo)
            Console.WriteLine("{1} - {0}", OperationData, PathData)
        End If
    End Sub

    Public Function GetSupportedEvents() As ICollection(Of EventType) Implements IEventListener.GetSupportedEvents
        Return Nothing
    End Function

    Function GetOperationData(PathRenderInfo As PathRenderInfo) As String
        Dim OperationBuilder As New StringBuilder
        If PathRenderInfo.GetOperation = PathRenderInfo.NO_OP Then
            OperationBuilder.Append("Invisible")
        End If
        If (PathRenderInfo.GetOperation And PathRenderInfo.STROKE) = PathRenderInfo.STROKE Then
            OperationBuilder.Append("Stroked with ").Append(GetColorData(PathRenderInfo.GetStrokeColor))
            If (PathRenderInfo.GetOperation And PathRenderInfo.FILL) = PathRenderInfo.FILL Then
                OperationBuilder.Append(" and ")
            End If
        End If
        If (PathRenderInfo.GetOperation And PathRenderInfo.FILL) = PathRenderInfo.FILL Then
            OperationBuilder.Append("Filled with ").Append(GetColorData(PathRenderInfo.GetFillColor))
        End If
        If (PathRenderInfo.IsPathModifiesClippingPath) Then
            OperationBuilder.Append(", clipping")
        End If
        Return OperationBuilder.ToString
    End Function

    Function GetColorData(Color As Color) As String
        Dim ColorBuilder As New StringBuilder
        If TypeOf Color Is CalGray Then
            ColorBuilder.Append("CalGray")
        ElseIf TypeOf Color Is CalRgb Then
            ColorBuilder.Append("CalRGB")
        ElseIf TypeOf Color Is DeviceCmyk Then
            ColorBuilder.Append("DeviceCmyk")
        ElseIf TypeOf Color Is DeviceGray Then
            ColorBuilder.Append("DeviceGray")
        ElseIf TypeOf Color Is DeviceN Then
            ColorBuilder.Append("DeviceN")
        ElseIf TypeOf Color Is DeviceRgb Then
            ColorBuilder.Append("DeviceRgb")
        ElseIf TypeOf Color Is IccBased Then
            ColorBuilder.Append("IccBased")
        ElseIf TypeOf Color Is Indexed Then
            ColorBuilder.Append("Indexed")
        ElseIf TypeOf Color Is Lab Then
            ColorBuilder.Append("Lab")
        ElseIf TypeOf Color Is PatternColor Then
            Return "PatternColor(special)"
        ElseIf TypeOf Color Is Separation Then
            ColorBuilder.Append("Separation")
        End If
        ColorBuilder.Append("(").Append(String.Join(", ", Color.GetColorValue)).Append(")")
        Return ColorBuilder.ToString
    End Function

    Function GetPathData(PathRenderInfo As PathRenderInfo) As String
        Dim CurrentTransformation = PathRenderInfo.GetCtm
        Dim PathBuilder As New StringBuilder
        Dim FirstSubPath = True
        For Each SubPath In PathRenderInfo.GetPath.GetSubpaths
            If FirstSubPath Then
                FirstSubPath = False
                PathBuilder.Append("Path ")
            ElseIf Not (SubPath.IsEmpty Or SubPath.GetSegments.Count = 0) Then
                PathBuilder.Append(" and ")
            End If
            Dim FirstShape = True
            For Each Shape In SubPath.GetSegments
                If FirstShape Then
                    FirstShape = False
                    PathBuilder.Append("from ").Append(GetPointData(Shape.GetBasePoints.First, CurrentTransformation))
                Else
                    PathBuilder.Append(",")
                End If
                If TypeOf Shape Is Line Then
                    PathBuilder.Append(" line to ").Append(GetPointData(Shape.GetBasePoints.Last, CurrentTransformation))
                ElseIf TypeOf Shape Is BezierCurve Then
                    PathBuilder.Append(" curve via ").Append(GetPointData(Shape.GetBasePoints(1), CurrentTransformation))
                    PathBuilder.Append(" and ").Append(GetPointData(Shape.GetBasePoints(2), CurrentTransformation))
                    PathBuilder.Append(" to ").Append(GetPointData(Shape.GetBasePoints(3), CurrentTransformation))
                End If
            Next
            If SubPath.IsClosed Then
                PathBuilder.Append(" (closed)")
            End If
        Next
        Return PathBuilder.ToString
    End Function

    Function GetPointData(Point As Point, CurrentTransformation As Matrix) As String
        Dim Transformed = CurrentTransformation.Multiply(New Matrix(Point.GetX, Point.GetY))
        Return String.Format(CultureInfo.InvariantCulture, "({0}, {1})", Transformed.Get(Matrix.I31), Transformed.Get(Matrix.I32))
    End Function
End Class

使用此事件侦听器,您可以检查文档的页面:

Using PdfDocument As New PdfDocument(New PdfReader(...))
    Dim PathListener As New PathListener
    Dim PdfCanvasProcessor As New PdfCanvasProcessor(PathListener)
    For page As Integer = 1 To PdfDocument.GetNumberOfPages
        PdfCanvasProcessor.ProcessPageContent(PdfDocument.GetPage(page))
    Next
End Using

输出可能如下所示:

Path from (51.2, 723.57) line to (512.17, 723.57), line to (512.17, 736.97), line to (51.2, 736.97) (closed) - Invisible, clipping
Path from (108.6, 516.6) curve via (108.6, 569.29) and (160.18, 612) to (223.8, 612), curve via (287.42, 612) and (339, 569.29) to (339, 516.6), curve via (339, 463.91) and (287.42, 421.2) to (223.8, 421.2), curve via (160.18, 421.2) and (108.6, 463.91) to (108.6, 516.6) (closed) - Filled with DeviceRgb(0.31, 0.506, 0.741)
Path from (174.89, 545.13) curve via (174.89, 550.62) and (180.27, 555.07) to (186.89, 555.07), curve via (193.52, 555.07) and (198.89, 550.62) to (198.89, 545.13), curve via (198.89, 539.64) and (193.52, 535.19) to (186.89, 535.19), curve via (180.27, 535.19) and (174.89, 539.64) to (174.89, 545.13) (closed) and from (248.71, 545.13) curve via (248.71, 550.62) and (254.08, 555.07) to (260.71, 555.07), curve via (267.33, 555.07) and (272.71, 550.62) to (272.71, 545.13), curve via (272.71, 539.64) and (267.33, 535.19) to (260.71, 535.19), curve via (254.08, 535.19) and (248.71, 539.64) to (248.71, 545.13) (closed) - Filled with DeviceRgb(0.251, 0.408, 0.596)
Path from (174.89, 545.13) curve via (174.89, 550.62) and (180.27, 555.07) to (186.89, 555.07), curve via (193.52, 555.07) and (198.89, 550.62) to (198.89, 545.13), curve via (198.89, 539.64) and (193.52, 535.19) to (186.89, 535.19), curve via (180.27, 535.19) and (174.89, 539.64) to (174.89, 545.13) (closed) and from (248.71, 545.13) curve via (248.71, 550.62) and (254.08, 555.07) to (260.71, 555.07), curve via (267.33, 555.07) and (272.71, 550.62) to (272.71, 545.13), curve via (272.71, 539.64) and (267.33, 535.19) to (260.71, 535.19), curve via (254.08, 535.19) and (248.71, 539.64) to (248.71, 545.13) (closed) - Stroked with DeviceRgb(0.22, 0.365, 0.541)
Path from (161.36, 475) curve via (202.99, 451.32) and (244.56, 451.32) to (286.09, 475) - Stroked with DeviceRgb(0.22, 0.365, 0.541)
Path from (108.6, 516.6) curve via (108.6, 569.29) and (160.18, 612) to (223.8, 612), curve via (287.42, 612) and (339, 569.29) to (339, 516.6), curve via (339, 463.91) and (287.42, 421.2) to (223.8, 421.2), curve via (160.18, 421.2) and (108.6, 463.91) to (108.6, 516.6) (closed) - Stroked with DeviceRgb(0.22, 0.365, 0.541)
Path from (51.2, 565.15) line to (512.17, 565.15), line to (512.17, 578.55), line to (51.2, 578.55) (closed) - Invisible, clipping
Path from (147.8, 556.2) curve via (147.8, 608.89) and (199.38, 651.6) to (263, 651.6), curve via (326.62, 651.6) and (378.2, 608.89) to (378.2, 556.2), curve via (378.2, 503.51) and (326.62, 460.8) to (263, 460.8), curve via (199.38, 460.8) and (147.8, 503.51) to (147.8, 556.2) (closed) - Filled with DeviceRgb(0.31, 0.506, 0.741)
Path from (214.09, 584.73) curve via (214.09, 590.22) and (219.47, 594.67) to (226.09, 594.67), curve via (232.72, 594.67) and (238.09, 590.22) to (238.09, 584.73), curve via (238.09, 579.24) and (232.72, 574.79) to (226.09, 574.79), curve via (219.47, 574.79) and (214.09, 579.24) to (214.09, 584.73) (closed) and from (287.91, 584.73) curve via (287.91, 590.22) and (293.28, 594.67) to (299.91, 594.67), curve via (306.53, 594.67) and (311.91, 590.22) to (311.91, 584.73), curve via (311.91, 579.24) and (306.53, 574.79) to (299.91, 574.79), curve via (293.28, 574.79) and (287.91, 579.24) to (287.91, 584.73) (closed) - Filled with DeviceRgb(0.251, 0.408, 0.596)
Path from (214.09, 584.73) curve via (214.09, 590.22) and (219.47, 594.67) to (226.09, 594.67), curve via (232.72, 594.67) and (238.09, 590.22) to (238.09, 584.73), curve via (238.09, 579.24) and (232.72, 574.79) to (226.09, 574.79), curve via (219.47, 574.79) and (214.09, 579.24) to (214.09, 584.73) (closed) and from (287.91, 584.73) curve via (287.91, 590.22) and (293.28, 594.67) to (299.91, 594.67), curve via (306.53, 594.67) and (311.91, 590.22) to (311.91, 584.73), curve via (311.91, 579.24) and (306.53, 574.79) to (299.91, 574.79), curve via (293.28, 574.79) and (287.91, 579.24) to (287.91, 584.73) (closed) - Stroked with DeviceRgb(0.22, 0.365, 0.541)
Path from (200.56, 514.6) curve via (242.19, 490.92) and (283.76, 490.92) to (325.29, 514.6) - Stroked with DeviceRgb(0.22, 0.365, 0.541)
Path from (147.8, 556.2) curve via (147.8, 608.89) and (199.38, 651.6) to (263, 651.6), curve via (326.62, 651.6) and (378.2, 608.89) to (378.2, 556.2), curve via (378.2, 503.51) and (326.62, 460.8) to (263, 460.8), curve via (199.38, 460.8) and (147.8, 503.51) to (147.8, 556.2) (closed) - Stroked with DeviceRgb(0.22, 0.365, 0.541)
Path from (103, 398.43) line to (487, 398.43), line to (487, 605.83), line to (103, 605.83) (closed) - Filled with DeviceRgb(0.31, 0.506, 0.741)
Path from (103, 398.43) line to (487, 398.43), line to (487, 605.83), line to (103, 605.83) (closed) - Stroked with DeviceGray(0.525)
Path from (229.4, 344.2) curve via (229.4, 363.31) and (246.59, 378.8) to (267.8, 378.8), curve via (289.01, 378.8) and (306.2, 363.31) to (306.2, 344.2), curve via (306.2, 325.09) and (289.01, 309.6) to (267.8, 309.6), curve via (246.59, 309.6) and (229.4, 325.09) to (229.4, 344.2) (closed) - Filled with DeviceRgb(0.31, 0.506, 0.741)
Path from (229.4, 344.2) curve via (229.4, 363.31) and (246.59, 378.8) to (267.8, 378.8), curve via (289.01, 378.8) and (306.2, 363.31) to (306.2, 344.2), curve via (306.2, 325.09) and (289.01, 309.6) to (267.8, 309.6), curve via (246.59, 309.6) and (229.4, 325.09) to (229.4, 344.2) (closed) - Stroked with DeviceRgb(0.22, 0.365, 0.541)
Path from (237.4, 256.57) line to (266.74, 256.57), line to (275.8, 283), line to (284.86, 256.57), line to (314.2, 256.57), line to (290.47, 240.23), line to (299.53, 213.8), line to (275.8, 230.14), line to (252.07, 213.8), line to (261.13, 240.23) (closed) - Filled with DeviceRgb(0.31, 0.506, 0.741)
Path from (237.4, 256.57) line to (266.74, 256.57), line to (275.8, 283), line to (284.86, 256.57), line to (314.2, 256.57), line to (290.47, 240.23), line to (299.53, 213.8), line to (275.8, 230.14), line to (252.07, 213.8), line to (261.13, 240.23) (closed) - Stroked with DeviceRgb(0.22, 0.365, 0.541)
Path from (163, 432.4) curve via (163, 485.09) and (216.9, 527.8) to (283.4, 527.8), curve via (349.9, 527.8) and (403.8, 485.09) to (403.8, 432.4), curve via (403.8, 379.71) and (349.9, 337) to (283.4, 337), curve via (216.9, 337) and (163, 379.71) to (163, 432.4) (closed) - Filled with DeviceRgb(0.31, 0.506, 0.741)
Path from (232.29, 460.93) curve via (232.29, 466.42) and (237.9, 470.87) to (244.83, 470.87), curve via (251.75, 470.87) and (257.37, 466.42) to (257.37, 460.93), curve via (257.37, 455.44) and (251.75, 450.99) to (244.83, 450.99), curve via (237.9, 450.99) and (232.29, 455.44) to (232.29, 460.93) (closed) and from (309.43, 460.93) curve via (309.43, 466.42) and (315.05, 470.87) to (321.97, 470.87), curve via (328.9, 470.87) and (334.51, 466.42) to (334.51, 460.93), curve via (334.51, 455.44) and (328.9, 450.99) to (321.97, 450.99), curve via (315.05, 450.99) and (309.43, 455.44) to (309.43, 460.93) (closed) - Filled with DeviceRgb(0.251, 0.408, 0.596)
Path from (232.29, 460.93) curve via (232.29, 466.42) and (237.9, 470.87) to (244.83, 470.87), curve via (251.75, 470.87) and (257.37, 466.42) to (257.37, 460.93), curve via (257.37, 455.44) and (251.75, 450.99) to (244.83, 450.99), curve via (237.9, 450.99) and (232.29, 455.44) to (232.29, 460.93) (closed) and from (309.43, 460.93) curve via (309.43, 466.42) and (315.05, 470.87) to (321.97, 470.87), curve via (328.9, 470.87) and (334.51, 466.42) to (334.51, 460.93), curve via (334.51, 455.44) and (328.9, 450.99) to (321.97, 450.99), curve via (315.05, 450.99) and (309.43, 455.44) to (309.43, 460.93) (closed) - Stroked with DeviceRgb(0.22, 0.365, 0.541)
Path from (218.14, 390.8) curve via (261.65, 367.12) and (305.1, 367.12) to (348.51, 390.8) - Stroked with DeviceRgb(0.22, 0.365, 0.541)
Path from (163, 432.4) curve via (163, 485.09) and (216.9, 527.8) to (283.4, 527.8), curve via (349.9, 527.8) and (403.8, 485.09) to (403.8, 432.4), curve via (403.8, 379.71) and (349.9, 337) to (283.4, 337), curve via (216.9, 337) and (163, 379.71) to (163, 432.4) (closed) - Stroked with DeviceRgb(0.22, 0.365, 0.541)
Path from (51.2, 60.025) line to (420.15, 60.025), line to (420.15, 736.975), line to (51.2, 736.975) (closed) - Invisible, clipping
Path from (255.48, 564.55) line to (368.53, 564.55), line to (368.53, 579.15), line to (255.48, 579.15) (closed) - Filled with DeviceRgb(0.31, 0.506, 0.741)
Path from (255.48, 550.13) line to (368.53, 550.13), line to (368.53, 564.755), line to (255.48, 564.755) (closed) - Filled with DeviceRgb(0.863, 0.902, 0.945)
Path from (255.48, 521.32) line to (368.53, 521.32), line to (368.53, 535.92), line to (255.48, 535.92) (closed) - Filled with DeviceRgb(0.863, 0.902, 0.945)
Path from (255.48, 492.52) line to (368.53, 492.52), line to (368.53, 507.12), line to (255.48, 507.12) (closed) - Filled with DeviceRgb(0.863, 0.902, 0.945)
Path from (255.48, 463.73) line to (368.53, 463.73), line to (368.53, 478.33), line to (255.48, 478.33) (closed) - Filled with DeviceRgb(0.863, 0.902, 0.945)
Path from (255.48, 434.9) line to (368.53, 434.9), line to (368.53, 449.525), line to (255.48, 449.525) (closed) - Filled with DeviceRgb(0.863, 0.902, 0.945)
Path from (255.48, 406.1) line to (368.53, 406.1), line to (368.53, 420.7), line to (255.48, 420.7) (closed) - Filled with DeviceRgb(0.863, 0.902, 0.945)
Path from (51.2, 565.15) line to (420.15, 565.15), line to (420.15, 578.55), line to (51.2, 578.55) (closed) - Invisible, clipping
Path from (49.2, 57.825) line to (422.35, 57.825), line to (422.35, 738.975), line to (49.2, 738.975) (closed) - Invisible, clipping
Path from (255.18, 579.45) line to (255.18, 405.8) - Stroked with DeviceRgb(0.584, 0.702, 0.843)
Path from (255.08, 405.7) line to (256.08, 405.7), line to (256.08, 579.55), line to (255.08, 579.55) (closed) - Filled with DeviceRgb(0.584, 0.702, 0.843)
Path from (368.02, 578.45) line to (368.02, 405.8) - Stroked with DeviceRgb(0.584, 0.702, 0.843)
Path from (367.93, 405.7) line to (368.93, 405.7), line to (368.93, 578.55), line to (367.93, 578.55) (closed) - Filled with DeviceRgb(0.584, 0.702, 0.843)
Path from (256.18, 579.45) line to (368.83, 579.45) - Stroked with DeviceRgb(0.584, 0.702, 0.843)
Path from (256.08, 578.55) line to (368.93, 578.55), line to (368.93, 579.55), line to (256.08, 579.55) (closed) - Filled with DeviceRgb(0.584, 0.702, 0.843)
Path from (256.18, 565.05) line to (368.83, 565.05) - Stroked with DeviceRgb(0.584, 0.702, 0.843)
Path from (256.08, 564.15) line to (368.93, 564.15), line to (368.93, 565.15), line to (256.08, 565.15) (closed) - Filled with DeviceRgb(0.584, 0.702, 0.843)
Path from (256.18, 550.63) line to (368.83, 550.63) - Stroked with DeviceRgb(0.584, 0.702, 0.843)
Path from (256.08, 549.72) line to (368.93, 549.72), line to (368.93, 550.72), line to (256.08, 550.72) (closed) - Filled with DeviceRgb(0.584, 0.702, 0.843)
Path from (256.18, 536.22) line to (368.83, 536.22) - Stroked with DeviceRgb(0.584, 0.702, 0.843)
Path from (256.08, 535.33) line to (368.93, 535.33), line to (368.93, 536.33), line to (256.08, 536.33) (closed) - Filled with DeviceRgb(0.584, 0.702, 0.843)
Path from (256.18, 521.82) line to (368.83, 521.82) - Stroked with DeviceRgb(0.584, 0.702, 0.843)
Path from (256.08, 520.92) line to (368.93, 520.92), line to (368.93, 521.92), line to (256.08, 521.92) (closed) - Filled with DeviceRgb(0.584, 0.702, 0.843)
Path from (256.18, 507.42) line to (368.83, 507.42) - Stroked with DeviceRgb(0.584, 0.702, 0.843)
Path from (256.08, 506.52) line to (368.93, 506.52), line to (368.93, 507.52), line to (256.08, 507.52) (closed) - Filled with DeviceRgb(0.584, 0.702, 0.843)
Path from (256.18, 493.02) line to (368.83, 493.02) - Stroked with DeviceRgb(0.584, 0.702, 0.843)
Path from (256.08, 492.13) line to (368.93, 492.13), line to (368.93, 493.13), line to (256.08, 493.13) (closed) - Filled with DeviceRgb(0.584, 0.702, 0.843)
Path from (256.18, 478.63) line to (368.83, 478.63) - Stroked with DeviceRgb(0.584, 0.702, 0.843)
Path from (256.08, 477.73) line to (368.93, 477.73), line to (368.93, 478.73), line to (256.08, 478.73) (closed) - Filled with DeviceRgb(0.584, 0.702, 0.843)
Path from (256.18, 464.23) line to (368.83, 464.23) - Stroked with DeviceRgb(0.584, 0.702, 0.843)
Path from (256.08, 463.32) line to (368.93, 463.32), line to (368.93, 464.32), line to (256.08, 464.32) (closed) - Filled with DeviceRgb(0.584, 0.702, 0.843)
Path from (256.18, 449.82) line to (368.83, 449.82) - Stroked with DeviceRgb(0.584, 0.702, 0.843)
Path from (256.08, 448.9) line to (368.93, 448.9), line to (368.93, 449.925), line to (256.08, 449.925) (closed) - Filled with DeviceRgb(0.584, 0.702, 0.843)
Path from (256.18, 435.4) line to (368.83, 435.4) - Stroked with DeviceRgb(0.584, 0.702, 0.843)
Path from (256.08, 434.5) line to (368.93, 434.5), line to (368.93, 435.5), line to (256.08, 435.5) (closed) - Filled with DeviceRgb(0.584, 0.702, 0.843)
Path from (256.18, 421) line to (368.83, 421) - Stroked with DeviceRgb(0.584, 0.702, 0.843)
Path from (256.08, 420.1) line to (368.93, 420.1), line to (368.93, 421.1), line to (256.08, 421.1) (closed) - Filled with DeviceRgb(0.584, 0.702, 0.843)
Path from (256.18, 406.6) line to (368.83, 406.6) - Stroked with DeviceRgb(0.584, 0.702, 0.843)
Path from (256.08, 405.7) line to (368.93, 405.7), line to (368.93, 406.7), line to (256.08, 406.7) (closed) - Filled with DeviceRgb(0.584, 0.702, 0.843)
Path from (280.6, 359.3) line to (287.93, 380.62), line to (307.13, 393.8), line to (330.87, 393.8), line to (350.07, 380.62), line to (357.4, 359.3), line to (350.07, 337.98), line to (330.87, 324.8), line to (307.13, 324.8), line to (287.93, 337.98) (closed) - Filled with DeviceRgb(0.31, 0.506, 0.741)
Path from (280.6, 359.3) line to (287.93, 380.62), line to (307.13, 393.8), line to (330.87, 393.8), line to (350.07, 380.62), line to (357.4, 359.3), line to (350.07, 337.98), line to (330.87, 324.8), line to (307.13, 324.8), line to (287.93, 337.98) (closed) - Stroked with DeviceRgb(0.22, 0.365, 0.541)

顺便说一句,上面的代码使用了以下导入:

Imports System.Globalization
Imports System.Text
Imports iText.Kernel.Colors
Imports iText.Kernel.Geom
Imports iText.Kernel.Pdf
Imports iText.Kernel.Pdf.Canvas.Parser
Imports iText.Kernel.Pdf.Canvas.Parser.Data
Imports iText.Kernel.Pdf.Canvas.Parser.Listener