How to interpret iTextSharp TJ/TF result
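I'm using the iTextSharp library from PowerShell to implement an automated process that extracts the required information from several PDF documents.
Based on this part of the PDF content:
It returns this result: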
[(1)-1688.21(1)-492.975(0)-493.019(0)]TJ
[(5)-493.019(0)-17728.1(2)]TJ
I can extract the literal values with some regex operations, but with this approach alone the result is:
$line -replace "^\[\(|\)\]TJ$", "" -split "\)\-?\d+\.?\d*\(" -join ""
1100
502
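The regex above throws the numbers away. Here is a minimal Python sketch that keeps them and returns each TJ array as a list of strings and floats. It assumes each TJ operation sits on one line, as in the output above. It only handles literal strings whose inner parentheses are backslash-escaped, so hex strings (`<...>`), balanced nested parentheses and font encodings are not covered. The name `parse_tj` is hypothetical. The same pattern should also work with .NET's `[regex]::Matches` from PowerShell.

```python
import re

# One TJ array element: a literal string "( ... )" whose inner parentheses
# are backslash-escaped, or a number such as -1688.21.
_TOKEN = re.compile(r"\((?:\\.|[^\\()])*\)|[+-]?(?:\d+\.?\d*|\.\d+)")


def parse_tj(line: str) -> list:
    """Split '[(..)num(..)...]TJ' into str (glyph strings) and float (offsets)."""
    m = re.fullmatch(r"\s*\[(.*)\]\s*TJ\s*", line)
    if m is None:
        raise ValueError("not a TJ operation: " + line)
    elements = []
    for tok in _TOKEN.findall(m.group(1)):
        if tok.startswith("("):
            elements.append(tok[1:-1])  # drop the parentheses, keep escapes as-is
        else:
            elements.append(float(tok))  # position adjustment, kept for later use
    return elements
```

For the first sample line this yields `['1', -1688.21, '1', -492.975, '0', -493.019, '0']`, so the offsets stay available instead of being discarded.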
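Of course, these results are incomplete; I need to be more precise when reading/parsing.
I suspect that the numbers between the literal characters (e.g. -1688.21, -492.975, ...) may be useful, but I haven't found an explanation of these parameters.
What do they represent?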
If you want to know the details of the PDF format, you should look at the PDF specification, ISO 32000.
Operands | Operator | Description |
---|---|---|
array | TJ | Show one or more text strings, allowing individual glyph positioning. Each element of array shall be either a string or a number. If the element is a string, this operator shall show the string. If it is a number, the operator shall adjust the text position by that amount; that is, it shall translate the text matrix, Tm. The number shall be expressed in thousandths of a unit of text space (see 9.4.4, "Text Space Details"). This amount shall be subtracted from the current horizontal or vertical coordinate, depending on the writing mode. In the default coordinate system, a positive adjustment has the effect of moving the next glyph painted either to the left or down by the given amount. Figure 46 shows an example of the effect of passing offsets to TJ. |
(ISO 32000-1, Table 109 – Text-showing operators)
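The table defers to section 9.4.4 of ISO 32000-1 for the units. Restated from that section for horizontal writing mode, the horizontal displacement applied after a glyph is:

$$
t_x = \left( \left( w_0 - \frac{T_j}{1000} \right) T_{fs} + T_c + T_w \right) T_h
$$

Here $w_0$ is the glyph's horizontal displacement (its width in glyph space divided by 1000), $T_j$ is the number from the TJ array (0 if there is none), $T_{fs}$ is the font size set by the Tf operator, $T_c$ is the character spacing, $T_w$ is the word spacing (applied only to the single-byte character code 32), and $T_h$ is the horizontal scaling. Taking only the $T_j$ term, a TJ number shifts the next glyph by $-\frac{T_j}{1000} T_{fs} T_h$. For example, with a hypothetical font size $T_{fs} = 10$ and $T_h = 1$, the -1688.21 in the first line moves the next glyph about 16.88 text space units to the right.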
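Thus, regarding your question:
I suspect that the numbers between the literal characters (e.g. -1688.21, -492.975, ...) may be useful, but I haven't found an explanation of these parameters.
What do they represent?
For each such number, the operator adjusts the text position by that amount. The number is expressed in thousandths of a unit of text space. This amount is subtracted from the current horizontal or vertical coordinate, depending on the writing mode.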
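A positive number moves the next glyph to the left, so in horizontal mode a negative number moves it to the right. Large negative values such as -1688.21 or -17728.1 therefore open visible gaps, while values around -493 are smaller gaps. This minimal Python sketch makes the following assumptions:

* The TJ array has already been parsed into strings and numbers.
* The font size from the preceding Tf operator is supplied by hand.
* Emitting a space when a gap exceeds 1000 (about one font size) is a heuristic, not something the specification defines.

The names `tj_shift` and `join_tj_elements` are hypothetical.

```python
from typing import List, Union


def tj_shift(tj: float, font_size: float, h_scale: float = 1.0) -> float:
    """Horizontal shift in text space units caused by one TJ number,
    per ISO 32000-1, 9.4.4: -(Tj / 1000) * Tfs * Th."""
    return -tj / 1000.0 * font_size * h_scale


def join_tj_elements(elements: List[Union[str, float]],
                     gap_threshold: float = 1000.0) -> str:
    """Concatenate the strings of a TJ array, inserting a space wherever a
    number moves the next glyph right by more than gap_threshold.

    gap_threshold uses the same units as the TJ numbers, so 1000 means a gap
    as wide as the font size (times horizontal scaling). The default value is
    a heuristic assumption, not taken from the specification.
    """
    out = []
    for el in elements:
        if isinstance(el, str):
            out.append(el)
        elif -el > gap_threshold:
            # The number is subtracted from x in horizontal writing mode,
            # so a large negative value means a large move to the right.
            out.append(" ")
    return "".join(out)
```

With the default threshold, the two sample lines come out as `1 100` and `50 2`. Whether that matches what is printed on the page depends on the document's actual fonts and layout, so the threshold may need tuning.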