Get Layer2 Text (Signature Description) from signature image using itextsharp

Question

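I need to retrieve the layer2 text from a signature. How can I get the description (shown under the signature image) using itextsharp? Below is the code I use to get the signing date and the user name: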

        PdfReader reader = new PdfReader(pdfPath, System.Text.Encoding.UTF8.GetBytes(MASTER_PDF_PASSWORD));
        using (MemoryStream memoryStream = new MemoryStream())
        {
            PdfStamper stamper = new PdfStamper(reader, memoryStream);
            AcroFields acroFields = stamper.AcroFields;
            List<String> names = acroFields.GetSignatureNames();
            foreach (String name in names)
            {
                PdfPKCS7 pk = acroFields.VerifySignature(name);
                String userName = PdfPKCS7.GetSubjectFields(pk.SigningCertificate).GetField("CN");
                Console.WriteLine("Sign Date: " + pk.SignDate.ToString() + " Name: " + userName);
                // Here I need to retrieve the description underneath the signature image
            }
            reader.RemoveUnusedObjects();
            reader.Close();
            stamper.Writer.CloseStream = false;
            if (stamper != null)
            {
                stamper.Close();
            }
        }

Below is the code I use to set the description:

PdfStamper st = PdfStamper.CreateSignature(reader, memoryStream, '\0', null, true);
PdfSignatureAppearance sap = st.SignatureAppearance;
sap.Render = PdfSignatureAppearance.SignatureRender.GraphicAndDescription;
sap.Layer2Font = font;
sap.Layer2Text = "Some text that i want to retrieve";

Thanks.

Answer 1

Take a look at the following PDF: signature_n2.pdf. It contains a signature with the following text in the n2 layer:

This document was signed by Bruno
Specimen.

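Before we write code to extract this text, we should look at the internal structure of the PDF with iText RUPS, so that we can find out where this /n2 layer is stored.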

Based on this information, we can start writing code. See the GetN2fromSig example:

public static void main(String[] args) throws IOException {
    PdfReader reader = new PdfReader(SRC);
    AcroFields fields = reader.getAcroFields();
    Item item = fields.getFieldItem("Signature1");
    PdfDictionary widget = item.getWidget(0);
    PdfDictionary ap = widget.getAsDict(PdfName.AP);
    PdfStream normal = ap.getAsStream(PdfName.N);
    PdfDictionary resources = normal.getAsDict(PdfName.RESOURCES);
    PdfDictionary xobject = resources.getAsDict(PdfName.XOBJECT);
    PdfStream frm = xobject.getAsStream(PdfName.FRM);
    PdfDictionary res = frm.getAsDict(PdfName.RESOURCES);
    PdfDictionary xobj = res.getAsDict(PdfName.XOBJECT);
    PRStream n2 = (PRStream) xobj.getAsStream(PdfName.N2);
    byte[] stream = PdfReader.getStreamBytes(n2);
    System.out.println(new String(stream));
}

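We get the widget annotation of the signature field named "Signature1". Based on the information from RUPS, we know that we have to get the resources (/Resources) of the normal (/N) appearance (/AP). In its /XObject dictionary, we find a form XObject named /FRM. This XObject also has /Resources of its own, more specifically two XObjects, one named /n0 and the other named /n2.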

We get the stream of the /n2 object and convert it to an uncompressed byte[]. When we print this array as a String, we get the following result:

BT
1 0 0 1 0 49.55 Tm
/F1 12 Tf
(This document was signed by Bruno)Tj
1 0 0 1 0 31.55 Tm
(Specimen.)Tj
ET

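This is PDF syntax. BT and ET stand for "Begin Text" and "End Text". The Tm operator sets the text matrix. The Tf operator sets the font. Tj shows the string delimited by ( and ). If you want the plain text, it is sufficient to extract only the text between the parentheses.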
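
Pulling out the text between the parentheses can be sketched roughly as follows. This is only a minimal illustration that does not use iText: the class name ContentStreamStrings and the method extract are hypothetical helpers, and the sketch assumes the stream uses literal strings with a simple font encoding, as in the sample above. It handles nested parentheses and the escapes \(, \) and \\, but not hex strings (<...>), octal escapes, or fonts with custom encodings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of iText: collects PDF literal strings "( ... )".
public class ContentStreamStrings {

    /** Returns every literal string found in the given content stream text. */
    public static List<String> extract(String content) {
        List<String> result = new ArrayList<>();
        StringBuilder current = null; // non-null while we are inside a literal string
        int depth = 0;                // nesting level of unescaped parentheses
        for (int i = 0; i < content.length(); i++) {
            char c = content.charAt(i);
            if (current == null) {
                if (c == '(') {       // start of a literal string
                    current = new StringBuilder();
                    depth = 1;
                }
                continue;
            }
            if (c == '\\' && i + 1 < content.length()) {
                // Simplification: the character after a backslash is copied as-is.
                // This is right for \( \) \\ but wrong for \n, \t or octal escapes.
                current.append(content.charAt(++i));
            } else if (c == '(') {
                depth++;
                current.append(c);
            } else if (c == ')') {
                if (--depth == 0) {   // end of the outermost string
                    result.add(current.toString());
                    current = null;
                } else {
                    current.append(c);
                }
            } else {
                current.append(c);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The n2 content stream shown above.
        String n2 = "BT\n1 0 0 1 0 49.55 Tm\n/F1 12 Tf\n"
                + "(This document was signed by Bruno)Tj\n"
                + "1 0 0 1 0 31.55 Tm\n(Specimen.)Tj\nET";
        System.out.println(String.join(" ", extract(n2)));
    }
}
```

For anything beyond such simple streams, a real content stream parser is the safer choice, because the bytes inside the parentheses only map directly to readable characters for some font encodings.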

Answer 2

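While Bruno addressed the issue starting from a PDF that does contain a "layer 2", allow me to state first that using these "signature layers" in PDF signature appearances is not required by the PDF specification; in fact, the specification does not know these layers at all! Thus, if you try to parse a specific layer, you may not find such a "layer" or, even worse, you may find something that looks like that layer (an XObject named n2) but contains the wrong data.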

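Nevertheless, whether you are looking for the text of layer 2 or of the signature appearance as a whole, you can use the iTextSharp text extraction capabilities. I used Bruno's code as the basis for retrieving the n2 layer.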

public static void ExtractSignatureTextFromFile(FileInfo file)
{
    try
    {
        Console.Out.Write("File: {0}\n", file);
        using (var pdfReader = new PdfReader(file.FullName))
        {
            AcroFields fields = pdfReader.AcroFields;
            foreach (string name in fields.GetSignatureNames())
            {
                Console.Out.Write("  Signature: {0}\n", name);
                iTextSharp.text.pdf.AcroFields.Item item = fields.GetFieldItem(name);
                PdfDictionary widget = item.GetWidget(0);
                PdfDictionary ap = widget.GetAsDict(PdfName.AP);
                if (ap == null)
                    continue;
                PdfStream normal = ap.GetAsStream(PdfName.N);
                if (normal == null)
                    continue;
                Console.Out.Write("    Content of normal appearance: {0}\n", extractText(normal));

                PdfDictionary resources = normal.GetAsDict(PdfName.RESOURCES);
                if (resources == null)
                    continue;
                PdfDictionary xobject = resources.GetAsDict(PdfName.XOBJECT);
                if (xobject == null)
                    continue;
                PdfStream frm = xobject.GetAsStream(PdfName.FRM);
                if (frm == null)
                    continue;
                PdfDictionary res = frm.GetAsDict(PdfName.RESOURCES);
                if (res == null)
                    continue;
                PdfDictionary xobj = res.GetAsDict(PdfName.XOBJECT);
                if (xobj == null)
                    continue;
                PRStream n2 = (PRStream) xobj.GetAsStream(PdfName.N2);
                if (n2 == null)
                    continue;
                Console.Out.Write("    Content of normal appearance, layer 2: {0}\n", extractText(n2));
            }
        }
    }
    catch (Exception ex)
    {
        Console.Error.Write("Error... " + ex.StackTrace);
    }
}

public static String extractText(PdfStream xObject)
{
    PdfDictionary resources = xObject.GetAsDict(PdfName.RESOURCES);
    ITextExtractionStrategy strategy = new LocationTextExtractionStrategy();

    PdfContentStreamProcessor processor = new PdfContentStreamProcessor(strategy);
    processor.ProcessContent(ContentByteUtils.GetContentBytesFromContentObject(xObject), resources);
    return strategy.GetResultantText();
}

For the sample file signature_n2.pdf used by Bruno, you get this:

File: ...\signature_n2.pdf
  Signature: Signature1
    Content of normal appearance: This document was signed by Bruno
Specimen.
    Content of normal appearance, layer 2: This document was signed by Bruno
Specimen.

As this sample uses layer 2 the way the OP expects, that layer already contains the text in question.

c#

itextsharp