How to distinguish PDF from PDF/A in C#

Question

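I developed a console application that scans the same folder throughout the night and converts PDFs to PDF/A using Ghostscript.

It works, but we now receive hundreds of files, and I need to check whether each file is a plain PDF or already a PDF/A, so that the script is not run on files that are already PDF/A.

Is there any way to tell PDF and PDF/A apart?

Thank you in advance.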

Answer 1

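You can use a library such as iTextSharp to read the PDF file.

Checking whether a file is PDF/A (well, actually checking whether it claims to be PDF/A, which should be enough for your needs) is a simple matter of reading the PDF's metadata tags.

The code in this answer to another question should be all you need. It is VB.NET, but it should be easy to translate to C#.

Basically:

Open the PDF with a reader from iTextSharp (or any PDF reading library)
Extract the XML (XMP) metadata
Check the XML tag named pdfaid:conformance and see whether its value is A (the other PDF/A conformance levels use B or U)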
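
The steps above can be sketched with nothing but the .NET base class library once the XMP packet is available as text. This is a minimal sketch, not the linked VB.NET answer: the class and method names (PdfAClaim, Read) are hypothetical, and it assumes the packet has already been extracted from the PDF and decoded to a string. Because XMP allows a simple property to be written either as an attribute or as a child element of rdf:Description, the sketch checks both forms, and it treats the presence of pdfaid:part as the claim since the conformance level may be A, B, or U.

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public static class PdfAClaim
{
    // Namespaces of the PDF/A identification schema and of RDF.
    private static readonly XNamespace PdfAId = "http://www.aiim.org/pdfa/ns/id/";
    private static readonly XNamespace Rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

    // Returns the claimed flavour such as "1B" or "2A",
    // or null when the packet makes no PDF/A claim or is not well-formed XML.
    public static string Read(string xmpPacket)
    {
        if (string.IsNullOrWhiteSpace(xmpPacket)) return null;

        XDocument doc;
        try
        {
            // A leading byte order mark would make the XML parser fail.
            doc = XDocument.Parse(xmpPacket.TrimStart('\uFEFF'));
        }
        catch (XmlException)
        {
            return null;
        }

        foreach (XElement description in doc.Descendants(Rdf + "Description"))
        {
            string part = ReadProperty(description, "part");
            if (part == null) continue;
            string conformance = ReadProperty(description, "conformance") ?? "";
            return part + conformance.ToUpperInvariant();
        }
        return null;
    }

    // Reads a pdfaid property written as an attribute or as a child element.
    private static string ReadProperty(XElement description, string name)
    {
        XAttribute attribute = description.Attribute(PdfAId + name);
        if (attribute != null) return attribute.Value.Trim();
        XElement element = description.Element(PdfAId + name);
        return element != null ? element.Value.Trim() : null;
    }
}
```

With iTextSharp, the input could plausibly come from decoding the reader's metadata bytes as UTF-8, which is an assumption about how the packet is encoded. As in the answer, a non-null result only means the file claims PDF/A, not that it actually validates.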

Answer 2

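You can use Spire.PDF to detect a PDF document's conformance level. See the following code:

// Load the document and read the conformance level it declares.
PdfDocument pdf = new PdfDocument();
pdf.LoadFromFile("MS_Example.pdf");
PdfConformanceLevel conformance = pdf.Conformance;
// Print the name of the detected conformance level.
Console.WriteLine(conformance.ToString());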

Disclaimer: I am an employee of Spire.

Answer 3

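You can check whether a document claims conformance by inspecting its XMP metadata.

Using the C# interface of the Datalogics PDFL library:

using (var docInput = new Document("input.pdf"))
{
    // Plain text search of the XMP packet: this matches only conformance level "A",
    // and only when the property is serialized as an attribute.
    bool bIsPdfA1a = docInput.XMPMetadata.Contains("pdfaid:conformance=\"A\"");
}

Disclaimer: I work for Datalogics.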

Answer 4

Sorry for the delay, I was ill.

I found a solution to my problem using Pac0's suggestion.

Instead of parsing the XML myself, I used iTextSharp.xmp like this:

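// Requires iTextSharp 5.x (iTextSharp.text.pdf and iTextSharp.xmp namespaces).
// Returns true when the XMP metadata declares both pdfaid:part and pdfaid:conformance.
// Note: this closes the reader passed in by the caller, on every path.
public static bool CheckIfPdfa(PdfReader reader)
{
    try
    {
        var metadata = reader.Metadata;
        if (metadata == null || metadata.Length == 0)
        {
            return false; // no XMP packet, so no PDF/A claim
        }

        IXmpMeta xmpMeta = XmpMetaParser.Parse(metadata, null);
        IXmpProperty pdfaidConformance = xmpMeta.GetProperty(XmpConst.NS_PDFA_ID, "pdfaid:conformance");
        IXmpProperty pdfaidPart = xmpMeta.GetProperty(XmpConst.NS_PDFA_ID, "pdfaid:part");

        // The file claims PDF/A only when both identification properties are present.
        return pdfaidConformance != null && pdfaidPart != null;
    }
    finally
    {
        reader.Close();
    }
}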
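
For the nightly job described in the question, a check like the one above could be used to decide which files still need to go through Ghostscript. The following is only a sketch under stated assumptions: the ConversionQueue class and the FilesToConvert method are hypothetical names, the check is passed in as a delegate so that any of the libraries mentioned in this thread could supply it, and a file whose metadata cannot be read is treated as a plain PDF, which is a design choice rather than a requirement.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class ConversionQueue
{
    // Returns the PDF files in a folder that do not yet claim PDF/A conformance.
    // claimsPdfA is a caller-supplied check (hypothetical; plug in any library).
    public static List<string> FilesToConvert(string folder, Func<string, bool> claimsPdfA)
    {
        var pending = new List<string>();
        foreach (string path in Directory.EnumerateFiles(folder, "*.pdf"))
        {
            bool alreadyPdfA;
            try
            {
                alreadyPdfA = claimsPdfA(path);
            }
            catch (Exception)
            {
                // Unreadable or damaged metadata: assume a plain PDF and convert it.
                alreadyPdfA = false;
            }

            if (!alreadyPdfA)
            {
                pending.Add(path);
            }
        }
        return pending;
    }
}
```

With the method above, the delegate could be something like path => CheckIfPdfa(new PdfReader(path)), since CheckIfPdfa closes the reader itself. Only the returned paths would then be handed to the Ghostscript conversion step.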

Thanks to everyone for your answers.
