PDFBox 2 不创建 PDF/A 文件
PDFBox 2 does not create PDF/A file
我正在尝试使用 PDFBox 2 创建一个 PDF/A 文件。我的代码基于示例代码 here。代码运行没有错误。但是,如果我使用 callas pdfPilot 和 veraPDF 验证文件,则没有 XMP 元数据,也没有 PDF/A 版本信息。 PDF 文件也是 1.4 版。不是代码中设置的 1.7。
// TTF font needed for Unicode support in OCR texts
PDFont font = PDType0Font.load(document,
PDDocument.class.getResourceAsStream("/org/apache/pdfbox/resources/ttf/LiberationSans-Regular.ttf"), true);
// Add metadata (needed by PDF/A)
XMPMetadata xmp = XMPMetadata.createXMPMetadata();
try {
DublinCoreSchema dc = xmp.createAndAddDublinCoreSchema();
dc.setTitle("THE DOCUMENT TITLE");
dc.addCreator("THE AUTHOR");
PDFAIdentificationSchema id = xmp.createAndAddPFAIdentificationSchema();
id.setPart(2);
id.setConformance("B");
XmpSerializer serializer = new XmpSerializer();
ByteArrayOutputStream baos = new ByteArrayOutputStream();
serializer.serialize(xmp, baos, true);
PDMetadata metadata = new PDMetadata(document);
metadata.importXMPMetadata(baos.toByteArray());
document.getDocumentCatalog().setMetadata(metadata);
} catch (BadFieldValueException e) {
throw new IllegalArgumentException("", e);
}
// Set color profile (needed by PDF/A)
InputStream colorProfile = PDDocument.class.getResourceAsStream("/sRGB.icc");
PDOutputIntent intent = new PDOutputIntent(document, colorProfile);
intent.setInfo("sRGB IEC61966-2.1");
intent.setOutputCondition("sRGB IEC61966-2.1");
intent.setOutputConditionIdentifier("sRGB IEC61966-2.1");
intent.setRegistryName("http://www.color.org");
document.getDocumentCatalog().addOutputIntent(intent);
// Render all pages
for (IPage page : pages) {
((PdfboxPage)page).setFont(font);
page.renderPage(this);
document.addPage((PDPage) page.getPage());
}
document.setVersion(1.7f);
document.save(path);
document.close();
我做错了什么?
编辑 1:
我可以看到PDF文件中有xpacket。它包括元数据。但看起来 PDFBox 没有以有效的方式写入此数据(对于 veraPDF 和 pdfPilot)。
编辑 2:
看起来 PDFBox 2.0.12 构建无效 PDF/A。我使用我们的商业 pdfPilot 程序转换了 PDF。 (PDF/A-1b)
PDFBox 将其写入 PDF 文件(-> 在 veraPDF 和 pdfPilot 中无效):
<?xpacket begin="
" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description xmlns:dc="http://purl.org/dc/elements/1.1/" rdf:about="">
<dc:title>
<rdf:Alt>
<rdf:li lang="x-default">THE DOCUMENT TITLE</rdf:li>
</rdf:Alt>
</dc:title>
<dc:creator>
<rdf:Seq>
<rdf:li>THE AUTHOR</rdf:li>
</rdf:Seq>
</dc:creator>
</rdf:Description>
<rdf:Description xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/" rdf:about="">
<pdfaid:part>1</pdfaid:part>
<pdfaid:conformance>B</pdfaid:conformance>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>
pdfPilot 将其写入 PDF 文件(-> 在 veraPDF 和 pdfPilot 中有效):
<?xpacket begin="
" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.6-c015 81.159809, 2016/11/11-01:42:16 ">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about=""
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:xmp="http://ns.adobe.com/xap/1.0/"
xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/"
xmlns:stEvt="http://ns.adobe.com/xap/1.0/sType/ResourceEvent#"
xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/"
xmlns:pdfaExtension="http://www.aiim.org/pdfa/ns/extension/"
xmlns:pdfaSchema="http://www.aiim.org/pdfa/ns/schema#"
xmlns:pdfaProperty="http://www.aiim.org/pdfa/ns/property#">
<dc:format>application/pdf</dc:format>
<dc:creator>
<rdf:Seq>
<rdf:li>AUTOR</rdf:li>
</rdf:Seq>
</dc:creator>
<dc:title>
<rdf:Alt>
<rdf:li xml:lang="x-default">TITEL</rdf:li>
</rdf:Alt>
</dc:title>
<xmp:ModifyDate>2019-01-11T11:42:22+01:00</xmp:ModifyDate>
<xmp:CreateDate>2019-01-11T11:42:21+01:00</xmp:CreateDate>
<xmp:MetadataDate>2019-01-11T11:42:22+01:00</xmp:MetadataDate>
<xmpMM:DocumentID>uuid:b60f88c2-aa89-11b2-0a00-104bbf060000</xmpMM:DocumentID>
<xmpMM:InstanceID>uuid:b61148b9-aa89-11b2-0a00-60d9faa0ff7f</xmpMM:InstanceID>
<xmpMM:RenditionClass>default</xmpMM:RenditionClass>
<xmpMM:VersionID>1</xmpMM:VersionID>
<xmpMM:History>
<rdf:Seq>
<rdf:li rdf:parseType="Resource">
<stEvt:action>converted</stEvt:action>
<stEvt:instanceID>uuid:b60f88c3-aa89-11b2-0a00-902dfba0ff7f</stEvt:instanceID>
<stEvt:parameters>converted to PDF/A-1b</stEvt:parameters>
<stEvt:softwareAgent>pdfaPilot</stEvt:softwareAgent>
<stEvt:when>2019-01-11T11:42:22+01:00</stEvt:when>
</rdf:li>
</rdf:Seq>
</xmpMM:History>
<pdfaid:part>1</pdfaid:part>
<pdfaid:conformance>B</pdfaid:conformance>
<pdfaExtension:schemas>
<rdf:Bag>
<rdf:li rdf:parseType="Resource">
<pdfaSchema:namespaceURI>http://ns.adobe.com/xap/1.0/mm/</pdfaSchema:namespaceURI>
<pdfaSchema:prefix>xmpMM</pdfaSchema:prefix>
<pdfaSchema:schema>XMP Media Management Schema</pdfaSchema:schema>
<pdfaSchema:property>
<rdf:Seq>
<rdf:li rdf:parseType="Resource">
<pdfaProperty:category>internal</pdfaProperty:category>
<pdfaProperty:description>UUID based identifier for specific incarnation of a document</pdfaProperty:description>
<pdfaProperty:name>InstanceID</pdfaProperty:name>
<pdfaProperty:valueType>URI</pdfaProperty:valueType>
</rdf:li>
<rdf:li rdf:parseType="Resource">
<pdfaProperty:category>internal</pdfaProperty:category>
<pdfaProperty:description>The common identifier for all versions and renditions of a document.</pdfaProperty:description>
<pdfaProperty:name>OriginalDocumentID</pdfaProperty:name>
<pdfaProperty:valueType>URI</pdfaProperty:valueType>
</rdf:li>
</rdf:Seq>
</pdfaSchema:property>
</rdf:li>
<rdf:li rdf:parseType="Resource">
<pdfaSchema:namespaceURI>http://www.aiim.org/pdfa/ns/id/</pdfaSchema:namespaceURI>
<pdfaSchema:prefix>pdfaid</pdfaSchema:prefix>
<pdfaSchema:schema>PDF/A ID Schema</pdfaSchema:schema>
<pdfaSchema:property>
<rdf:Seq>
<rdf:li rdf:parseType="Resource">
<pdfaProperty:category>internal</pdfaProperty:category>
<pdfaProperty:description>Part of PDF/A standard</pdfaProperty:description>
<pdfaProperty:name>part</pdfaProperty:name>
<pdfaProperty:valueType>Integer</pdfaProperty:valueType>
</rdf:li>
<rdf:li rdf:parseType="Resource">
<pdfaProperty:category>internal</pdfaProperty:category>
<pdfaProperty:description>Amendment of PDF/A standard</pdfaProperty:description>
<pdfaProperty:name>amd</pdfaProperty:name>
<pdfaProperty:valueType>Text</pdfaProperty:valueType>
</rdf:li>
<rdf:li rdf:parseType="Resource">
<pdfaProperty:category>internal</pdfaProperty:category>
<pdfaProperty:description>Conformance level of PDF/A standard</pdfaProperty:description>
<pdfaProperty:name>conformance</pdfaProperty:name>
<pdfaProperty:valueType>Text</pdfaProperty:valueType>
</rdf:li>
</rdf:Seq>
</pdfaSchema:property>
</rdf:li>
</rdf:Bag>
</pdfaExtension:schemas>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>
如果我将其静态写入 PDF 文件,它会生成一个有效的 PDF/A 文件:
String xmpData = "<?xpacket ......";
PDMetadata metadata = new PDMetadata(document);
metadata.importXMPMetadata(xmpData.getBytes());
编辑 3:
添加这个有效且简短:
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/" >
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about=""
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
<dc:format>application/pdf</dc:format>
<dc:creator>
<rdf:Seq>
<rdf:li>AUTOR</rdf:li>
</rdf:Seq>
</dc:creator>
<dc:title>
<rdf:Alt>
<rdf:li xml:lang="x-default">TITEL</rdf:li>
</rdf:Alt>
</dc:title>
<pdfaid:part>1</pdfaid:part>
<pdfaid:conformance>B</pdfaid:conformance>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>
这与 CreatePDFA 示例
生成的 XML 之间存在差异
<rdf:li xml:lang="x-default">THE DOCUMENT TITLE</rdf:li>
到你所得到的
<rdf:li lang="x-default">THE DOCUMENT TITLE</rdf:li>
这让我想起了 1.5 年前我们曾讨论过的一个问题 here。
所以引用我 2017 年的回答:这段代码
Transformer transformer = TransformerFactory.newInstance().newTransformer();
应该return一个com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImplclass。如果没有,则调用
Transformer transformer =
TransformerFactory.newInstance("com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl", null).newTransformer();
或设置系统属性:
System.setProperty("javax.xml.transform.TransformerFactory", "com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl");
我无法回答(因为你没有告诉)你是如何最终拥有这个转换器的,如果你改变它,你的应用程序的其余部分会发生什么。
我正在尝试使用 PDFBox 2 创建一个 PDF/A 文件。我的代码基于示例代码 here。代码运行没有错误。但是,如果我使用 callas pdfPilot 和 veraPDF 验证文件,则没有 XMP 元数据,也没有 PDF/A 版本信息。 PDF 文件也是 1.4 版。不是代码中设置的 1.7。
// TTF font needed for Unicode support in OCR texts
PDFont font = PDType0Font.load(document,
PDDocument.class.getResourceAsStream("/org/apache/pdfbox/resources/ttf/LiberationSans-Regular.ttf"), true);
// Add metadata (needed by PDF/A)
XMPMetadata xmp = XMPMetadata.createXMPMetadata();
try {
DublinCoreSchema dc = xmp.createAndAddDublinCoreSchema();
dc.setTitle("THE DOCUMENT TITLE");
dc.addCreator("THE AUTHOR");
PDFAIdentificationSchema id = xmp.createAndAddPFAIdentificationSchema();
id.setPart(2);
id.setConformance("B");
XmpSerializer serializer = new XmpSerializer();
ByteArrayOutputStream baos = new ByteArrayOutputStream();
serializer.serialize(xmp, baos, true);
PDMetadata metadata = new PDMetadata(document);
metadata.importXMPMetadata(baos.toByteArray());
document.getDocumentCatalog().setMetadata(metadata);
} catch (BadFieldValueException e) {
throw new IllegalArgumentException("", e);
}
// Set color profile (needed by PDF/A)
InputStream colorProfile = PDDocument.class.getResourceAsStream("/sRGB.icc");
PDOutputIntent intent = new PDOutputIntent(document, colorProfile);
intent.setInfo("sRGB IEC61966-2.1");
intent.setOutputCondition("sRGB IEC61966-2.1");
intent.setOutputConditionIdentifier("sRGB IEC61966-2.1");
intent.setRegistryName("http://www.color.org");
document.getDocumentCatalog().addOutputIntent(intent);
// Render all pages
for (IPage page : pages) {
((PdfboxPage)page).setFont(font);
page.renderPage(this);
document.addPage((PDPage) page.getPage());
}
document.setVersion(1.7f);
document.save(path);
document.close();
我做错了什么?
编辑 1:
我可以看到PDF文件中有xpacket。它包括元数据。但看起来 PDFBox 没有以有效的方式写入此数据(对于 veraPDF 和 pdfPilot)。
编辑 2:
看起来 PDFBox 2.0.12 构建无效 PDF/A。我使用我们的商业 pdfPilot 程序转换了 PDF。 (PDF/A-1b)
PDFBox 将其写入 PDF 文件(-> 在 veraPDF 和 pdfPilot 中无效):
<?xpacket begin="
" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description xmlns:dc="http://purl.org/dc/elements/1.1/" rdf:about="">
<dc:title>
<rdf:Alt>
<rdf:li lang="x-default">THE DOCUMENT TITLE</rdf:li>
</rdf:Alt>
</dc:title>
<dc:creator>
<rdf:Seq>
<rdf:li>THE AUTHOR</rdf:li>
</rdf:Seq>
</dc:creator>
</rdf:Description>
<rdf:Description xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/" rdf:about="">
<pdfaid:part>1</pdfaid:part>
<pdfaid:conformance>B</pdfaid:conformance>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>
pdfPilot 将其写入 PDF 文件(-> 在 veraPDF 和 pdfPilot 中有效):
<?xpacket begin="
" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 5.6-c015 81.159809, 2016/11/11-01:42:16 ">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about=""
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:xmp="http://ns.adobe.com/xap/1.0/"
xmlns:xmpMM="http://ns.adobe.com/xap/1.0/mm/"
xmlns:stEvt="http://ns.adobe.com/xap/1.0/sType/ResourceEvent#"
xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/"
xmlns:pdfaExtension="http://www.aiim.org/pdfa/ns/extension/"
xmlns:pdfaSchema="http://www.aiim.org/pdfa/ns/schema#"
xmlns:pdfaProperty="http://www.aiim.org/pdfa/ns/property#">
<dc:format>application/pdf</dc:format>
<dc:creator>
<rdf:Seq>
<rdf:li>AUTOR</rdf:li>
</rdf:Seq>
</dc:creator>
<dc:title>
<rdf:Alt>
<rdf:li xml:lang="x-default">TITEL</rdf:li>
</rdf:Alt>
</dc:title>
<xmp:ModifyDate>2019-01-11T11:42:22+01:00</xmp:ModifyDate>
<xmp:CreateDate>2019-01-11T11:42:21+01:00</xmp:CreateDate>
<xmp:MetadataDate>2019-01-11T11:42:22+01:00</xmp:MetadataDate>
<xmpMM:DocumentID>uuid:b60f88c2-aa89-11b2-0a00-104bbf060000</xmpMM:DocumentID>
<xmpMM:InstanceID>uuid:b61148b9-aa89-11b2-0a00-60d9faa0ff7f</xmpMM:InstanceID>
<xmpMM:RenditionClass>default</xmpMM:RenditionClass>
<xmpMM:VersionID>1</xmpMM:VersionID>
<xmpMM:History>
<rdf:Seq>
<rdf:li rdf:parseType="Resource">
<stEvt:action>converted</stEvt:action>
<stEvt:instanceID>uuid:b60f88c3-aa89-11b2-0a00-902dfba0ff7f</stEvt:instanceID>
<stEvt:parameters>converted to PDF/A-1b</stEvt:parameters>
<stEvt:softwareAgent>pdfaPilot</stEvt:softwareAgent>
<stEvt:when>2019-01-11T11:42:22+01:00</stEvt:when>
</rdf:li>
</rdf:Seq>
</xmpMM:History>
<pdfaid:part>1</pdfaid:part>
<pdfaid:conformance>B</pdfaid:conformance>
<pdfaExtension:schemas>
<rdf:Bag>
<rdf:li rdf:parseType="Resource">
<pdfaSchema:namespaceURI>http://ns.adobe.com/xap/1.0/mm/</pdfaSchema:namespaceURI>
<pdfaSchema:prefix>xmpMM</pdfaSchema:prefix>
<pdfaSchema:schema>XMP Media Management Schema</pdfaSchema:schema>
<pdfaSchema:property>
<rdf:Seq>
<rdf:li rdf:parseType="Resource">
<pdfaProperty:category>internal</pdfaProperty:category>
<pdfaProperty:description>UUID based identifier for specific incarnation of a document</pdfaProperty:description>
<pdfaProperty:name>InstanceID</pdfaProperty:name>
<pdfaProperty:valueType>URI</pdfaProperty:valueType>
</rdf:li>
<rdf:li rdf:parseType="Resource">
<pdfaProperty:category>internal</pdfaProperty:category>
<pdfaProperty:description>The common identifier for all versions and renditions of a document.</pdfaProperty:description>
<pdfaProperty:name>OriginalDocumentID</pdfaProperty:name>
<pdfaProperty:valueType>URI</pdfaProperty:valueType>
</rdf:li>
</rdf:Seq>
</pdfaSchema:property>
</rdf:li>
<rdf:li rdf:parseType="Resource">
<pdfaSchema:namespaceURI>http://www.aiim.org/pdfa/ns/id/</pdfaSchema:namespaceURI>
<pdfaSchema:prefix>pdfaid</pdfaSchema:prefix>
<pdfaSchema:schema>PDF/A ID Schema</pdfaSchema:schema>
<pdfaSchema:property>
<rdf:Seq>
<rdf:li rdf:parseType="Resource">
<pdfaProperty:category>internal</pdfaProperty:category>
<pdfaProperty:description>Part of PDF/A standard</pdfaProperty:description>
<pdfaProperty:name>part</pdfaProperty:name>
<pdfaProperty:valueType>Integer</pdfaProperty:valueType>
</rdf:li>
<rdf:li rdf:parseType="Resource">
<pdfaProperty:category>internal</pdfaProperty:category>
<pdfaProperty:description>Amendment of PDF/A standard</pdfaProperty:description>
<pdfaProperty:name>amd</pdfaProperty:name>
<pdfaProperty:valueType>Text</pdfaProperty:valueType>
</rdf:li>
<rdf:li rdf:parseType="Resource">
<pdfaProperty:category>internal</pdfaProperty:category>
<pdfaProperty:description>Conformance level of PDF/A standard</pdfaProperty:description>
<pdfaProperty:name>conformance</pdfaProperty:name>
<pdfaProperty:valueType>Text</pdfaProperty:valueType>
</rdf:li>
</rdf:Seq>
</pdfaSchema:property>
</rdf:li>
</rdf:Bag>
</pdfaExtension:schemas>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>
如果我将其静态写入 PDF 文件,它会生成一个有效的 PDF/A 文件:
String xmpData = "<?xpacket ......";
PDMetadata metadata = new PDMetadata(document);
metadata.importXMPMetadata(xmpData.getBytes());
编辑 3:
添加这个有效且简短:
<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta xmlns:x="adobe:ns:meta/" >
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about=""
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
<dc:format>application/pdf</dc:format>
<dc:creator>
<rdf:Seq>
<rdf:li>AUTOR</rdf:li>
</rdf:Seq>
</dc:creator>
<dc:title>
<rdf:Alt>
<rdf:li xml:lang="x-default">TITEL</rdf:li>
</rdf:Alt>
</dc:title>
<pdfaid:part>1</pdfaid:part>
<pdfaid:conformance>B</pdfaid:conformance>
</rdf:Description>
</rdf:RDF>
</x:xmpmeta>
<?xpacket end="w"?>
这与 CreatePDFA 示例
生成的 XML 之间存在差异<rdf:li xml:lang="x-default">THE DOCUMENT TITLE</rdf:li>
到你所得到的
<rdf:li lang="x-default">THE DOCUMENT TITLE</rdf:li>
这让我想起了 1.5 年前我们曾讨论过的一个问题 here。
所以引用我 2017 年的回答:这段代码
Transformer transformer = TransformerFactory.newInstance().newTransformer();
应该return一个com.sun.org.apache.xalan.internal.xsltc.trax.TransformerImplclass。如果没有,则调用
Transformer transformer =
TransformerFactory.newInstance("com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl", null).newTransformer();
或设置系统属性:
System.setProperty("javax.xml.transform.TransformerFactory", "com.sun.org.apache.xalan.internal.xsltc.trax.TransformerFactoryImpl");
我无法回答(因为你没有告诉)你是如何最终拥有这个转换器的,如果你改变它,你的应用程序的其余部分会发生什么。