How to add metadata to a PDF document using PDFBox?
I have an input stream of a PDF document available. I want to add subject
metadata to the document and then save it. I'm not sure how to do this.
I saw an example recipe here: https://pdfbox.apache.org/1.8/cookbook/workingwithmetadata.html
However, it is still vague. Below is what I'm trying, with my questions in the comments:
PDDocument doc = PDDocument.load(myInputStream);
PDDocumentCatalog catalog = doc.getDocumentCatalog();
InputStream newXMPData = ...; //what goes here? How can I add subject tag?
PDMetadata newMetadata = new PDMetadata(doc, newXMPData, false );
catalog.setMetadata( newMetadata );
//does anything else need to happen to save the document??
//I would like an outputstream of the document (with metadata) so that I can save it to an S3 bucket
This answer uses xmpbox and is taken from the AddMetadataFromDocInfo example in the source code download:
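import java.io.ByteArrayOutputStream;
import org.apache.pdfbox.pdmodel.common.PDMetadata;
import org.apache.xmpbox.XMPMetadata;
import org.apache.xmpbox.schema.DublinCoreSchema;
import org.apache.xmpbox.xml.XmpSerializer;

XMPMetadata xmp = XMPMetadata.createXMPMetadata();
DublinCoreSchema dc = xmp.createAndAddDublinCoreSchema();
dc.setDescription("descr"); // dc:description is the XMP counterpart of the PDF "Subject"
XmpSerializer serializer = new XmpSerializer();
ByteArrayOutputStream baos = new ByteArrayOutputStream();
serializer.serialize(xmp, baos, true); // may throw javax.xml.transform.TransformerException
PDMetadata metadata = new PDMetadata(doc);
metadata.importXMPMetadata(baos.toByteArray());
doc.getDocumentCatalog().setMetadata(metadata);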
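The snippet above builds the XMP packet but does not show loading or saving the document. Below is a minimal end-to-end sketch for the question's case, assuming PDFBox and xmpbox 2.0.x are on the classpath; the class and method names (XmpSubjectSketch, addSubject) are hypothetical. It stores the subject as dc:description, also sets the Info dictionary's Subject so the two places agree, and writes nothing until save() is called:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import javax.xml.transform.TransformerException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.common.PDMetadata;
import org.apache.xmpbox.XMPMetadata;
import org.apache.xmpbox.schema.DublinCoreSchema;
import org.apache.xmpbox.xml.XmpSerializer;

public class XmpSubjectSketch {

    /** Hypothetical helper: adds the subject as XMP metadata and returns the saved PDF bytes. */
    public static byte[] addSubject(InputStream pdfIn, String subject)
            throws IOException, TransformerException {
        // try-with-resources closes the document even if something fails
        try (PDDocument doc = PDDocument.load(pdfIn)) {
            XMPMetadata xmp = XMPMetadata.createXMPMetadata();
            DublinCoreSchema dc = xmp.createAndAddDublinCoreSchema();
            // The PDF "Subject" corresponds to dc:description in XMP
            dc.setDescription(subject);

            // Serialize the XMP packet (with xpacket wrapper) into memory
            ByteArrayOutputStream xmpBytes = new ByteArrayOutputStream();
            new XmpSerializer().serialize(xmp, xmpBytes, true);

            // Attach it to the catalog as the document-level metadata stream
            PDMetadata metadata = new PDMetadata(doc);
            metadata.importXMPMetadata(xmpBytes.toByteArray());
            doc.getDocumentCatalog().setMetadata(metadata);

            // Keep the classic Info dictionary consistent with the XMP
            doc.getDocumentInformation().setSubject(subject);

            // Changes only reach the output once save() is called
            ByteArrayOutputStream pdfOut = new ByteArrayOutputStream();
            doc.save(pdfOut);
            return pdfOut.toByteArray();
        }
    }
}
```

The returned bytes can be wrapped in a ByteArrayInputStream for an upload; the upload call itself is not shown.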
The following code sets the title of a PDF document, but the same approach should work for the other properties:
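import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentInformation;

public static byte[] insertTitlePdf(byte[] documentBytes, String title) {
    // try-with-resources makes sure the PDDocument is closed
    try (PDDocument document = PDDocument.load(documentBytes)) {
        PDDocumentInformation info = document.getDocumentInformation();
        info.setTitle(title);
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        document.save(baos);
        return baos.toByteArray();
    } catch (IOException e) {
        e.printStackTrace();
    }
    return null;
}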
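Because only the line that touches PDDocumentInformation differs from one property to the next, the method can be generalized. A sketch assuming PDFBox 2.0.x; InfoUpdateSketch and updateInfo are hypothetical names. Unlike the original, it lets the IOException propagate instead of printing it and returning null:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.function.Consumer;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentInformation;

public class InfoUpdateSketch {

    /** Hypothetical generalization of insertTitlePdf: applies any change to the Info dictionary. */
    public static byte[] updateInfo(byte[] documentBytes, Consumer<PDDocumentInformation> change)
            throws IOException {
        try (PDDocument document = PDDocument.load(documentBytes)) {
            // The caller decides which properties to set (title, subject, keywords, ...)
            change.accept(document.getDocumentInformation());
            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            document.save(baos);
            return baos.toByteArray();
        }
    }
}
```

For the question's subject, call it as updateInfo(documentBytes, info -> info.setSubject("Some subject")).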
Apache PDFBox is required, so add it as a dependency, e.g. with Maven:
<dependency>
<groupId>org.apache.pdfbox</groupId>
<artifactId>pdfbox</artifactId>
<version>2.0.6</version>
</dependency>
Add the title:
byte[] documentBytesWithTitle = insertTitlePdf(documentBytes, "Some fancy title");
Display it in the browser (JSF example):
<object class="pdf" data="data:application/pdf;base64,#{myBean.getDocumentBytesWithTitleAsBase64()}" type="application/pdf">Document could not be loaded</object>
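The JSF expression calls myBean.getDocumentBytesWithTitleAsBase64(), which is not shown. A minimal, hypothetical sketch of such a method using java.util.Base64; the class name MyBean and the setter are assumptions, and the scope annotations a real JSF/CDI bean would carry are omitted so it compiles with the JDK alone:

```java
import java.util.Base64;

public class MyBean {

    // Hypothetical field: in the application this would hold the result of insertTitlePdf(...)
    private byte[] documentBytesWithTitle;

    public void setDocumentBytesWithTitle(byte[] documentBytesWithTitle) {
        this.documentBytesWithTitle = documentBytesWithTitle;
    }

    /** Base64-encodes the PDF so it can be embedded in a data:application/pdf URI. */
    public String getDocumentBytesWithTitleAsBase64() {
        // Return an empty string rather than failing if no document has been set yet
        if (documentBytesWithTitle == null) {
            return "";
        }
        return Base64.getEncoder().encodeToString(documentBytesWithTitle);
    }
}
```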
Another, simpler way is to use the built-in Document Information object:
PDDocument inputDoc = PDDocument.load(myInputStream); // your doc
inputDoc.getDocumentInformation().setCreator("Some meta");
inputDoc.getDocumentInformation().setCustomMetadataValue("fieldName", "fieldValue");
This also has the benefit of not requiring the xmpbox library.
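Applied to the question's input and desired output, a sketch assuming PDFBox 2.0.x; InfoSubjectSketch and withSubject are hypothetical names. It sets the Subject and the custom key from the snippet above, then returns the saved PDF as an in-memory InputStream that can be handed to whichever S3 client is in use (that call is not shown):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentInformation;

public class InfoSubjectSketch {

    /** Hypothetical helper: sets Subject plus one custom Info entry and returns the saved PDF. */
    public static InputStream withSubject(InputStream pdfIn, String subject) throws IOException {
        try (PDDocument doc = PDDocument.load(pdfIn)) {
            PDDocumentInformation info = doc.getDocumentInformation();
            info.setSubject(subject);
            // Example custom key/value, as in the snippet above
            info.setCustomMetadataValue("fieldName", "fieldValue");

            // save() is the step that actually writes the modified document
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            doc.save(out);
            return new ByteArrayInputStream(out.toByteArray());
        }
    }
}
```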