Get page from found AcroForm form field
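I have an existing PDF that I want to open and add content to, on the page where a specific PDField (or, more specifically, a PDTerminalField, though I don't think the difference matters) is located.
It could be on the first page or on any later page.
I know the field's name, and with it I can look the field up and even get its size and position on that page (PDRectangle mediabox = new PDRectangle((COSArray) fieldDict.getDictionaryObject(COSName.RECT));).
But I can't find a way to get the number/index of the page it is on, so that I can write to the correct page.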
PDAcroForm acroForm = pdfDocument.getDocumentCatalog().getAcroForm();
PDField docField = acroForm.getField("the_coolest_field");
int page = docField.??? // This is the missing part.
PDPageContentStream contentStream = new PDPageContentStream(pdfDocument,
pdfDocument.getPage(page), PDPageContentStream.AppendMode.APPEND, true);
// now write something on the page where the field is in.
Using a hint I was given, I was able to build a map from each field name to the (last) page on which it appears.
import java.util.HashMap;
import java.util.List;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotation;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationWidget;

// Maps a widget's partial field name (/T) to the index of the last page it was found on.
HashMap<String, Integer> formFieldPages = new HashMap<>();
for (int page_i = 0; page_i < pdfDocument.getNumberOfPages(); page_i++) {
    List<PDAnnotation> annotations = pdfDocument.getPage(page_i).getAnnotations();
    for (PDAnnotation annotation : annotations) {
        // Only widget annotations belong to form fields; skip everything else.
        if (!(annotation instanceof PDAnnotationWidget)) {
            System.err.println("Unknown annotation type " + annotation.getClass().getName() + ": " + annotation.toString());
            continue;
        }
        String name = ((PDAnnotationWidget) annotation).getCOSObject().getString(COSName.T);
        if (name == null) {
            System.err.println("Unknown widget name: " + annotation.toString());
            continue;
        }
        // Warn if the name is already in the map; the later page wins.
        if (formFieldPages.containsKey(name)) {
            System.err.println("Duplicated widget name, overwriting previous page value " + formFieldPages.get(name) + " with newly found page " + page_i + ": " + annotation.toString());
        }
        formFieldPages.put(name, page_i);
    }
}
Looking up the page is now as simple as:
int page = formFieldPages.get(docField.getPartialName());
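Note that this can throw a NullPointerException if, for some reason, no widget with that name was found: the map then returns null, which cannot be unboxed to an int.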
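One way to avoid that exception is to make the "not found" case explicit instead of unboxing the map value directly. The following is a minimal sketch using only the Java standard library; the class name FieldPageLookup, the helper findPage, and the sample map contents are made up for illustration and are not part of PDFBox.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.OptionalInt;

public class FieldPageLookup {
    // Returns the page index recorded for the field name,
    // or an empty OptionalInt if the name was never put into the map.
    static OptionalInt findPage(Map<String, Integer> formFieldPages, String fieldName) {
        Integer page = formFieldPages.get(fieldName); // null when no widget was found
        return page == null ? OptionalInt.empty() : OptionalInt.of(page);
    }

    public static void main(String[] args) {
        Map<String, Integer> formFieldPages = new HashMap<>();
        formFieldPages.put("the_coolest_field", 2); // sample data, not from a real PDF

        OptionalInt found = findPage(formFieldPages, "the_coolest_field");
        OptionalInt missing = findPage(formFieldPages, "no_such_field");

        System.out.println(found.isPresent() ? "page index " + found.getAsInt() : "not found");
        System.out.println(missing.isPresent() ? "page index " + missing.getAsInt() : "not found");
    }
}
```

With this shape, the caller has to decide what to do when the field has no widget (skip it, log it, or fail with a clear message) before a page is opened for writing.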
Below is my previous answer. That approach appears to be wrong, but I am keeping it for reference:
I found the /P element, which seemed like it could be the page:
int page = (int)currentField.getCOSObject().getCOSObject(COSName.P).getObjectNumber();
page = page - 5; // I couldn't figure out why it's off by 4, but tests showed that the actual PDF page 1 (index [0]) is represented by `\P {4, 0}`, page 2 ([1]) is called "5", page 3 ([2]) is "6", etc.
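The offset is probably not stable. /P is an indirect reference to the page object, and PDF object numbers are identifiers chosen by whatever wrote the file, so they need not follow page order. A more robust idea is to resolve the reference to the page object and then find that object's position in the document's page list. The sketch below models this with hypothetical stand-in classes (Page and indexOfPage are not PDFBox types). In PDFBox 2.x, I believe the corresponding calls are PDAnnotationWidget.getPage() and PDPageTree.indexOf(PDPage). Since /P is optional, getPage() may return null.

```java
import java.util.ArrayList;
import java.util.List;

public class PageIndexModel {
    // Hypothetical stand-in for a page object; objectNumber mimics a PDF object number.
    static final class Page {
        final long objectNumber;
        Page(long objectNumber) { this.objectNumber = objectNumber; }
    }

    // Finds the position of the referenced page in document order by identity,
    // or returns -1 if the page is not in the list.
    static int indexOfPage(List<Page> pagesInOrder, Page referenced) {
        for (int i = 0; i < pagesInOrder.size(); i++) {
            if (pagesInOrder.get(i) == referenced) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Object numbers deliberately do not follow page order.
        List<Page> pages = new ArrayList<>();
        pages.add(new Page(12));
        pages.add(new Page(4));
        pages.add(new Page(30));

        Page target = pages.get(1); // the page a widget's /P would point to
        System.out.println("object number " + target.objectNumber
                + ", page index " + indexOfPage(pages, target));
    }
}
```

In this model, subtracting a constant from the object number gives the right index only by coincidence, while the identity lookup does not depend on how the objects were numbered.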