PdfPopupAnnotation parent not accessible after copying page to other document
I am copying annotated PDF pages from one document to another. The odd thing I ran into is that in the new document I cannot access the parents of the PdfPopupAnnotations:
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfReader;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.kernel.pdf.annot.PdfAnnotation;
import com.itextpdf.kernel.pdf.annot.PdfPopupAnnotation;
import java.io.IOException;

public class CopyPdfTest {
    public static void main(String[] args) throws IOException {
        PdfDocument inputDoc = new PdfDocument(new PdfReader("src/test/resources/input.pdf"));
        PdfDocument outputDoc = new PdfDocument(new PdfWriter("/tmp/output.pdf"));
        // Copy pages
        for (int i = 1; i <= inputDoc.getNumberOfPages(); i++) {
            inputDoc.copyPagesTo(i, i, outputDoc);
        }
        // Re-open outputDoc to eliminate the possibility the problem stems from
        // it being opened in writing mode
        outputDoc.close();
        outputDoc = new PdfDocument(new PdfReader("/tmp/output.pdf"));
        // Step through the PdfPopupAnnotations in both documents and check for their parents
        for (PdfDocument doc : new PdfDocument[] { inputDoc, outputDoc }) {
            // Use each document's own page count rather than always inputDoc's
            for (int i = 1; i <= doc.getNumberOfPages(); i++) {
                for (PdfAnnotation annot : doc.getPage(i).getAnnotations()) {
                    if (annot instanceof PdfPopupAnnotation) {
                        // This prints null for popups from the outputDoc
                        System.out.println(((PdfPopupAnnotation) annot).getParentObject());
                    }
                }
            }
        }
    }
}
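When processing a PDF with a single /Square annotation, this produces the following output (the first line is the popup annotation's parent in the original PDF, the second line is the null printed for the output PDF):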
<</AP <</N 10 0 R >> /C [0.898026 0.133331 0.215683 ] /Contents test /CreationDate D:20180107105025+01'00' /F 4 /M D:20180107105029+01'00' /NM 8a233cc7-ed2f-48bf-91f2-a46cecf15160 /P 9 0 R /Popup 16 0 R /RC <?xml version="1.0"?><body xmlns="http://www.w3.org/1999/xhtml" xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/" xfa:APIVersion="Acrobat:18.9.0" xfa:spec="2.0.2" ><p dir="ltr"><span dir="ltr" style="font-size:10.5pt;text-align:left;color:#000000;font-weight:normal;font-style:normal">test</span></p></body> /RD [0.5 0.5 0.5 0.5 ] /Rect [84.7495 636.205 191.876 764.21 ] /Subj Rectangle /Subtype /Square /T tom /Type /Annot >>
null
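I find this especially odd because, looking at the uncompressed sample PDFs, the parent reference 4 0 R is preserved and the referenced /Square annotation is still present as 4 0 obj.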
input.pdf
%PDF-1.4
%âãÏÓ
5 0 obj
<<
/M (D:20180107100338+01'00')
/NM (68bd5c7e-3071-4b10-83ad-6bb2e75a8f3d)
/Subtype /Popup
/Type /Annot
/Parent 4 0 R
/Open false
/F 28
/Rect [352.966 707.883 532.966 827.883]
/P 3 0 R
>>
endobj
6 0 obj
<<
/FormType 1
/Subtype /Form
/Type /XObject
/BBox [115.975 693.768 179.508 827.883]
/Length 69
/Matrix [1 0 0 1 -115.975 -693.768]
>>
stream
1.000 0.000 0.000 RG
2 w
0 J
0 j
116.975 694.768 61.534 132.115 re
S
endstream
endobj
4 0 obj
<<
/Subtype /Square
/RD [0 0 0 0]
/RC (<?xml version="1.0"?><body xmlns="http://www.w3.org/1999/xhtml" xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/" xfa:APIVersion="Acrobat:11.0.0" xfa:spec="2.0.2"><p dir="ltr"><span style="text-align:left;font-size:13pt;font-style:normal;font-weight:normal;color:#000000;font-family:Arial">test</span></p></body>)
/T (thw)
/Contents (test)
/Rect [115.975 693.768 179.508 827.883]
/CA 1
/P 3 0 R
/M (D:20180107100342+01'00')
/Type /Annot
/NM (fd33d765-e844-4226-aff8-3ef81361e787)
/F 4
/BS
<<
/W 2
/S /S
>>
/AP
<<
/N 6 0 R
>>
/C [1 0 0]
/Popup 5 0 R
/Subj (Rectangle)
/CreationDate (D:20180107100338+01'00')
>>
endobj
8 0 obj
<<
/OPM 1
/Type /ExtGState
>>
endobj
7 0 obj
<<
/R7 8 0 R
>>
endobj
9 0 obj
<<
/Length 30
>>
stream
q 0.1 0 0 0.1 0 0 cm
/R7 gs
Q
endstream
endobj
3 0 obj
<<
/pdftk_PageNum 1
/Annots [4 0 R 5 0 R]
/Resources
<<
/ProcSet [/PDF]
/ExtGState 7 0 R
>>
/Type /Page
/Parent 1 0 R
/Contents 9 0 R
/MediaBox [0 0 595 842]
>>
endobj
1 0 obj
<<
/Kids [3 0 R]
/Type /Pages
/Count 1
>>
endobj
11 0 obj
<<
/Type /Catalog
/Pages 1 0 R
>>
endobj
12 0 obj
<<
/ModDate (D:20180107101601+01'00')
/CreationDate (D:20180107101601+01'00')
/Creator (pdftk 2.02 - www.pdftk.com)
/Producer (itext-paulo-155 \(itextpdf.sf.net-lowagie.com\))
>>
endobj xref
0 13
0000000000 65535 f
0000001472 00000 n
0000000000 65535 f
0000001293 00000 n
0000000460 00000 n
0000000015 00000 n
0000000220 00000 n
0000001177 00000 n
0000001130 00000 n
0000001210 00000 n
0000000000 65535 f
0000001531 00000 n
0000001583 00000 n
trailer
<<
/Info 12 0 R
/ID [<23bde7d1ea6b4f52b55dc534b36f8d41><e031fe688c87cb2303e0a99487c3025e>]
/Root 11 0 R
/Size 13
>>
startxref
1779
%%EOF
output.pdf
%PDF-1.7
%âãÏÓ
5 0 obj
<<
/M (D:20180107100338+01'00')
/NM (68bd5c7e-3071-4b10-83ad-6bb2e75a8f3d)
/Subtype /Popup
/Type /Annot
/Parent 4 0 R
/Open false
/F 28
/Rect [352.966 707.883 532.966 827.883]
/P 3 0 R
>>
endobj
6 0 obj
<<
/FormType 1
/Subtype /Form
/Type /XObject
/BBox [115.975 693.768 179.508 827.883]
/Length 69
/Matrix [1 0 0 1 -115.975 -693.768]
>>
stream
1.000 0.000 0.000 RG
2 w
0 J
0 j
116.975 694.768 61.534 132.115 re
S
endstream
endobj
4 0 obj
<<
/Subtype /Square
/RD [0 0 0 0]
/RC (<?xml version="1.0"?><body xmlns="http://www.w3.org/1999/xhtml" xmlns:xfa="http://www.xfa.org/schema/xfa-data/1.0/" xfa:APIVersion="Acrobat:11.0.0" xfa:spec="2.0.2"><p dir="ltr"><span style="text-align:left;font-size:13pt;font-style:normal;font-weight:normal;color:#000000;font-family:Arial">test</span></p></body>)
/T (thw)
/Contents (test)
/Rect [115.975 693.768 179.508 827.883]
/CA 1
/P 3 0 R
/M (D:20180107100342+01'00')
/Type /Annot
/NM (fd33d765-e844-4226-aff8-3ef81361e787)
/F 4
/BS
<<
/W 2
/S /S
>>
/AP
<<
/N 6 0 R
>>
/C [1 0 0]
/Popup 5 0 R
/Subj (Rectangle)
/CreationDate (D:20180107100338+01'00')
>>
endobj
7 0 obj
<<
/M (D:20180107100338+01'00')
/NM (68bd5c7e-3071-4b10-83ad-6bb2e75a8f3d)
/Subtype /Popup
/Open false
/Type /Annot
/F 28
/Rect [352.966 707.883 532.966 827.883]
/P 3 0 R
>>
endobj
9 0 obj
<<
/OPM 1
/Type /ExtGState
>>
endobj
8 0 obj
<<
/R7 9 0 R
>>
endobj
10 0 obj
<<
/Length 30
>>
stream
q 0.1 0 0 0.1 0 0 cm
/R7 gs
Q
endstream
endobj
3 0 obj
<<
/pdftk_PageNum 1
/Annots [4 0 R 7 0 R]
/Resources
<<
/ProcSet [/PDF]
/ExtGState 8 0 R
>>
/Contents 10 0 R
/Parent 1 0 R
/Type /Page
/MediaBox [0 0 595 842]
>>
endobj
1 0 obj
<<
/Kids [3 0 R]
/Type /Pages
/Count 1
>>
endobj
12 0 obj
<<
/Type /Catalog
/Pages 1 0 R
>>
endobj
13 0 obj
<<
/ModDate (D:20180107101427+01'00')
/CreationDate (D:20180107101427+01'00')
/Creator (pdftk 2.02 - www.pdftk.com)
/Producer (itext-paulo-155 \(itextpdf.sf.net-lowagie.com\))
>>
endobj xref
0 14
0000000000 65535 f
0000001665 00000 n
0000000000 65535 f
0000001485 00000 n
0000000460 00000 n
0000000015 00000 n
0000000220 00000 n
0000001130 00000 n
0000001368 00000 n
0000001321 00000 n
0000001401 00000 n
0000000000 65535 f
0000001724 00000 n
0000001776 00000 n
trailer
<<
/Info 13 0 R
/ID [<09baf689039bb6015d4c428111e4ee72><684b5613b1931e88255384276dcaceb1>]
/Root 12 0 R
/Size 14
>>
startxref
1972
%%EOF
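Any hints as to why this happens and how to make the parent accessible to iText?
Unfortunately, the OP did not provide the PDF files in binary form, so I cannot simply verify the following; looking at the data, however, the difference is obvious...
The popup object in your input.pdf has a Parent entry: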
5 0 obj
<<
/M (D:20180107100338+01'00')
/NM (68bd5c7e-3071-4b10-83ad-6bb2e75a8f3d)
/Subtype /Popup
/Type /Annot
/Parent 4 0 R
/Open false
/F 28
/Rect [352.966 707.883 532.966 827.883]
/P 3 0 R
>>
endobj
In your output.pdf, on the other hand, the popup object has none:
7 0 obj
<<
/M (D:20180107100338+01'00')
/NM (68bd5c7e-3071-4b10-83ad-6bb2e75a8f3d)
/Subtype /Popup
/Open false
/Type /Annot
/F 28
/Rect [352.966 707.883 532.966 827.883]
/P 3 0 R
>>
This also matches the iText 7 code of the getParent method:
public PdfDictionary getParentObject() {
    return getPdfObject().getAsDictionary(PdfName.Parent);
}

public PdfAnnotation getParent() {
    if (parent == null) {
        parent = makeAnnotation(getParentObject());
    }
    return parent;
}
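So, to make the parent accessible to iText, make sure the popup annotation has a Parent entry!
Yes, I know, the Parent entry is optional. But getParent does not claim to determine the actual parent object; it merely returns the object referenced by the Parent entry.
Another issue in your output.pdf:
- The page object clearly states that its annotations are in objects 4 and 7;
- but the annotation in 4 references the annotation in 5 as its popup, and that one is not even associated with the page;
- the popup in 5 (the one not associated with the page) references 4 as its parent; the popup in 7 (the one associated with the page) has no Parent entry.
When analyzing the file, you probably did not look at the page's annotations but only at the popup/parent relationships between the annotation objects, and therefore assumed that your popup has a Parent entry...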
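To make the mismatch described above easier to see, here is a minimal sketch that replays the relevant object graph of output.pdf in plain Java. It does not use iText at all: the class PopupLinkCheck, the Annot holder, and the check method are hypothetical names made up for this illustration, and the object numbers are transcribed by hand from the dump in the question.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class PopupLinkCheck {
    // Hand-made stand-in for an annotation dictionary (not an iText class).
    // popup and parent hold object numbers, or null when the entry is absent.
    public static final class Annot {
        final String subtype;
        final Integer popup;
        final Integer parent;

        public Annot(String subtype, Integer popup, Integer parent) {
            this.subtype = subtype;
            this.popup = popup;
            this.parent = parent;
        }
    }

    // Walks the page's /Annots and reports popups without /Parent as well as
    // /Popup references that point to an object outside the page's /Annots.
    public static List<String> check(List<Integer> pageAnnots, Map<Integer, Annot> objects) {
        List<String> problems = new ArrayList<>();
        for (int num : pageAnnots) {
            Annot a = objects.get(num);
            if (a == null) {
                continue; // object not modelled in this sketch
            }
            if ("Popup".equals(a.subtype)) {
                if (a.parent == null) {
                    problems.add("popup " + num + " has no /Parent entry");
                }
            } else if (a.popup != null && !pageAnnots.contains(a.popup)) {
                problems.add("annotation " + num + " points to popup " + a.popup
                        + ", which is not in /Annots");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Object graph of output.pdf: page 3 0 obj has /Annots [4 0 R 7 0 R].
        Map<Integer, Annot> objects = Map.of(
                4, new Annot("Square", 5, null),
                5, new Annot("Popup", null, 4),
                7, new Annot("Popup", null, null));
        check(List.of(4, 7), objects).forEach(System.out::println);
    }
}
```

Run against the output.pdf graph, the sketch reports both problems from the list: annotation 4 points to a popup outside the page's /Annots, and popup 7 has no /Parent entry. Fed the input.pdf graph (/Annots [4 0 R 5 0 R], with 5 carrying /Parent 4 0 R), it reports nothing.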