How to dense merge PDF files using PDFBox 2 without whitespace near page breaks?
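We have been using the iText based PdfVeryDenseMergeTool we found in this SO question to merge multiple PDF files into a single PDF file. The tool merges PDFs without leaving any whitespace in between, and where possible individual PDFs are also split across pages.
We want to port PdfVeryDenseMergeTool to PDFBox. We found a PDFBox 2 based PdfDenseMergeTool that merges PDFs like this:
Individual PDFs:
Dense merged PDF:
We are looking for something like this (this is what the iText based PdfVeryDenseMergeTool already produces, but we want to do it with PDFBox 2):
While attempting the port, we found that PdfVeryDenseMergeTool uses a PageVerticalAnalyzer, which extends an iText PDF render listener and acts each time something is drawn in the PDF. It then uses all of that rendering information to split an individual PDF across multiple pages. We looked for a similar PDF render listener in PDFBox 2, but the available PDFRenderer class only has methods for rendering to images. So we are not sure how to port PageVerticalAnalyzer to PDFBox.
If anyone can suggest a way forward, we would greatly appreciate the help.
Thanks a lot!
Edit, 7 February 2020
Currently we are extending PDFGraphicsStreamEngine from PDFBox to build a custom rendering engine that keeps track of images, text lines, and arcs as they are drawn. This custom engine will be a port of PageVerticalAnalyzer. After that, we hope to be able to port PdfVeryDenseMergeTool to PDFBox.
Edit, 8 February 2020
Here is a very simple port of PageVerticalAnalyzer that handles images and text. I am new to PDFBox, so my logic for handling images may be odd. This is the basic approach:
Text: For each glyph printed, get the bottomY and set topY = bottomY + charHeight, then mark those top/bottom points.
Image: Each time drawImage() is called, there appear to be two ways of finding out where it is drawn. The first is to use the coordinates from the last call to appendRectangle(); the second is to use the last calls to moveTo(), multiple lineTo(), and closePath(). I give priority to the latter. If I cannot find any path (I found one in one PDF; in another, all I found before drawImage() was appendRectangle()), I use the former. If neither exists, I do not know what to do. This is how I assume PDFBox marks image coordinates using moveTo()/lineTo()/closePath():
Here is my current implementation: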
import java.awt.geom.Point2D;
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.pdfbox.contentstream.PDFGraphicsStreamEngine;
import org.apache.pdfbox.cos.COSArray;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.graphics.image.PDImage;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotation;
import org.apache.pdfbox.util.Matrix;
import org.apache.pdfbox.util.Vector;

public class PageVerticalAnalyzer extends PDFGraphicsStreamEngine
{
    /**
     * This is a port of iText based PageVerticalAnalyzer found here
     * https://github.com/mkl-public/testarea-itext5/blob/master/src/main/java/mkl/testarea/itext5/merge/PageVerticalAnalyzer.java
     *
     * @param page PDF Page
     */
    protected PageVerticalAnalyzer(PDPage page)
    {
        super(page);
    }

    public static void main(String[] args) throws IOException
    {
        File file = new File("q2.pdf");
        try (PDDocument doc = PDDocument.load(file))
        {
            PDPage page = doc.getPage(0);
            PageVerticalAnalyzer engine = new PageVerticalAnalyzer(page);
            engine.run();
            System.out.println(engine.verticalFlips);
        }
    }

    /**
     * Runs the engine on the current page.
     *
     * @throws IOException If there is an IO error while drawing the page.
     */
    public void run() throws IOException
    {
        processPage(getPage());
        for (PDAnnotation annotation : getPage().getAnnotations())
        {
            showAnnotation(annotation);
        }
    }

    // All path related stuff
    @Override
    public void clip(int windingRule) throws IOException
    {
        System.out.println("clip");
    }

    @Override
    public void moveTo(float x, float y) throws IOException
    {
        System.out.printf("moveTo %.2f %.2f%n", x, y);
        // start with a degenerate band at y; closePath() later sets the other edge
        // (unboxing a null Float here would throw a NullPointerException)
        lastPathBottomTop = new float[] {y, y};
    }

    @Override
    public void lineTo(float x, float y) throws IOException
    {
        System.out.printf("lineTo %.2f %.2f%n", x, y);
        lastLineTo = new float[] {x, y};
    }

    @Override
    public void curveTo(float x1, float y1, float x2, float y2, float x3, float y3) throws IOException
    {
        System.out.printf("curveTo %.2f %.2f, %.2f %.2f, %.2f %.2f%n", x1, y1, x2, y2, x3, y3);
    }

    @Override
    public Point2D getCurrentPoint() throws IOException
    {
        // if you want to build paths, you'll need to keep track of this like PageDrawer does
        return new Point2D.Float(0, 0);
    }

    @Override
    public void closePath() throws IOException
    {
        System.out.println("closePath");
        lastPathBottomTop[0] = lastLineTo[1];
        lastLineTo = null;
    }

    @Override
    public void endPath() throws IOException
    {
        System.out.println("endPath");
    }

    @Override
    public void strokePath() throws IOException
    {
        System.out.println("strokePath");
    }

    @Override
    public void fillPath(int windingRule) throws IOException
    {
        System.out.println("fillPath");
    }

    @Override
    public void fillAndStrokePath(int windingRule) throws IOException
    {
        System.out.println("fillAndStrokePath");
    }

    @Override
    public void shadingFill(COSName shadingName) throws IOException
    {
        System.out.println("shadingFill " + shadingName.toString());
    }

    // Rectangle related stuff
    @Override
    public void appendRectangle(Point2D p0, Point2D p1, Point2D p2, Point2D p3) throws IOException
    {
        System.out.printf("appendRectangle %.2f %.2f, %.2f %.2f, %.2f %.2f, %.2f %.2f%n",
                p0.getX(), p0.getY(), p1.getX(), p1.getY(),
                p2.getX(), p2.getY(), p3.getX(), p3.getY());
        lastRectBottomTop = new float[] {(float) p0.getY(), (float) p3.getY()};
    }

    // Image drawing
    @Override
    public void drawImage(PDImage pdImage) throws IOException
    {
        System.out.println("drawImage");
        if (lastPathBottomTop != null) {
            addVerticalUseSection(lastPathBottomTop[0], lastPathBottomTop[1]);
        } else if (lastRectBottomTop != null) {
            addVerticalUseSection(lastRectBottomTop[0], lastRectBottomTop[1]);
        } else {
            throw new Error("Drawing image without last reference!");
        }
        lastPathBottomTop = null;
        lastRectBottomTop = null;
    }

    // All text related stuff
    @Override
    public void showTextString(byte[] string) throws IOException
    {
        System.out.print("showTextString \"");
        super.showTextString(string);
        System.out.println("\"");
    }

    @Override
    public void showTextStrings(COSArray array) throws IOException
    {
        System.out.print("showTextStrings \"");
        super.showTextStrings(array);
        System.out.println("\"");
    }

    @Override
    protected void showGlyph(Matrix textRenderingMatrix, PDFont font, int code, String unicode,
            Vector displacement) throws IOException
    {
        // print the actual character that is being rendered
        System.out.print(unicode);
        super.showGlyph(textRenderingMatrix, font, code, unicode, displacement);
        // rendering matrix seems to contain bounding box of dimensions the char
        // and an x/y point where bounding box starts
        //System.out.println(textRenderingMatrix.toString());
        // y of the bottom of the char
        // not sure why the y value is in the 8th column
        // when I print the matrix, it shows up in the 6th column
        float yBottom = textRenderingMatrix.getValue(0, 7);
        // height of the char
        // using the value in the first column as the char height
        float yTop = yBottom + textRenderingMatrix.getValue(0, 0);
        addVerticalUseSection(yBottom, yTop);
    }

    // Keeping track of bottom/top point pairs
    void addVerticalUseSection(float from, float to)
    {
        if (to < from)
        {
            float temp = to;
            to = from;
            from = temp;
        }
        int i=0, j=0;
        for (; i<verticalFlips.size(); i++)
        {
            float flip = verticalFlips.get(i);
            if (flip < from)
                continue;
            for (j=i; j<verticalFlips.size(); j++)
            {
                flip = verticalFlips.get(j);
                if (flip < to)
                    continue;
                break;
            }
            break;
        }
        boolean fromOutsideInterval = i%2==0;
        boolean toOutsideInterval = j%2==0;
        while (j-- > i)
            verticalFlips.remove(j);
        if (toOutsideInterval)
            verticalFlips.add(i, to);
        if (fromOutsideInterval)
            verticalFlips.add(i, from);
    }

    final List<Float> verticalFlips = new ArrayList<Float>();
    private float[] lastRectBottomTop;
    private float[] lastPathBottomTop;
    private float[] lastLineTo;
}
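I am looking for answers to the following questions:
- How can this implementation be improved?
- How should curves and other things I have not handled be dealt with?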
This answer has the same problems as the original iText version.
A port of the PageVerticalAnalyzer
The PageVerticalAnalyzer can be ported from iText to PDFBox as follows:
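// imports as needed with PDFBox 2.0.x
import java.awt.Shape;
import java.awt.geom.AffineTransform;
import java.awt.geom.GeneralPath;
import java.awt.geom.Point2D;
import java.awt.geom.Rectangle2D;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.fontbox.util.BoundingBox;
import org.apache.pdfbox.contentstream.PDFGraphicsStreamEngine;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.font.PDCIDFontType2;
import org.apache.pdfbox.pdmodel.font.PDFont;
import org.apache.pdfbox.pdmodel.font.PDSimpleFont;
import org.apache.pdfbox.pdmodel.font.PDTrueTypeFont;
import org.apache.pdfbox.pdmodel.font.PDType0Font;
import org.apache.pdfbox.pdmodel.font.PDType3CharProc;
import org.apache.pdfbox.pdmodel.font.PDType3Font;
import org.apache.pdfbox.pdmodel.font.PDVectorFont;
import org.apache.pdfbox.pdmodel.graphics.image.PDImage;
import org.apache.pdfbox.util.Matrix;
import org.apache.pdfbox.util.Vector;

public class PageVerticalAnalyzer extends PDFGraphicsStreamEngine {
    protected PageVerticalAnalyzer(PDPage page) {
        super(page);
    }

    public List<Float> getVerticalFlips() {
        return verticalFlips;
    }

    //
    // Text
    //
    @Override
    protected void showGlyph(Matrix textRenderingMatrix, PDFont font, int code, String unicode, Vector displacement)
            throws IOException {
        super.showGlyph(textRenderingMatrix, font, code, unicode, displacement);
        // mark the vertical range covered by the glyph outline as used
        Shape shape = calculateGlyphBounds(textRenderingMatrix, font, code);
        if (shape != null) {
            Rectangle2D rect = shape.getBounds2D();
            addVerticalUseSection(rect.getMinY(), rect.getMaxY());
        }
    }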

    /**
     * Copy of <code>org.apache.pdfbox.examples.util.DrawPrintTextLocations.calculateGlyphBounds(Matrix, PDFont, int)</code>.
     */
    private Shape calculateGlyphBounds(Matrix textRenderingMatrix, PDFont font, int code) throws IOException
    {
        GeneralPath path = null;
        AffineTransform at = textRenderingMatrix.createAffineTransform();
        at.concatenate(font.getFontMatrix().createAffineTransform());
        if (font instanceof PDType3Font)
        {
            // It is difficult to calculate the real individual glyph bounds for type 3 fonts
            // because these are not vector fonts, the content stream could contain almost anything
            // that is found in page content streams.
            PDType3Font t3Font = (PDType3Font) font;
            PDType3CharProc charProc = t3Font.getCharProc(code);
            if (charProc != null)
            {
                BoundingBox fontBBox = t3Font.getBoundingBox();
                PDRectangle glyphBBox = charProc.getGlyphBBox();
                if (glyphBBox != null)
                {
                    // PDFBOX-3850: glyph bbox could be larger than the font bbox
                    glyphBBox.setLowerLeftX(Math.max(fontBBox.getLowerLeftX(), glyphBBox.getLowerLeftX()));
                    glyphBBox.setLowerLeftY(Math.max(fontBBox.getLowerLeftY(), glyphBBox.getLowerLeftY()));
                    glyphBBox.setUpperRightX(Math.min(fontBBox.getUpperRightX(), glyphBBox.getUpperRightX()));
                    glyphBBox.setUpperRightY(Math.min(fontBBox.getUpperRightY(), glyphBBox.getUpperRightY()));
                    path = glyphBBox.toGeneralPath();
                }
            }
        }
        else if (font instanceof PDVectorFont)
        {
            PDVectorFont vectorFont = (PDVectorFont) font;
            path = vectorFont.getPath(code);
            if (font instanceof PDTrueTypeFont)
            {
                PDTrueTypeFont ttFont = (PDTrueTypeFont) font;
                int unitsPerEm = ttFont.getTrueTypeFont().getHeader().getUnitsPerEm();
                at.scale(1000d / unitsPerEm, 1000d / unitsPerEm);
            }
            if (font instanceof PDType0Font)
            {
                PDType0Font t0font = (PDType0Font) font;
                if (t0font.getDescendantFont() instanceof PDCIDFontType2)
                {
                    int unitsPerEm = ((PDCIDFontType2) t0font.getDescendantFont()).getTrueTypeFont().getHeader().getUnitsPerEm();
                    at.scale(1000d / unitsPerEm, 1000d / unitsPerEm);
                }
            }
        }
        else if (font instanceof PDSimpleFont)
        {
            PDSimpleFont simpleFont = (PDSimpleFont) font;
            // these two lines do not always work, e.g. for the TT fonts in file 032431.pdf
            // which is why PDVectorFont is tried first.
            String name = simpleFont.getEncoding().getName(code);
            path = simpleFont.getPath(name);
        }
        else
        {
            // shouldn't happen, please open issue in JIRA
            System.out.println("Unknown font class: " + font.getClass());
        }
        if (path == null)
        {
            return null;
        }
        return at.createTransformedShape(path.getBounds2D());
    }

    //
    // Bitmaps
    //
    @Override
    public void drawImage(PDImage pdImage) throws IOException {
        // an image fills the unit square of the current coordinate system,
        // so transform its four corners and take their vertical range
        Matrix ctm = getGraphicsState().getCurrentTransformationMatrix();
        Section section = null;
        for (int x = 0; x < 2; x++) {
            for (int y = 0; y < 2; y++) {
                Point2D.Float point = ctm.transformPoint(x, y);
                if (section == null)
                    section = new Section(point.y);
                else
                    section.extendTo(point.y);
            }
        }
        addVerticalUseSection(section.from, section.to);
    }

    //
    // Paths
    //
    @Override
    public void appendRectangle(Point2D p0, Point2D p1, Point2D p2, Point2D p3) throws IOException {
        subPath = null;
        Section section = new Section(p0.getY());
        section.extendTo(p1.getY()).extendTo(p2.getY()).extendTo(p3.getY());
        // register the rectangle as part of the current path; without this line
        // its vertical range is never reported when the path is filled or stroked
        path.add(section);
        currentPoint = p0;
    }

    @Override
    public void clip(int windingRule) throws IOException {
    }

    @Override
    public void moveTo(float x, float y) throws IOException {
        subPath = new Section(y);
        path.add(subPath);
        currentPoint = new Point2D.Float(x, y);
    }

    @Override
    public void lineTo(float x, float y) throws IOException {
        if (subPath == null) {
            subPath = new Section(y);
            path.add(subPath);
        } else
            subPath.extendTo(y);
        currentPoint = new Point2D.Float(x, y);
    }

    /**
     * Beware! This is incorrect! The control points may be outside
     * the vertically used range
     */
    @Override
    public void curveTo(float x1, float y1, float x2, float y2, float x3, float y3) throws IOException {
        if (subPath == null) {
            subPath = new Section(y1);
            path.add(subPath);
        } else
            subPath.extendTo(y1);
        subPath.extendTo(y2).extendTo(y3);
        currentPoint = new Point2D.Float(x3, y3);
    }

    @Override
    public Point2D getCurrentPoint() throws IOException {
        return currentPoint;
    }

    @Override
    public void closePath() throws IOException {
    }

    @Override
    public void endPath() throws IOException {
        path.clear();
        subPath = null;
    }

    @Override
    public void strokePath() throws IOException {
        for (Section section : path) {
            addVerticalUseSection(section.from, section.to);
        }
        path.clear();
        subPath = null;
    }

    @Override
    public void fillPath(int windingRule) throws IOException {
        for (Section section : path) {
            addVerticalUseSection(section.from, section.to);
        }
        path.clear();
        subPath = null;
    }

    @Override
    public void fillAndStrokePath(int windingRule) throws IOException {
        for (Section section : path) {
            addVerticalUseSection(section.from, section.to);
        }
        path.clear();
        subPath = null;
    }

    @Override
    public void shadingFill(COSName shadingName) throws IOException {
        // TODO Auto-generated method stub
    }

    Point2D currentPoint = null;
    List<Section> path = new ArrayList<Section>();
    Section subPath = null;

    static class Section {
        Section(double value) {
            this((float)value);
        }

        Section(float value) {
            from = value;
            to = value;
        }

        Section extendTo(double value) {
            return extendTo((float)value);
        }

        Section extendTo(float value) {
            if (value < from)
                from = value;
            else if (value > to)
                to = value;
            return this;
        }

        private float from;
        private float to;
    }

    void addVerticalUseSection(double from, double to) {
        addVerticalUseSection((float)from, (float)to);
    }

    void addVerticalUseSection(float from, float to) {
        if (to < from) {
            float temp = to;
            to = from;
            from = temp;
        }
        int i=0, j=0;
        for (; i<verticalFlips.size(); i++) {
            float flip = verticalFlips.get(i);
            if (flip < from)
                continue;
            for (j=i; j<verticalFlips.size(); j++) {
                flip = verticalFlips.get(j);
                if (flip < to)
                    continue;
                break;
            }
            break;
        }
        boolean fromOutsideInterval = i%2==0;
        boolean toOutsideInterval = j%2==0;
        while (j-- > i)
            verticalFlips.remove(j);
        if (toOutsideInterval)
            verticalFlips.add(i, to);
        if (fromOutsideInterval)
            verticalFlips.add(i, from);
    }

    final List<Float> verticalFlips = new ArrayList<Float>();
}
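To make the bookkeeping in addVerticalUseSection above easier to follow, here is a minimal standalone sketch of the same list logic without any PDFBox dependency. The class name VerticalFlipsDemo and the methods addSection and demo are made up for this illustration. The list is kept sorted, with even indices holding the lower edge of a used band and odd indices its upper edge.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class; mirrors the logic of addVerticalUseSection on a plain list.
final class VerticalFlipsDemo {

    /** Merges the band between 'from' and 'to' into the sorted flip list. */
    static void addSection(List<Float> flips, float from, float to) {
        if (to < from) {
            float temp = to;
            to = from;
            from = temp;
        }
        // i ends at the first flip that is not below 'from' (or at the list size);
        // j ends at the first flip that is not below 'to', or stays 0 if i ran off the end
        int i = 0, j = 0;
        for (; i < flips.size(); i++) {
            if (flips.get(i) < from)
                continue;
            for (j = i; j < flips.size(); j++) {
                if (flips.get(j) >= to)
                    break;
            }
            break;
        }
        // an even index means the value lies in a gap between bands
        boolean fromOutsideInterval = i % 2 == 0;
        boolean toOutsideInterval = j % 2 == 0;
        // drop every flip swallowed by the new band
        while (j-- > i)
            flips.remove(j);
        if (toOutsideInterval)
            flips.add(i, to);
        if (fromOutsideInterval)
            flips.add(i, from);
    }

    /** Adds two separate bands, one band bridging them, and one band below both. */
    static List<Float> demo() {
        List<Float> flips = new ArrayList<>();
        addSection(flips, 100, 200);
        addSection(flips, 300, 400);
        addSection(flips, 150, 350); // overlaps both bands and fuses them
        addSection(flips, 50, 60);   // lies entirely below the fused band
        return flips;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [50.0, 60.0, 100.0, 400.0]
    }
}
```

Each consecutive pair in the result is one vertical band that contains content; the gaps between pairs are the whitespace the merge tool may cut at.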
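The PageVerticalAnalyzer implementation above is actually similar to that of BoundingBoxFinder. As there, I borrowed from the PDFBox example DrawPrintTextLocations to determine the text outlines.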
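Also, as in the original iText 5 PageVerticalAnalyzer, there is an issue in the curveTo handling: the control points are treated as if they were on the actual curve, whereas in reality they usually are not and can lie far outside the range the curve uses vertically. One could use the corresponding AWT classes instead of the path handling implemented here, but that is probably not possible on, for example, Android.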
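As a possible way around the curveTo caveat without AWT path classes, the exact vertical range of a cubic Bézier segment can be computed by finding where the derivative of its y coordinate vanishes. The following is only a sketch under that assumption and is not part of the ported class; the name CubicBezierExtent and its methods are made up for this illustration. In the analyzer, y0 would be the y of the current point and y1, y2, y3 the y values passed to curveTo.

```java
// Hypothetical helper; plain math, no PDFBox or AWT types involved.
final class CubicBezierExtent {

    /** y coordinate of the cubic Bezier curve at parameter t in [0, 1]. */
    static double valueAt(double y0, double y1, double y2, double y3, double t) {
        double u = 1 - t;
        return u * u * u * y0 + 3 * u * u * t * y1 + 3 * u * t * t * y2 + t * t * t * y3;
    }

    /** Returns {minY, maxY} actually reached by the curve. */
    static double[] verticalExtent(double y0, double y1, double y2, double y3) {
        // the end points are always on the curve
        double min = Math.min(y0, y3);
        double max = Math.max(y0, y3);

        // y'(t) / 3 = a t^2 + b t + c, written in terms of the control point differences
        double d0 = y1 - y0, d1 = y2 - y1, d2 = y3 - y2;
        double a = d0 - 2 * d1 + d2;
        double b = 2 * (d1 - d0);
        double c = d0;

        double[] roots;
        if (Math.abs(a) < 1e-12) {
            // derivative is linear or constant
            roots = Math.abs(b) < 1e-12 ? new double[0] : new double[] { -c / b };
        } else {
            double disc = b * b - 4 * a * c;
            if (disc < 0) {
                roots = new double[0];
            } else {
                double s = Math.sqrt(disc);
                roots = new double[] { (-b + s) / (2 * a), (-b - s) / (2 * a) };
            }
        }

        // only extrema strictly inside the segment can widen the range
        for (double t : roots) {
            if (t > 0 && t < 1) {
                double y = valueAt(y0, y1, y2, y3, t);
                min = Math.min(min, y);
                max = Math.max(max, y);
            }
        }
        return new double[] { min, max };
    }

    public static void main(String[] args) {
        // control points reach y = 100, the curve itself only reaches y = 75
        double[] extent = verticalExtent(0, 100, 100, 0);
        System.out.println(extent[0] + " .. " + extent[1]);
    }
}
```

For the arch in main, extending the section by the control points would mark the range 0 to 100 as used, while the curve only covers 0 to 75.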
As there, this class ignores annotations, but the iText 5 dense merge ignored annotations too. And this class also ignores clip paths...
A port of the PdfVeryDenseMergeTool
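// imports as needed with PDFBox 2.0.x
import java.io.IOException;
import java.io.OutputStream;
import java.util.List;
import org.apache.pdfbox.multipdf.LayerUtility;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDPageContentStream;
import org.apache.pdfbox.pdmodel.common.PDRectangle;
import org.apache.pdfbox.pdmodel.graphics.form.PDFormXObject;
import org.apache.pdfbox.util.Matrix;

public class PdfVeryDenseMergeTool {
    public PdfVeryDenseMergeTool(PDRectangle size, float top, float bottom, float gap)
    {
        this.pageSize = size;
        this.topMargin = top;
        this.bottomMargin = bottom;
        this.gap = gap;
    }

    public void merge(OutputStream outputStream, Iterable<PDDocument> inputs) throws IOException
    {
        try
        {
            openDocument();
            for (PDDocument input: inputs)
            {
                merge(input);
            }
            // the last content stream must be closed before saving
            if (currentContents != null) {
                currentContents.close();
                currentContents = null;
            }
            document.save(outputStream);
        }
        finally
        {
            closeDocument();
        }
    }

    void openDocument() throws IOException
    {
        document = new PDDocument();
        newPage();
    }

    void closeDocument() throws IOException
    {
        try
        {
            if (currentContents != null) {
                currentContents.close();
                currentContents = null;
            }
            document.close();
        }
        finally
        {
            this.document = null;
            this.yPosition = 0;
        }
    }

    void newPage() throws IOException
    {
        if (currentContents != null) {
            currentContents.close();
            currentContents = null;
        }
        currentPage = new PDPage(pageSize);
        document.addPage(currentPage);
        // start filling the new page just below its top margin
        yPosition = pageSize.getUpperRightY() - topMargin;
        currentContents = new PDPageContentStream(document, currentPage);
    }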

    void merge(PDDocument input) throws IOException
    {
        for (PDPage page : input.getPages())
        {
            merge(input, page);
        }
    }

    void merge(PDDocument sourceDoc, PDPage page) throws IOException
    {
        PDRectangle pageSizeToImport = page.getCropBox();
        PageVerticalAnalyzer analyzer = new PageVerticalAnalyzer(page);
        analyzer.processPage(page);
        List<Float> verticalFlips = analyzer.getVerticalFlips();
        if (verticalFlips.size() < 2)
            return;
        LayerUtility layerUtility = new LayerUtility(document);
        PDFormXObject form = layerUtility.importPageAsForm(sourceDoc, page);
        int startFlip = verticalFlips.size() - 1;
        boolean first = true;
        while (startFlip > 0)
        {
            if (!first)
                newPage();
            float freeSpace = yPosition - pageSize.getLowerLeftY() - bottomMargin;
            int endFlip = startFlip + 1;
            while ((endFlip > 1) && (verticalFlips.get(startFlip) - verticalFlips.get(endFlip - 2) < freeSpace))
                endFlip -=2;
            if (endFlip < startFlip)
            {
                float height = verticalFlips.get(startFlip) - verticalFlips.get(endFlip);
                currentContents.saveGraphicsState();
                currentContents.addRect(0, yPosition - height, pageSizeToImport.getWidth(), height);
                currentContents.clip();
                Matrix matrix = Matrix.getTranslateInstance(0, (float)(yPosition - (verticalFlips.get(startFlip) - pageSizeToImport.getLowerLeftY())));
                currentContents.transform(matrix);
                currentContents.drawForm(form);
                currentContents.restoreGraphicsState();
                yPosition -= height + gap;
                startFlip = endFlip - 1;
            }
            else if (!first)
                throw new IllegalArgumentException(String.format("Page %s content sections too large.", page));
            first = false;
        }
    }

    PDDocument document = null;
    PDPage currentPage = null;
    PDPageContentStream currentContents = null;
    float yPosition = 0;
    final PDRectangle pageSize;
    final float topMargin;
    final float bottomMargin;
    final float gap;
}
This is essentially a straightforward port of the iText 5 PdfVeryDenseMergeTool, with nothing special about it.
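To see how the loop in merge(PDDocument, PDPage) above cuts one source page into slices, the same index arithmetic can be tried on plain numbers. This is only a sketch: the class DenseSliceDemo, the method slice, and the parameters firstFree and pageFree (free height on the current target page and on each fresh page) are made up for this illustration, and the gap and the running yPosition are left out because they only matter across source pages.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; reproduces only the flip index arithmetic of the merge loop.
final class DenseSliceDemo {

    /**
     * Returns one {top, bottom} pair per slice. 'flips' is sorted ascending,
     * with even indices holding band bottoms and odd indices band tops.
     */
    static List<float[]> slice(List<Float> flips, float firstFree, float pageFree) {
        List<float[]> slices = new ArrayList<>();
        float freeSpace = firstFree;
        int startFlip = flips.size() - 1; // top edge of the topmost band
        boolean first = true;
        while (startFlip > 0) {
            if (!first)
                freeSpace = pageFree; // stands in for newPage()
            int endFlip = startFlip + 1;
            // take in further bands below as long as the slice still fits
            while (endFlip > 1 && flips.get(startFlip) - flips.get(endFlip - 2) < freeSpace)
                endFlip -= 2;
            if (endFlip < startFlip) {
                slices.add(new float[] { flips.get(startFlip), flips.get(endFlip) });
                startFlip = endFlip - 1;
            } else if (!first) {
                // a single band is taller than an empty target page
                throw new IllegalArgumentException("Content section too large.");
            }
            first = false;
        }
        return slices;
    }

    public static void main(String[] args) {
        // three bands: 100..200, 300..400, 500..700
        List<Float> flips = Arrays.asList(100f, 200f, 300f, 400f, 500f, 700f);
        for (float[] s : slice(flips, 250f, 350f))
            System.out.println(s[0] + " down to " + s[1]);
    }
}
```

With 250 units left on the current page, only the topmost band (700 down to 500) fits there; the two lower bands together span 300 units and go onto the next page as one slice (400 down to 100), including the whitespace between them.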
Usage of the PdfVeryDenseMergeTool
Simply create a PdfVeryDenseMergeTool instance with the format information, then start merging using PDDocument instances as sources:
PDDocument document1 = ...;
...
PDDocument documentN = ...;
PdfVeryDenseMergeTool tool = new PdfVeryDenseMergeTool(PDRectangle.A4, 30, 30, 10);
tool.merge(new FileOutputStream(RESULT_FILE), Arrays.asList(document1, ..., documentN));
(DenseMerging test testVeryDenseMerging)