How to Extract pages from a PDF using IText 7?
I am trying to extract pages from a PDF file with the IText 7 library in order to create a new file.
static void Splitter() {
string file = @"C:\Users\Standard\Downloads\Merged\CK 2002989 ,514.42 02.12.20.pdf";
string range = "1, 4, 8";
var pdfDocumentInvoiceNumber = new PdfDocument(new PdfReader(file));
var split = new PdfSplitter(pdfDocumentInvoiceNumber);
var result = split.ExtractPageRange(new PageRange(range));
var numberOfPagesPdfDocumentInvoiceNumber = result.GetNumberOfPages();
String toFile = @"C:\Users\Standard\Downloads\Result\Extracted.pdf";
var pdfWriter = new PdfWriter(toFile);
var pdfDocumentInvoiceMergeResult = new PdfDocument(pdfWriter);
for (var i = 1; i <= numberOfPagesPdfDocumentInvoiceNumber; i++)
{
var pdfPage = result.GetPage(i).CopyTo(pdfDocumentInvoiceMergeResult);
pdfDocumentInvoiceMergeResult.AddPage(pdfPage);
}
}
But when I try to use CopyTo, I get this error:
iText.Kernel.PdfException: 'Cannot copy indirect object from the document that is being written.'
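The problem here is that the documents returned by the PdfSplitter methods, in particular ExtractPageRange, are iText 7 documents that are being written to, i.e. these PdfDocument instances have been instantiated with a PdfWriter.
Such documents are subject to certain restrictions; in particular, pages cannot be copied from them. For details, see the related answers on this topic.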
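To illustrate the restriction from the other side: copying does work when the source document has been opened with a PdfReader only. The following is a minimal sketch of that alternative, which bypasses PdfSplitter entirely and uses PdfDocument.CopyPagesTo instead; it is not the approach the rest of this answer takes. The class and method names are made up for illustration, and the sketch assumes the source file has at least 8 pages.

```csharp
using System.Collections.Generic;
using iText.Kernel.Pdf;

static class DirectCopyExample // hypothetical name, for illustration only
{
    static void ExtractPages()
    {
        string file = @"C:\Users\Standard\Downloads\Merged\CK 2002989 ,514.42 02.12.20.pdf";
        string toFile = @"C:\Users\Standard\Downloads\Result\Extracted.pdf";

        // The source is opened with a PdfReader only, so its pages may be copied.
        var source = new PdfDocument(new PdfReader(file));
        // The target is opened with a PdfWriter; it receives the copies.
        var target = new PdfDocument(new PdfWriter(toFile));

        // CopyPagesTo copies the listed pages and appends them to the target.
        // Assumption: the source has at least 8 pages.
        source.CopyPagesTo(new List<int> { 1, 4, 8 }, target);

        // Closing the target writes the file; then release the source.
        target.Close();
        source.Close();
    }
}
```

Here the only document being written is the target, and nothing is copied out of it, so the exception from the question should not occur.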
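For these result documents (and, therefore, the whole PdfSplitter class) to be of any value, you need a way to define where these PdfWriter objects write to. There is a way, although not a really intuitive one: you have to override the GetNextPdfWriter method of PdfSplitter, which originally looks like this: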
/// <summary>This method is called when another split document is to be created.</summary>
/// <remarks>
/// This method is called when another split document is to be created.
/// You can override this method and return your own
/// <see cref="iText.Kernel.Pdf.PdfWriter"/>
/// depending on your needs.
/// </remarks>
/// <param name="documentPageRange">the page range of the original document to be included in the document being created now.
/// </param>
/// <returns>the PdfWriter instance for the document which is being created.</returns>
protected internal virtual PdfWriter GetNextPdfWriter(PageRange documentPageRange) {
return new PdfWriter(new ByteArrayOutputStream());
}
In a use case like yours, where you only want a single result document that is eventually written to a file, you can do this:
class MySplitter : PdfSplitter
{
public MySplitter(PdfDocument pdfDocument) : base(pdfDocument)
{
}
protected override PdfWriter GetNextPdfWriter(PageRange documentPageRange)
{
String toFile = @"C:\Users\Standard\Downloads\Result\Extracted.pdf";
return new PdfWriter(toFile);
}
}
With the PdfWriter instantiation moved into that custom splitter, your main code reduces to
string file = @"C:\Users\Standard\Downloads\Merged\CK 2002989 ,514.42 02.12.20.pdf";
string range = "1, 4, 8";
var pdfDocumentInvoiceNumber = new PdfDocument(new PdfReader(file));
var split = new MySplitter(pdfDocumentInvoiceNumber);
var result = split.ExtractPageRange(new PageRange(range));
result.Close();
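In a use case like yours it admittedly looks odd to have to derive a custom class from PdfSplitter just to extract a few pages from a source PDF into a result PDF. Wouldn't an additional PdfWriter parameter on ExtractPageRange make this easier?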
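Be aware, though, that the main objective of the PdfSplitter class is to split a document into many parts using ExtractPageRanges and the other splitting methods. In that situation you would need to provide a larger, possibly not exactly known number of PdfWriters... not that simple at all!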
Of course, a nicer solution might be to inject a lambda expression or some other callback mechanism. For example:
class ImprovedSplitter : PdfSplitter
{
private Func<PageRange, PdfWriter> nextWriter;
public ImprovedSplitter(PdfDocument pdfDocument, Func<PageRange, PdfWriter> nextWriter) : base(pdfDocument)
{
this.nextWriter = nextWriter;
}
protected override PdfWriter GetNextPdfWriter(PageRange documentPageRange)
{
return nextWriter.Invoke(documentPageRange);
}
}
which you can use like this:
string file = @"C:\Users\Standard\Downloads\Merged\CK 2002989 ,514.42 02.12.20.pdf";
string range = "1, 4, 8";
var pdfDocumentInvoiceNumber = new PdfDocument(new PdfReader(file));
var split = new ImprovedSplitter(pdfDocumentInvoiceNumber, pageRange => new PdfWriter(@"C:\Users\Standard\Downloads\Result\Extracted.pdf"));
var result = split.ExtractPageRange(new PageRange(range));
result.Close();
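The callback version also fits the multi-part case mentioned above. The following is a rough sketch, assuming the splitter asks GetNextPdfWriter for one writer per requested range; the counter, the output file naming scheme, the second page range, and the wrapper class name are all made up for illustration.

```csharp
using System.Collections.Generic;
using iText.Kernel.Pdf;
using iText.Kernel.Utils;

static class MultiRangeExample // hypothetical name, for illustration only
{
    // Assumes the ImprovedSplitter class shown above is in scope.
    static void SplitIntoParts()
    {
        string file = @"C:\Users\Standard\Downloads\Merged\CK 2002989 ,514.42 02.12.20.pdf";
        var source = new PdfDocument(new PdfReader(file));

        // Hypothetical counter so that every part gets its own file name.
        int part = 0;
        var split = new ImprovedSplitter(source,
            pageRange => new PdfWriter($@"C:\Users\Standard\Downloads\Result\Extracted-{++part}.pdf"));

        // Two example ranges; the second one is an assumption for illustration.
        var ranges = new List<PageRange> { new PageRange("1, 4, 8"), new PageRange("2-3") };

        // Each returned document is still being written; closing it flushes its file.
        foreach (PdfDocument result in split.ExtractPageRanges(ranges))
        {
            result.Close();
        }
        source.Close();
    }
}
```

Because the lambda receives the PageRange, the file name could just as well be derived from the range itself rather than from a counter.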