PdfSharp C# 库提取的破损图像

Question

我正在尝试执行 PDFSharp 库的 sample program。我已经在项目中提供了参考资料。

问题： 代码执行没有任何错误，我看到方法 ExportJpegImage() 被调用了 7 次（这是 pdf 中的图像数）。 但是当我尝试打开程序写入的图像时（在.jpeg），Windows无法打开它们。

static void Main(string[] args)
    {
        Console.WriteLine("Starting PDF Sharp sample program...");

        const string filename = @"D:/Test/test.pdf";

        PdfDocument document = PdfReader.Open(filename);

        int imageCount = 0;
        // Iterate pages
        foreach (PdfPage page in document.Pages)
        {
            // Get resources dictionary
            PdfDictionary resources = page.Elements.GetDictionary("/Resources");
            if (resources != null)
            {
                // Get external objects dictionary
                PdfDictionary xObjects = resources.Elements.GetDictionary("/XObject");
                if (xObjects != null)
                {
                    ICollection<pdfitem> items = xObjects.Elements.Values;
                    // Iterate references to external objects
                    foreach (PdfItem item in items)
                    {
                        PdfReference reference = item as PdfReference;
                        if (reference != null)
                        {
                            PdfDictionary xObject = reference.Value as PdfDictionary;
                            // Is external object an image?
                            if (xObject != null && xObject.Elements.GetString("/Subtype") == "/Image")
                            {
                                ExportJpegImage(xObject, ref imageCount);
                            }
                        }
                    }
                }
            }
        }

    }

 static void ExportJpegImage(PdfDictionary image, ref int count)
    {
        // Fortunately JPEG has native support in PDF and exporting an image is just writing the stream to a file.
        byte[] stream = image.Stream.Value;
        FileStream fs = new FileStream(String.Format(@"D:\Test\Image{0}.jpeg", count++), FileMode.Create, FileAccess.Write);
        BinaryWriter bw = new BinaryWriter(fs);
        bw.Write(stream);
        bw.Close();
    }

Answer 1

请阅读 example:

之前的注释

Note: This snippet shows how to export JPEG images from a PDF file. PDFsharp cannot convert PDF pages to JPEG files. This sample does not handle non-JPEG images. It does not (yet) handle JPEG images that have been flate-encoded.

There are several different formats for non-JPEG images in PDF. Those are not supported by this simple sample and require several hours of coding, but this is left as an exercise to the reader.

因此，您想要从 PDF 中提取的图像很可能根本没有嵌入为 JPEG 图像（至少没有像压缩这样的进一步过滤）。要将该数据转换为普通图像查看器可以处理的内容，您很可能需要投入 几个小时的编码。

PdfSharp C# 库提取的破损图像

Broken images extracted by PdfSharp C# Library

c#

windows

pdf

pdfsharp

visual-studio