Text missing when converting a PDF to PNG using Magick.NET

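I have an MVC application that uploads PDF files and uses Magick.NET to render each page as a single PNG image. In most cases the conversion works fine, but for some files I get a blank area where a line of text should be, while other lines of text in the same image are rendered correctly. Does anyone know what could cause this?

Below is the code I am using.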

public FileResult PNGPreview(Guid id, Int32 index)
{
    MagickReadSettings settings = new MagickReadSettings();
    // Read only the requested page
    settings.FrameIndex = index;
    settings.FrameCount = 1;
    // Setting the density to 300 dpi will create an image with a better quality
    settings.Density = new PointD(300, 300);
    settings.UseMonochrome = true;
    using (MagickImageCollection images = new MagickImageCollection())
    {
        // Read the requested page of the PDF file into the collection
        images.Read(CreateDocument(id), settings);

        using (MemoryStream stream = new MemoryStream())
        {

            images[0].Write(stream, MagickFormat.Png24);
            stream.Close();
            byte[] result = stream.ToArray();
            return File(result, "image/png");
        }
    }
}

private byte[] CreateDocument(Guid id)
{
    PdfReader reader = new PdfReader(Server.MapPath(String.Format("~/documenttemplates/{0}.pdf", id)));
    byte[] result = null;
    using (MemoryStream ms = new MemoryStream())
    {
        PdfStamper stamper = new PdfStamper(reader, ms, '\0', false);
        stamper.Close();
        reader.Close();
        result = ms.ToArray();
    }

    return result;
}

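The PDF file that causes this problem was sent to me by email. I was told that it was created with Word and then edited with Foxit Pro.

Magick.NET uses Ghostscript to convert PDF files to images. It executes a command similar to the one below.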

"c:\Program Files (x86)\gs\gs9.16\bin\gswin32c.exe" -q -dQUIET -dSAFER -dBATCH -dNOPAUSE
-dNOPROMPT -dMaxBitmap=500000000 -dAlignToPixels=0 -dGridFitTT=2 "-sDEVICE=pnggray"
-dTextAlphaBits=4 -dGraphicsAlphaBits=4 "-r72x72"  "-sOutputFile=Test.%d.png" "-fTest.pdf"

Running this command on your file reports that the file is damaged:

**** Error reading a content stream. The page may be incomplete.
**** File did not complete the page properly and may be damaged.
**** Error reading a content stream. The page may be incomplete.
**** File did not complete the page properly and may be damaged.

**** This file had errors that were repaired or ignored.
**** The file was produced by:
**** >>>> Microsoft® Word 2013 <<<<
**** Please notify the author of the software that produced this
**** file that it does not conform to Adobe's published PDF
**** specification.

The problem can be resolved by creating the input file with a different program.
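To catch files like this before they reach the conversion step, one option is to run Ghostscript on the uploaded PDF yourself and look for the `****` diagnostic lines shown above. The following is only a minimal sketch, not part of Magick.NET: the `PdfCheck` class and its method names are hypothetical, the path to the Ghostscript executable has to be supplied by the caller, and it assumes the diagnostics appear on standard output or standard error. It uses the `nullpage` device so that no image is written.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Text;

// Hypothetical helper, not part of Magick.NET or Ghostscript.
public static class PdfCheck
{
    // Returns the lines of Ghostscript output that start with "****",
    // which is how the diagnostics in the log above are marked.
    public static List<string> FindDiagnostics(string ghostscriptOutput)
    {
        return ghostscriptOutput
            .Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(line => line.Trim())
            .Where(line => line.StartsWith("****", StringComparison.Ordinal))
            .ToList();
    }

    // Runs Ghostscript on the PDF without producing output files and
    // returns any diagnostics it printed. An empty list means none were found.
    public static List<string> Check(string ghostscriptExe, string pdfPath)
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = ghostscriptExe,
            Arguments = "-q -dSAFER -dBATCH -dNOPAUSE -sDEVICE=nullpage \"" + pdfPath + "\"",
            UseShellExecute = false,
            RedirectStandardOutput = true,
            RedirectStandardError = true,
            CreateNoWindow = true
        };

        using (Process process = Process.Start(startInfo))
        {
            // Collect stderr asynchronously so that reading stdout cannot deadlock.
            var errors = new StringBuilder();
            process.ErrorDataReceived += (sender, e) =>
            {
                if (e.Data != null) errors.AppendLine(e.Data);
            };
            process.BeginErrorReadLine();

            string output = process.StandardOutput.ReadToEnd();
            process.WaitForExit();

            return FindDiagnostics(output + Environment.NewLine + errors);
        }
    }
}
```

In the controller, something like `PdfCheck.Check(pathToGswin32c, pdfPath)` could be called before `images.Read(...)`, and the upload rejected or flagged for review when the returned list is not empty.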