Update PDF image in-place

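I'm trying to replace an image stream in an SDF document using PDFNet 7.0.4 on netcoreapp3.1. I'd like to preserve as much of the original object and its metadata as possible: the same dimensions, color space, compression, and so on. Ideally the object number and even the generation number would stay the same. The goal is that a before/after comparison shows only the changed pixels in the stream.

I'm using this method to get the raw pixel data as a Stream object: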

private Stream GetImageData(int objectNum)
{
    var image = new PDF.Image(sdfDoc.GetObj(objectNum));

    var bits = image.GetBitsPerComponent();
    var channels = image.GetComponentNum();
    var bytesPerChannel = bits / 8;
    var height = image.GetImageHeight();
    var width = image.GetImageWidth();

    var data = image.GetImageData();
    var len = height * width * channels * bytesPerChannel;

    using (var reader = new pdftron.Filters.FilterReader(data))
    {
        var buffer = new byte[len];
        reader.Read(buffer);

        return new MemoryStream(buffer);
    }
}

After processing the image data, I want to write it back before saving the underlying SDFDoc object. I tried the following method:

private void SetImageData(int objectNum, Stream stream)
{
    var image = new PDF.Image(sdfDoc.GetObj(objectNum));

    var bits = image.GetBitsPerComponent();
    var channels = image.GetComponentNum();
    var bytesPerChannel = bits / 8;
    var height = image.GetImageHeight();
    var width = image.GetImageWidth();

    var len = height * width * channels * bytesPerChannel;
    if (stream.Length != len) { throw new DataMisalignedException("Stream length does not match expected image dimensions"); }

    using (var ms = new MemoryStream())
    using (var writer = new pdftron.Filters.FilterWriter(image.GetImageData()))
    {
        stream.CopyTo(ms);
        writer.WriteBuffer(ms.ToArray());
    }
}

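This runs without errors, but nothing actually seems to get updated. I also tried SDFObj.SetStreamData(), without success. What is the lowest-impact, most performant way to directly replace the raw pixel data in an image stream?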


Edit

I got halfway there with this method:

private void SetImageData(int objectNum, Stream stream)
{
    var sdfObj = sdfDoc.GetObj(objectNum);
    var image = new PDF.Image(sdfObj);

    var bits = image.GetBitsPerComponent();
    var channels = image.GetComponentNum();
    var bytesPerChannel = bits / 8;
    var height = image.GetImageHeight();
    var width = image.GetImageWidth();

    var len = height * width * channels * bytesPerChannel;
    if (stream.Length != len) { throw new DataMisalignedException("Stream length does not match expected image dimensions"); }

    var buffer = new byte[len];
    stream.Read(buffer, 0, len);
    sdfObj.SetStreamData(buffer);
    sdfObj.Erase("Filters");
}

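This works as expected, with the obvious caveat that it ignores any existing compression and converts the image to a raw, uncompressed stream.

I also tried sdfObj.SetStreamData(buffer, image.GetImageData()); and sdfObj.SetStreamData(buffer, image.GetImageData().GetAttachedFilter()); both of these do update the object in the file, but the resulting image fails to render.

The following code shows how to keep the Image object but change the actual stream data.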

static private Stream GetImageData(Obj o)
{
    var image = new pdftron.PDF.Image(o);

    var bits = image.GetBitsPerComponent();
    var channels = image.GetComponentNum();
    var bytesPerChannel = bits / 8;
    var height = image.GetImageHeight();
    var width = image.GetImageWidth();

    var data = image.GetImageData();
    var len = height * width * channels * bytesPerChannel;

    using (var reader = new pdftron.Filters.FilterReader(data))
    {
        var buffer = new byte[len];
        reader.Read(buffer);
        return new MemoryStream(buffer);
    }
}
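
One thing to watch in the length calculation above: bits / 8 is integer division, so it evaluates to 0 for 1-, 2-, or 4-bit images and the buffer ends up empty. The PDF specification starts each row of image samples on a byte boundary, so a more general calculation could look like the sketch below. ImageBuffer and ExpectedLength are hypothetical names for illustration, not part of PDFNet.

```csharp
using System;

public static class ImageBuffer
{
    // Hypothetical helper (not a PDFNet API): size in bytes of the decoded
    // sample data, with each row padded up to a whole number of bytes.
    public static int ExpectedLength(int width, int height, int channels, int bitsPerComponent)
    {
        if (width <= 0 || height <= 0 || channels <= 0 || bitsPerComponent <= 0)
            throw new ArgumentOutOfRangeException(nameof(width), "All image parameters must be positive.");

        // Use long arithmetic so large images do not overflow mid-calculation.
        long bitsPerRow = (long)width * channels * bitsPerComponent;
        long bytesPerRow = (bitsPerRow + 7) / 8; // round up to a byte boundary

        // Throws OverflowException if the result does not fit in an int.
        return checked((int)(bytesPerRow * height));
    }
}
```

For 8- and 16-bit images this gives the same value as height * width * channels * bytesPerChannel; it only differs for sub-byte bit depths.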

static private void SetImageData(PDFDoc doc, Obj o, Stream stream)
{

    var image = new pdftron.PDF.Image(o);

    var bits = image.GetBitsPerComponent();
    var channels = image.GetComponentNum();
    var bytesPerChannel = bits / 8;
    var height = image.GetImageHeight();
    var width = image.GetImageWidth();

    var len = height * width * channels * bytesPerChannel;
    if (stream.Length != len) { throw new DataMisalignedException("Stream length does not match expected image dimensions"); }

    o.Erase("DecodeParms"); // Important: the old decode parameters won't be accurate after SetStreamData
    // now we actually do the stream swap
    o.SetStreamData((stream as MemoryStream).ToArray(), new FlateEncode(null));
}
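
Note that (stream as MemoryStream) evaluates to null when the caller passes any other kind of Stream, which would surface as a NullReferenceException. A minimal sketch of a more tolerant conversion is shown below; StreamBytes and ToByteArray are hypothetical names, not PDFNet APIs.

```csharp
using System;
using System.IO;

public static class StreamBytes
{
    // Hypothetical helper: returns the stream contents as a byte array.
    public static byte[] ToByteArray(Stream stream)
    {
        if (stream == null) throw new ArgumentNullException(nameof(stream));

        // MemoryStream.ToArray returns the whole contents regardless of Position.
        if (stream is MemoryStream ms) return ms.ToArray();

        // Any other stream is copied from its current position to the end.
        using (var copy = new MemoryStream())
        {
            stream.CopyTo(copy);
            return copy.ToArray();
        }
    }
}
```

With this in place, the call could become o.SetStreamData(StreamBytes.ToByteArray(stream), new FlateEncode(null)); assuming the same SetStreamData overload used above.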

static private void InvertPixels(Stream stream)
{
    // This function is for DEMO purposes
    // this code assumes 3 channel 8bit
    long length = stream.Length;
    long pixels = length / 3;
    for(int p = 0; p < pixels; ++p)
    {
        int c1 = stream.ReadByte();
        int c2 = stream.ReadByte();
        int c3 = stream.ReadByte();
        stream.Seek(-3, SeekOrigin.Current);
        stream.WriteByte((byte)(255 - c1));
        stream.WriteByte((byte)(255 - c2));
        stream.WriteByte((byte)(255 - c3));
    }
    stream.Seek(0, SeekOrigin.Begin);
}
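
The per-byte ReadByte/WriteByte calls above are fine for a demo. For 8-bit samples, the same inversion can also be done in a single pass over a byte array, independent of the channel count. This is only a sketch; PixelOps and InvertInPlace are hypothetical names.

```csharp
using System;

public static class PixelOps
{
    // Hypothetical helper: inverts every 8-bit sample in place.
    // Assumes 8 bits per component; works for any number of channels.
    public static void InvertInPlace(byte[] samples)
    {
        if (samples == null) throw new ArgumentNullException(nameof(samples));

        for (int i = 0; i < samples.Length; i++)
        {
            samples[i] = (byte)(255 - samples[i]);
        }
    }
}
```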

Here is sample code showing how to use these methods.

static void Main(string[] args)
{
    PDFNet.Initialize();

    var x = new PDFDoc(@"2002.04610.pdf");
    x.InitSecurityHandler();

    var o = x.GetSDFDoc().GetObj(381);
    Stream source = GetImageData(o);
    InvertPixels(source);
    SetImageData(x, o, source);
    x.Save(@"2002.04610-MOD.pdf", SDFDoc.SaveOptions.e_remove_unused);
}