How can I remove all images from a PDF?

Question

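I would like to remove all images from a PDF file.

The page layout should not change. All images should be replaced by empty space.

How can I achieve this with the help of Ghostscript and the appropriate PostScript code?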

Answer 1

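I am posting the answer myself, but the actual code is by Chris Liddell, a Ghostscript developer.

I took his original PostScript code and stripped out its other functions. Only the function that removes raster images remains. Other graphical page objects (text sections, patterns, and vector objects) should remain untouched.

Copy the following code and save it as remove-images.ps: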

%!PS

% Run as:
%
%      gs ..... -dFILTERIMAGE -dDELAYBIND -dWRITESYSTEMDICT \
%                 ..... remove-images.ps <your-input-file>
%
% derived from Chris Liddell's original 'filter-obs.ps' script
% Adapted by @pdfkungfoo (on Twitter)

currentglobal true setglobal

32 dict begin

/debugprint     { systemdict /DUMPDEBUG .knownget { {print flush} if} 
                {pop} ifelse } bind def

/pushnulldevice {
  systemdict exch .knownget not
  {
    //false
  } if

  {
    gsave
    matrix currentmatrix
    nulldevice
    setmatrix
  } if
} bind def

/popnulldevice {
  systemdict exch .knownget not
  {
    //false
  } if
  {
    % this is hacky - some operators clear the current point
    % i.e.
    { currentpoint } stopped
    { grestore }
    { grestore moveto} ifelse
  } if
} bind def

/sgd {systemdict exch get def} bind def

systemdict begin

/_image /image sgd
/_imagemask /imagemask sgd
/_colorimage /colorimage sgd

/image {
  (\nIMAGE\n) //debugprint exec
  /FILTERIMAGE //pushnulldevice exec
  _image
  /FILTERIMAGE //popnulldevice exec
} bind def

/imagemask
{
  (\nIMAGEMASK\n) //debugprint exec
  /FILTERIMAGE //pushnulldevice exec
  _imagemask
  /FILTERIMAGE //popnulldevice exec
} bind def

/colorimage
{
  (\nCOLORIMAGE\n) //debugprint exec
  /FILTERIMAGE //pushnulldevice exec
  _colorimage
  /FILTERIMAGE //popnulldevice exec
} bind def

end
end

.bindnow

setglobal

Now run this command:

gs -o no-more-images-in-sample.pdf \
   -sDEVICE=pdfwrite               \
   -dFILTERIMAGE                   \
   -dDELAYBIND                     \
   -dWRITESYSTEMDICT               \
    remove-images.ps               \
    sample.pdf

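I tested the code with the official PDF specification, and it worked. The following two screenshots show page 750 of the input and output PDFs:

If you wonder why something that looks like an image is still on the output page: it is not really a raster image but a 'pattern' in the original file, and therefore it is not removed.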
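Besides comparing screenshots, the result can also be checked from the command line. The following is only a sketch, under the assumption that the pdfimages tool from poppler-utils is installed and that its -list output starts with two header lines followed by one row per image. The helper name count_images is made up for this example and is not part of Ghostscript or poppler.

```shell
#!/bin/sh
# Count the image rows in a `pdfimages -list` listing read from stdin.
# Assumption: the listing begins with two header lines (column names
# and a dashed rule), so everything from line 3 onward is one image.
count_images() {
    awk 'NR > 2 { n++ } END { print n + 0 }'
}

# Typical use (requires pdfimages from poppler-utils):
#   pdfimages -list sample.pdf                   | count_images
#   pdfimages -list no-more-images-in-sample.pdf | count_images
```

If the filter did its job, the second count should be lower than the first. Content drawn as a pattern, as described above, is not touched by the script.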

Answer 2

Meanwhile, the latest Ghostscript releases offer a much nicer and easier way to remove all images from a PDF. The parameter to add to the command line is -dFILTERIMAGE:

 gs -o noimages.pdf -sDEVICE=pdfwrite -dFILTERIMAGE input.pdf

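Even better, you can also remove all text or all vector drawing elements from a PDF by specifying -dFILTERTEXT or -dFILTERVECTOR.

Of course, you can also combine these -dFILTER* parameters in any way you need to achieve the required result. (Combining all three will of course produce "empty" pages.)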

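Here is a screenshot of a sample PDF page that contains all three of the content types mentioned above:

Screenshot of the original PDF page containing "image", "vector" and "text" elements.

Running the following 6 commands will create all 6 possible variations of remaining content: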

 gs -o noIMG.pdf   -sDEVICE=pdfwrite -dFILTERIMAGE                input.pdf
 gs -o noTXT.pdf   -sDEVICE=pdfwrite -dFILTERTEXT                 input.pdf
 gs -o noVCT.pdf   -sDEVICE=pdfwrite -dFILTERVECTOR               input.pdf

 gs -o onlyTXT.pdf -sDEVICE=pdfwrite -dFILTERVECTOR -dFILTERIMAGE input.pdf 
 gs -o onlyIMG.pdf -sDEVICE=pdfwrite -dFILTERVECTOR -dFILTERTEXT  input.pdf
 gs -o onlyVCT.pdf -sDEVICE=pdfwrite -dFILTERIMAGE  -dFILTERTEXT  input.pdf

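The following image illustrates the results:

Top row, from left: all "text" removed; all "images" removed; all "vectors" removed. Bottom row, from left: only "text" kept; only "images" kept; only "vectors" kept.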
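The six commands above differ only in the output name and the set of -dFILTER* flags, so they can also be generated by a small script. This is a minimal sketch, not part of the original answer: the function names build_gs_cmd and print_all_variants are hypothetical, and the script only prints the command lines instead of running gs.

```shell
#!/bin/sh
# Print (do not run) one gs command line.
# Usage: build_gs_cmd OUTPUT INPUT [FILTER-FLAGS...]
build_gs_cmd() {
    dst=$1
    src=$2
    shift 2
    printf 'gs -o %s -sDEVICE=pdfwrite' "$dst"
    for flag in "$@"; do
        printf ' %s' "$flag"
    done
    printf ' %s\n' "$src"
}

# Print the six variants listed above for one input file.
print_all_variants() {
    input=$1
    build_gs_cmd noIMG.pdf   "$input" -dFILTERIMAGE
    build_gs_cmd noTXT.pdf   "$input" -dFILTERTEXT
    build_gs_cmd noVCT.pdf   "$input" -dFILTERVECTOR
    build_gs_cmd onlyTXT.pdf "$input" -dFILTERVECTOR -dFILTERIMAGE
    build_gs_cmd onlyIMG.pdf "$input" -dFILTERVECTOR -dFILTERTEXT
    build_gs_cmd onlyVCT.pdf "$input" -dFILTERIMAGE -dFILTERTEXT
}

print_all_variants input.pdf
```

Printing the commands first makes it easy to review them. Piping the script's output into sh would then execute them, provided gs is on the PATH.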

