Reducing file sizes of PDFs created using matplotlib by changing font embedding

Question

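I am using matplotlib to generate PDF figures. However, even the simplest figures produce relatively large files: the MWE below produces a file of almost 1 MB. I have realized that the large file size is due to matplotlib fully embedding all fonts that are used. Since I am going to make a lot of plots and would like to reduce the file size, I am wondering:

Main question:

Is there a way to make matplotlib embed font subsets instead of the complete fonts? I would also be fine with not including the fonts at all.

Things considered so far:

Vector graphics editors can easily be used to export PDFs containing font subsets (or no fonts at all), but having to perform this step for every file (and every revision) seems overly tedious.
Similarly, I have read about post-processing the PDF files (e.g. using Ghostscript), although the effort seems comparable.
I tried setting 'pdf.fonttype' = 3, which does produce considerably smaller files. However, I would like the text to remain editable in vector graphics editors, and that does not seem to work in this case (e.g. minus signs are not saved as text).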
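
To see how much the 'pdf.fonttype' setting matters on a given installation, the two values can be compared directly. This is only a minimal sketch: the helper names pdf_size and compare_fonttypes are made up for illustration, the default font is used instead of Arial so that it runs anywhere, and the resulting sizes will depend on the matplotlib version and the fonts involved.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt


def pdf_size(fonttype, directory):
    """Save a small figure with the given pdf.fonttype; return its size in bytes."""
    path = os.path.join(directory, f"fonttype_{fonttype}.pdf")
    # rc_context restores the previous rcParams when the block exits
    with matplotlib.rc_context({"pdf.fonttype": fonttype}):
        fig, ax = plt.subplots(figsize=(2, 2))
        ax.semilogy([1], [1], "s", label="Text")
        ax.legend()
        fig.savefig(path)
        plt.close(fig)
    return os.path.getsize(path)


def compare_fonttypes(fonttypes=(3, 42)):
    """Return {fonttype: file size in bytes} for each requested font type."""
    with tempfile.TemporaryDirectory() as tmp:
        return {ft: pdf_size(ft, tmp) for ft in fonttypes}
```

Calling compare_fonttypes() returns a dictionary with one size per font type, which makes it easy to check whether the gap described above also shows up in your own environment.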

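Since generating files with embedded subsets is easy with external software, albeit labor-intensive, is it possible to achieve this directly in matplotlib? Any help would be greatly appreciated.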

MWE

import matplotlib.pyplot as plt  # Setup
import matplotlib as mpl
mpl.rcParams['pdf.fonttype'] = 42
mpl.rcParams['mathtext.fontset'] = 'dejavuserif'
mpl.rc('font', family='Arial', size=12)

fig, ax = plt.subplots(figsize=(2, 2))  # Create a figure containing some text
# The backslash before "mathrm" is doubled so Python does not treat "\m" as an escape
ax.semilogy(1, 1, 's', label='Text\n$M_\\mathrm{ath}$')
ax.legend()
fig.tight_layout()
fig.savefig('test.pdf')

Environment: matplotlib 3.1.1

Answer 1

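Leaving this here in case someone else is looking for something similar: in the end, I decided to go with Ghostscript. Because of the extra step it is not quite what I was looking for, but at least it can be automated: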

import os
import subprocess

def gs_opt(filename):
    # Build the temporary name from the path without its extension
    filenameTmp = os.path.splitext(filename)[0] + '_tmp.pdf'
    gs = ['gswin64',                        # Ghostscript executable (Windows name)
          '-sDEVICE=pdfwrite',
          '-dEmbedAllFonts=false',
          '-dSubsetFonts=true',             # Create font subsets (default)
          '-dPDFSETTINGS=/prepress',        # Quality preset (affects image resolution)
          '-dDetectDuplicateImages=true',   # Embeds images used multiple times only once
          '-dCompressFonts=true',           # Compress fonts in the output (default)
          '-dNOPAUSE',                      # No pause after each page
          '-dQUIET',                        # Suppress output
          '-dBATCH',                        # Automatically exit
          '-sOutputFile=' + filenameTmp,    # Save to temporary output
          filename]                         # Input file

    subprocess.run(gs, check=True)          # Create temporary file; raise if Ghostscript fails
    os.replace(filenameTmp, filename)       # Replace the input file with the optimized file

Then call

filename = 'test.pdf'
plt.savefig(filename)
gs_opt(filename)

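This saves the figure as test.pdf, uses Ghostscript to create a temporary optimized file test_tmp.pdf, and then replaces the initial file with the optimized one, so the result is again called test.pdf.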
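
The function above hard-codes the Windows executable name. A slightly more portable variant could look up the executable first. The following is a sketch only: the function names find_ghostscript, build_gs_command and optimize_pdf are hypothetical, and the candidate names assume the usual "gs" on Linux/macOS and "gswin64c"/"gswin32c" for the Windows console builds. The Ghostscript options are the same as in the answer.

```python
import os
import shutil
import subprocess


def find_ghostscript(candidates=("gs", "gswin64c", "gswin32c")):
    """Return the path of the first Ghostscript executable found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


def build_gs_command(gs_path, src, dst):
    """Build the pdfwrite command line with the options used in the answer."""
    return [gs_path,
            "-sDEVICE=pdfwrite",
            "-dEmbedAllFonts=false",
            "-dSubsetFonts=true",
            "-dPDFSETTINGS=/prepress",
            "-dDetectDuplicateImages=true",
            "-dCompressFonts=true",
            "-dNOPAUSE", "-dQUIET", "-dBATCH",
            "-sOutputFile=" + dst,
            src]


def optimize_pdf(filename):
    """Rewrite `filename` in place through Ghostscript."""
    gs_path = find_ghostscript()
    if gs_path is None:
        raise FileNotFoundError("Ghostscript executable not found on PATH")
    root, ext = os.path.splitext(filename)
    tmp = root + "_tmp" + ext
    subprocess.run(build_gs_command(gs_path, filename, tmp), check=True)
    os.replace(tmp, filename)  # works on Windows and POSIX alike
```

Keeping the command construction separate from the subprocess call makes it possible to inspect or adjust the options without running Ghostscript.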

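Compared to exporting the files with a vector graphics editor, the PDFs created by Ghostscript are still several times larger (typically 4-5 times). However, it does reduce the file size to between 1/5 and 1/10 of the initial file. That's something.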

Answer 2

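The PGF backend helps to reduce the size of the PDF file considerably. Just add mpl.use('pgf') to your code. In my environment, this modification led to the following results:

The file size decreased from 817K to 21K (40 times smaller!).
The execution time increased from 1 second to 3 seconds.

For real-world figures, however, the execution-time penalty usually becomes smaller as the file size grows.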
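
If switching the whole session to PGF is not desired, the backend can also be selected for a single savefig call. The sketch below makes several assumptions: the helper names pgf_settings and save_pdf_with_pgf are hypothetical, a TeX engine such as xelatex has to be installed, and the backend keyword of savefig is only available in matplotlib releases newer than the 3.1.1 mentioned in the question (on older versions, mpl.use('pgf') remains the way to go).

```python
import shutil

import matplotlib


def pgf_settings(texsystem="xelatex"):
    """rcParams for the PGF backend; pgf.rcfonts=True reuses the fonts set in rcParams."""
    return {"pgf.texsystem": texsystem, "pgf.rcfonts": True}


def save_pdf_with_pgf(fig, path, texsystem="xelatex"):
    """Save `fig` to `path` through the PGF backend without changing the global backend."""
    if shutil.which(texsystem) is None:
        raise RuntimeError(f"TeX engine {texsystem!r} not found on PATH")
    # The settings only apply inside the context manager
    with matplotlib.rc_context(pgf_settings(texsystem)):
        fig.savefig(path, backend="pgf")
```

With a figure at hand, save_pdf_with_pgf(fig, 'test.pdf') would then write the PDF via LaTeX, while other figures in the same script can still use the default PDF backend.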

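The reduction in PDF size is due to the embedding of font subsets instead of complete fonts, as the "sub" column of the pdffonts output shows: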

$ pdffonts pdf_backend.pdf
name                         type              emb sub uni prob object ID
---------------------------- ----------------- --- --- --- ---- ---------
ArialMT                      CID TrueType      yes no  yes          14  0
DejaVuSerif-Italic           CID TrueType      yes no  yes          23  0
DejaVuSerif                  CID TrueType      yes no  yes          32  0

$ pdffonts pgf_backend.pdf
name                         type              emb sub uni prob object ID
---------------------------- ----------------- --- --- --- ---- ---------
KECVVY+ArialMT               CID TrueType      yes yes yes           7  0
EFAAMX+CMR12                 Type 1C           yes yes yes           8  0
EHYQVR+CMSY8                 Type 1C           yes yes yes           9  0
UVNOSL+CMR8                  Type 1C           yes yes yes          10  0
FDPQQI+CMMI12                Type 1C           yes yes yes          11  0
DGIYWD+DejaVuSerif           CID TrueType      yes yes yes          13  0
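
When many figures are generated, this check can be scripted. The following sketch parses pdffonts output in the column layout shown above (name, type, emb, sub, uni, optional prob, object ID) and lists fonts that are embedded but not subset. The function names are hypothetical, and other pdffonts versions may print additional columns, in which case the pattern would need adjusting.

```python
import re

# Assumed row layout: name, type, emb, sub, uni, optional prob, object ID (two integers)
ROW = re.compile(
    r"^(?P<name>\S+)\s+(?P<type>.+?)\s+"
    r"(?P<emb>yes|no)\s+(?P<sub>yes|no)\s+(?P<uni>yes|no)\s+"
    r"(?:(?P<prob>yes|no)\s+)?(?P<obj>\d+)\s+(?P<gen>\d+)\s*$"
)
# Subset fonts carry a tag of six uppercase letters followed by "+"
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+")


def parse_pdffonts(text):
    """Return one dict per font row; header and separator lines are skipped."""
    fonts = []
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m is None:
            continue
        fonts.append({
            "name": m["name"],
            "type": m["type"],
            "embedded": m["emb"] == "yes",
            "subset": m["sub"] == "yes",
            "tagged": bool(SUBSET_TAG.match(m["name"])),
        })
    return fonts


def fully_embedded_fonts(text):
    """Names of fonts that are embedded in full, i.e. embedded but not subset."""
    return [f["name"] for f in parse_pdffonts(text)
            if f["embedded"] and not f["subset"]]
```

The text to parse can be obtained, for example, with subprocess.run(['pdffonts', 'test.pdf'], capture_output=True, text=True).stdout, assuming pdffonts is installed.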

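Another option is to generate an EPS file (using the PostScript backend) and convert it to PDF, for example with epstopdf (which uses the Ghostscript interpreter). This approach reduces the PDF file to 9K. However, it is worth noting that the PS backend does not support transparency.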
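
The EPS route could be automated along the following lines. This is a sketch under assumptions: the helper names are hypothetical, epstopdf has to be available on PATH, and the --outfile option is the one provided by the epstopdf script shipped with common TeX distributions.

```python
import os
import shutil
import subprocess


def epstopdf_command(eps_path, pdf_path):
    """Build the epstopdf command line for converting one EPS file to PDF."""
    return ["epstopdf", "--outfile=" + pdf_path, eps_path]


def save_pdf_via_eps(fig, pdf_path):
    """Save `fig` as EPS, convert it to `pdf_path`, then remove the EPS file."""
    if shutil.which("epstopdf") is None:
        raise RuntimeError("epstopdf not found on PATH")
    eps_path = os.path.splitext(pdf_path)[0] + ".eps"
    fig.savefig(eps_path)  # the .eps extension selects the PostScript backend
    subprocess.run(epstopdf_command(eps_path, pdf_path), check=True)
    os.remove(eps_path)
```

Because the PS backend does not support transparency, this is only suitable for figures without alpha values.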
