Python subprocess call to xpdf's pdftotext not working with encoding

Question

I am trying to run pdftotext using Python's subprocess module.

import subprocess

pdf = r"path\to\file.pdf"
txt = r"path\to\out.txt"
pdftotext = r"path\to\pdftotext.exe"

cmd = [pdftotext, pdf, txt, '-enc UTF-8']
response = subprocess.check_output(cmd, 
                shell=True,
                stderr=subprocess.STDOUT)

Traceback:

CalledProcessError: Command '['path\to\pdftotext.exe',
'path\to\file.pdf', 'path\to\out.txt', '-enc UTF-8']'
returned non-zero exit status 99

When I remove the last argument, '-enc UTF-8', from cmd, it works fine in Python.

When I run pdftotext pdf txt -enc UTF-8 at the command prompt, it works fine.

What am I missing?

Thanks.

Answer 1

subprocess has some complicated rules for handling commands. From the docs:

The shell argument (which defaults to False) specifies whether to use the shell as the program to execute. If shell is True, it is recommended to pass args as a string rather than as a sequence.

More details are explained in a linked answer.
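One way to see what goes wrong, assuming the Windows setup that the .exe path in the question suggests: on Windows, subprocess joins an argument list into a single command line with subprocess.list2cmdline. That helper is undocumented, so treat this as an illustration rather than a stable API. Any element containing a space gets quoted, so '-enc UTF-8' most likely reaches pdftotext as one unrecognized argument instead of an option plus its value. The file names below are placeholders.

```python
import subprocess

# Placeholder names standing in for the paths in the question.
bad = ["pdftotext.exe", "file.pdf", "out.txt", "-enc UTF-8"]
good = ["pdftotext.exe", "file.pdf", "out.txt", "-enc", "UTF-8"]

# list2cmdline quotes any element that contains whitespace.
bad_line = subprocess.list2cmdline(bad)
good_line = subprocess.list2cmdline(good)

print(bad_line)   # pdftotext.exe file.pdf out.txt "-enc UTF-8"
print(good_line)  # pdftotext.exe file.pdf out.txt -enc UTF-8
```

The quoted form is consistent with the non-zero exit status in the traceback, and the unquoted form matches what was typed at the command prompt.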

So, as the documentation says, you should convert the command to a string:

cmd = r"""{} "{}" "{}" -enc UTF-8""".format('pdftotext', pdf, txt)

Now call subprocess as:

subprocess.call(cmd, shell=True, stderr=subprocess.STDOUT)
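As an alternative sketch that is not part of the original answer: the list form from the question can probably be kept if the option and its value are passed as separate elements and shell=True is dropped. build_cmd and run_pdftotext are hypothetical helper names introduced here for illustration, and the paths in the usage comment are the placeholders from the question.

```python
import subprocess


def build_cmd(pdftotext, pdf, txt, encoding="UTF-8"):
    """Return the argument list, one element per argument (hypothetical helper)."""
    # "-enc" and its value are separate elements, so pdftotext
    # receives them as two arguments, as it does at the command prompt.
    return [pdftotext, pdf, txt, "-enc", encoding]


def run_pdftotext(pdftotext, pdf, txt, encoding="UTF-8"):
    """Run pdftotext and return its combined stdout/stderr as bytes."""
    cmd = build_cmd(pdftotext, pdf, txt, encoding)
    # No shell=True: the list is handed to the program directly,
    # so no manual quoting of paths is needed.
    return subprocess.check_output(cmd, stderr=subprocess.STDOUT)


# Usage, with the placeholder paths from the question:
# run_pdftotext(r"path\to\pdftotext.exe", r"path\to\file.pdf", r"path\to\out.txt")
```

Keeping a list avoids having to quote paths that contain spaces by hand, which the string form above does with explicit double quotes.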
