Python Uudecode Call Corruption

Question

I'm working on extracting PDFs from SEC filings. They usually come like this:

SEC Filing Example

For whatever reason, when I save the raw PDF to a .txt file and then try to run

uudecode -o output_file.pdf input_file.txt

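from Python's subprocess.call() function, or from any other Python function that can execute command-line commands, the resulting PDF file is corrupted. If I run the same command directly from the command line, the file is not corrupted.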

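Looking more closely at the PDF file produced from the Python script, the file appears to end prematurely. Is there some sort of output limit when executing a command-line command from Python?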
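
One quick way to confirm the "ends prematurely" symptom is a heuristic check like the one below. This is only a sketch: the function name looks_truncated and the 1024-byte window are assumptions chosen for illustration. It relies on the fact that a complete PDF normally ends with a %%EOF marker, so a missing marker near the end of the file suggests truncation.

```python
import os


def looks_truncated(pdf_path: str, tail_bytes: int = 1024) -> bool:
    """Heuristic (hypothetical helper): True if no %%EOF marker appears near the end of the file."""
    size = os.path.getsize(pdf_path)
    with open(pdf_path, "rb") as f:
        # Read only the tail; %%EOF is normally the last line of a complete PDF.
        f.seek(max(0, size - tail_bytes))
        tail = f.read()
    return b"%%EOF" not in tail
```

Usage: looks_truncated("output_file.pdf") returning True for the Python-produced file but False for the command-line one would confirm that the decode stopped early.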

Thanks!

Answer 1

Under Fedora 21 x86_64 with uudecode 4.15.2 and Python 3.4.1, this script works fine for me:

import subprocess
subprocess.call("uudecode -o output_file.pdf input_file.txt", shell=True)
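
A variant of the same call that avoids the shell and fails loudly may make problems easier to spot. This is a minimal sketch, not the code used above: the helper name uudecode_file is hypothetical, and it assumes a uudecode binary on PATH that accepts -o, as in the command above. subprocess.check_call (available in Python 3.4) raises CalledProcessError if uudecode exits with a non-zero status, so a failed decode is not silently mistaken for a corrupt PDF.

```python
import subprocess


def uudecode_file(src: str, dest: str) -> None:
    """Hypothetical helper: decode src into dest with the external uudecode tool.

    Passing an argument list (no shell=True) avoids shell quoting issues with
    file names; check_call raises CalledProcessError on a non-zero exit status.
    """
    subprocess.check_call(["uudecode", "-o", dest, src])
```

Usage: uudecode_file("input_file.txt", "output_file.pdf").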

Using the linked SEC filing (length: 173,141 B; sha1: e4f7fa2cbb3422411c2f2968d954d6bb9808b884), the decoded PDF (length: 124,557 B; sha1: 1676320e1d9923e14d19451c16688198bc93ca0d) displays correctly when viewed.
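
To compare your own input and output files against the lengths and SHA-1 digests above, something like the following sketch could be used (the function name describe_file is hypothetical). It reads the file in chunks so large filings need not fit in memory.

```python
import hashlib
import os


def describe_file(path: str) -> tuple:
    """Hypothetical helper: return (size in bytes, hex SHA-1 digest) of a file."""
    sha1 = hashlib.sha1()
    with open(path, "rb") as f:
        # Hash in 64 KiB chunks until read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(65536), b""):
            sha1.update(chunk)
    return os.path.getsize(path), sha1.hexdigest()
```

Usage: if describe_file("output_file.pdf") equals (124557, "1676320e1d9923e14d19451c16688198bc93ca0d"), your decoded PDF matches the one described here; if input_file.txt already differs from the filing's values, the problem lies in how the text file was saved rather than in the uudecode call.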

Something else in your environment may be causing the problem. You may want to add more details to your question.

Is there some sort of output limit when executing a command line command from python?

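If by "output limit" you mean the size of the file that uudecode writes, then no. The only kind of "output limit" you need to worry about with the subprocess module arises when you pass stdout=PIPE or stderr=PIPE when creating the child process. If the child writes enough data to either of those streams and your script does not drain them regularly, the child process will block once the OS pipe buffer fills up (see the subprocess module documentation). In my testing, uudecode did not write anything to stdout or stderr.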
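
If you do need to capture a child's output, a sketch of the safe pattern looks like this (the helper name run_and_capture is hypothetical). Popen.communicate() reads both pipes until end of file, so the child cannot block on a full pipe buffer the way it can with subprocess.call(..., stdout=PIPE), where nothing reads the pipes.

```python
import subprocess


def run_and_capture(args):
    """Hypothetical helper: run args with stdout/stderr piped and drain both.

    communicate() keeps reading both pipes until the child exits, so large
    outputs cannot fill the OS pipe buffer and deadlock the child.
    """
    proc = subprocess.Popen(args, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    out, err = proc.communicate()
    return proc.returncode, out, err
```

Usage: returncode, out, err = run_and_capture(["uudecode", "-o", "output_file.pdf", "input_file.txt"]); a non-zero returncode or non-empty err would point to a decoding problem rather than an output limit.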
