How to use a mixture of cat and pipes in subprocess

Question

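I am trying to cat the contents of a file, pipe them into the standard input of a second Python script, and then write that script's standard output to another file.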

On the command line it looks like this:

cat input_file | python3 ~/Desktop/python_script.py > output_file

After reading a lot of posts, I tried this:

file_input = subprocess.Popen(('cat', input_file), stdout=subprocess.PIPE)
file_output = subprocess.check_output(('python3', '~/Desktop/mdparser.py'), stdin=file_input.stdout, stdout=subprocess.PIPE)
subprocess.check_output('>','output_file',stdin = file_output.stdout)

But the second line raises the following error:

File "/usr/local/Cellar/python3/3.4.1/Frameworks/Python.framework/Versions/3.4/lib/python3.4/subprocess.py", line 598, in check_output
    raise ValueError('stdout argument not allowed, it will be overridden.')
ValueError: stdout argument not allowed, it will be overridden.

Answer 1

This should be a single call, not three.

import os
import subprocess

exit_status = subprocess.call(
  ['python3', os.path.expanduser('~/Desktop/mdparser.py')],
  stdin=open('input_file', 'r'), stdout=open('output_file', 'w'))

Tilde expansion (~/foo) is performed by the shell. When you have no shell, as is the case here, you need to do it explicitly yourself, which is what os.path.expanduser does.
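A minimal sketch of that point, assuming a child Python process that just echoes back the first argument it receives (the helper name argv_seen_by_child is made up for illustration). Without a shell, the tilde reaches the child unchanged unless you expand it yourself:

```python
import os
import subprocess
import sys

def argv_seen_by_child(arg):
    """Run a child Python that prints its first argument exactly as received."""
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(sys.argv[1])", arg])
    return out.decode().strip()

# No shell is involved, so the child receives the literal tilde.
literal = argv_seen_by_child("~/Desktop/mdparser.py")

# Expanding in Python first gives the child a usable path.
expanded = argv_seen_by_child(os.path.expanduser("~/Desktop/mdparser.py"))
```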

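You can't use check_output() when stdout is redirected, whether to a different process or to a file. That is why the exception is thrown: the Python interpreter can't both read the content into a variable and connect it directly to a different process's pipeline. This is what the "will be overridden" message means: when you use check_output(), you are telling the Python interpreter to read the output from the pipe itself, but when you also pass stdout=, you are configuring that output to go to a different process or file.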
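To make the distinction concrete, here is a rough sketch with two hypothetical helpers. The first shows that check_output() is fine when only stdin is redirected, since Python still owns the output pipe. The second triggers the same ValueError as in the question by passing stdout= as well:

```python
import subprocess

def capture_output(command, input_path):
    """Feed a file to the command's stdin and return its stdout as bytes."""
    with open(input_path, "rb") as infile:
        return subprocess.check_output(command, stdin=infile)

def stdout_conflict_message(command):
    """Return the error raised when check_output() is also given stdout=."""
    try:
        subprocess.check_output(command, stdout=subprocess.PIPE)
    except ValueError as exc:
        # Raised before the command is ever started.
        return str(exc)
    return None
```

This variant suits the case where you want the output in a variable rather than in a file.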

Instead, direct the output straight to the file, then open the file and read it once the command has finished.
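One way this could look, as a sketch rather than a definitive version: the helper run_with_files is a made-up name, and the with blocks are an addition that makes sure both file handles are closed before the output is read back.

```python
import subprocess

def run_with_files(command, input_path, output_path):
    """Run command with stdin and stdout bound to files, then read the result."""
    with open(input_path, "rb") as infile, open(output_path, "wb") as outfile:
        # The child reads and writes the files directly; no pipe is involved.
        status = subprocess.call(command, stdin=infile, stdout=outfile)
    # Both handles are closed here, so the output file is complete.
    with open(output_path, "r") as result:
        return status, result.read()
```

For the command in the question, command would be something like ['python3', os.path.expanduser('~/Desktop/mdparser.py')].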

Another reason not to use cat is that all it does is add inefficiency and restrict what the program can do. When you run:

foo <input.txt >output.txt

...or, if you prefer this form...

<input.txt foo >output.txt

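...the foo program gets one file handle directly on input.txt and another directly on output.txt. When you don't use cat, these file handles are the real deal: they are seekable, so if your program needs to go back and look at earlier content, it can tell the file handle to seek back to a different part of the file. By contrast, if you run cat input.txt | foo, then foo has to store everything it reads in memory if it does anything that needs more than one pass.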
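The difference can be observed from Python. In this sketch the child process simply reports whether its standard input is seekable. The helper names are made up, and passing input= to subprocess.run stands in for cat input.txt | foo, since it also hands the child a pipe. The behavior described is what a POSIX system gives you:

```python
import subprocess
import sys

# Child program: print whether its standard input supports seeking.
CHILD = [sys.executable, "-c", "import sys; print(sys.stdin.seekable())"]

def seekable_via_file(path):
    """Equivalent of `foo <input.txt`: the child gets the file itself."""
    with open(path, "rb") as infile:
        out = subprocess.check_output(CHILD, stdin=infile)
    return out.decode().strip()

def seekable_via_pipe(path):
    """Equivalent of `cat input.txt | foo`: the child gets a pipe."""
    with open(path, "rb") as infile:
        data = infile.read()
    done = subprocess.run(CHILD, input=data, stdout=subprocess.PIPE, check=True)
    return done.stdout.decode().strip()
```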

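Using cat here is pure overhead. It is an extra program that reads from the input file and writes to its end of the pipe, which means extra I/O operations on the pipe and extra context switches into and out of the kernel. Don't use it unless you need to, for example when you are concatenating multiple files into a single stream (which is cat's purpose, and hence its name).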
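For the case where cat is genuinely needed, a pipeline can be built from two Popen objects. This is a sketch only: it assumes a POSIX cat on the PATH, and concat_through is a hypothetical helper name.

```python
import subprocess

def concat_through(paths, command, output_path):
    """Run `cat paths... | command > output_path` without a shell."""
    with open(output_path, "wb") as outfile:
        cat = subprocess.Popen(["cat", *paths], stdout=subprocess.PIPE)
        consumer = subprocess.Popen(command, stdin=cat.stdout, stdout=outfile)
        # Close our copy of the pipe so cat sees SIGPIPE if the consumer exits early.
        cat.stdout.close()
        status = consumer.wait()
        cat.wait()
    return status
```

With a single input file there is nothing to concatenate, so binding stdin directly to that file remains the simpler option.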

Tags: subprocess, pipe, cat, python-3.x