Using the subprocess module to grep with the | character

Question

Suppose file.txt contains the following:

line one
line two
line three

Then these calls to subprocess.check_output fail (Python 2.7.5 reports that grep failed with exit code 1; on Python 3.8.5 it hangs and needs a keyboard interrupt to stop the program):

# first approach
command = 'grep "one\|three" ./file.txt'
results = subprocess.check_output(command.split())
print(results)

# second approach
command = 'grep -E "one|three" ./file.txt'
results = subprocess.check_output(command.split())
print(results)

But this call succeeds (on both versions) and gives the expected output:

# third approach
command = 'grep -e one -e three ./file.txt'
results = subprocess.check_output(command.split())
print(results)

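Why is this the case? My only guess as to why approaches 1 and 2 don't work is some intricacy in how the subprocess module and the | character interact, but I honestly have no idea why that would cause the call to fail. In the first approach the character is escaped, and in the second approach we pass grep a flag saying we shouldn't have to escape the character. Additionally, approaches 1 and 2 work as expected if you type them at the command line as usual. Could it be that the subprocess module is interpreting the character as a pipe instead of a regex OR?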

Answer 1

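The result of command.split() contains quotes which should no longer be there. This is why Python provides shlex.split, but it is also not hard to split the command manually, though obviously you need to understand the role of quotes in the shell, and that you basically need to remove them when there is no shell.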
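To make the difference concrete, here is a minimal sketch that only compares the two splitting functions and runs nothing. The variable names naive_args and shell_like_args are illustrative and not part of the original answer.

```python
import shlex

# Raw string so the backslash reaches the splitter unchanged.
command = r'grep "one\|three" ./file.txt'

# str.split() only splits on whitespace; the double quotes stay in the argument.
naive_args = command.split()

# shlex.split() applies POSIX shell quoting rules and strips the quotes.
shell_like_args = shlex.split(command)

print(naive_args)
print(shell_like_args)
```

With command.split(), the second element still carries the literal double quotes, so grep would search for a pattern that includes them. With shlex.split(), the quotes are gone and the backslash before | is kept.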

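import shlex
import subprocess

# Raw string so the backslash is passed through literally.
command = r'grep "one\|three" ./file.txt'
results1 = subprocess.check_output(['grep', r'one\|three', './file.txt'])  # split by hand, quotes removed
results2 = subprocess.check_output(shlex.split(command))  # shell-style splitting
results3 = subprocess.check_output(command, shell=True)  # better avoid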

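The quotes tell the shell not to perform whitespace tokenization and/or wildcard expansion on a value. When there is no shell, you should simply provide a plain string wherever the shell allowed or even required you to use a quoted string.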
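One way to see this without involving grep is to launch a child process that reports the argument it received. The sketch below uses the current Python interpreter as the child; the helper name argv_seen_by_child is hypothetical and exists only for illustration.

```python
import subprocess
import sys

def argv_seen_by_child(arg):
    """Run a child Python process without a shell and return the first argument it received."""
    child_code = 'import sys; print(sys.argv[1])'
    output = subprocess.check_output(
        [sys.executable, '-c', child_code, arg],
        universal_newlines=True,  # decode bytes to str
    )
    return output.strip()

# No shell is involved, so the quote characters reach the child verbatim.
received_quoted = argv_seen_by_child('"one|three"')

# Without the quotes the child gets the bare pattern, and | is not treated as a pipe.
received_plain = argv_seen_by_child('one|three')

print(received_quoted)
print(received_plain)
```

The same thing happens to grep in approaches 1 and 2: it is handed a pattern that contains the quote characters, which matches no line in file.txt, so grep exits with status 1.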
