Output Bash pipes to Python-compatible format

Question

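I'm using a UDPipe model for text tokenization and lemmatization. I can get the task itself done with the !echo command or by printing to a file, but I'd like to produce a Python data structure so I can process the output further.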

What works

Here is my working command:

!echo "the text I'm processing" | ./udpipe --tokenize --tag './path/to/my/model'

Output:

Loading UDPipe model: done.
newdoc
newpar
sent_id = 1
text = прывітанне, сусвет
1   прывітанне  прывітанне  NOUN    NN  Animacy=Inan|Case=Nom|Gender=Neut|Number=Sing   _   _   _   SpaceAfter=No
2   ,   ,   PUNCT   PUNCT   _   _   _   _   _
3   сусвет  сусвет  NOUN    NN  Animacy=Inan|Case=Nom|Gender=Masc|Number=Sing   _   _   _   SpacesAfter=\n

This works for printing the output to a file:

!echo "the text I'm processing" | ./udpipe --tokenize --tag './path/to/my/model' >> filename.txt

./udpipe comes from a clone of the package's repository.

What I tried (without success)

os.system()

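import os
import shlex

text = "the text I'm processing"
# shlex.quote() keeps the apostrophe in "I'm" from breaking the shell's quoting
cmd = "echo {} | ./udpipe --tokenize --tag './path/to/my/model'".format(shlex.quote(text))
# os.system() returns the command's exit status, not its output
os.system(cmd)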

Out: 0

subprocess.getoutput()

import subprocess
cmd = "echo \"the text I'm processing\" | ./udpipe --tokenize --tag './path/to/my/model'"
output = subprocess.getoutput(cmd, stdout=subprocess.PIPE, shell=True)
print(output)

TypeError: getoutput() got an unexpected keyword argument 'stdout'
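For reference, the TypeError comes from the fact that subprocess.getoutput() accepts only the command string: it always runs the command through the shell and returns stdout and stderr combined, with one trailing newline removed. A minimal sketch, using echo and tr as stand-ins for udpipe; if udpipe writes its "Loading UDPipe model" line to stderr, as it appears to, that line would end up in the returned string too:

```python
import subprocess

# getoutput() takes a single shell command string; there is no stdout= or shell= keyword.
# ">&2" sends one line to stderr to show that stderr is captured as well.
output = subprocess.getoutput("echo 'to stderr' >&2; echo hello | tr a-z A-Z")
print(output)
```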

Answer 1

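You've done some research and found the subprocess module, which is the most common way to call processes from Python. If you want to use shell features such as pipes, you need to pass shell=True to whichever function actually starts the process, for example subprocess.Popen(), and pass the whole command line as one string. Basic version: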

from subprocess import Popen, PIPE
import shlex

text = "the text I'm processing"
# With shell=True, pass the whole pipeline as one string; shlex.quote() escapes the text
cmd = "echo {} | ./udpipe --tokenize --tag ./path/to/my/model".format(shlex.quote(text))
proc = Popen(cmd, stdout=PIPE, stderr=PIPE, text=True, shell=True)
output, _ = proc.communicate()
print(output)
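Why a single string matters: on POSIX with shell=True, if a sequence is passed instead, only its first item is used as the command and the remaining items become arguments to /bin/sh itself ($0, $1, ...), so the pipe is never built. A minimal sketch of the difference, using echo and tr as stand-ins for udpipe:

```python
import shlex
import subprocess

text = "the text I'm processing"

# Sequence + shell=True: /bin/sh runs only "echo"; the other items become $0, $1, ...
broken = subprocess.run(["echo", text, "|", "tr", "a-z", "A-Z"],
                        shell=True, capture_output=True, text=True)

# One string + shell=True: the shell parses the pipe; shlex.quote() protects the apostrophe
cmd = "echo {} | tr a-z A-Z".format(shlex.quote(text))
fixed = subprocess.run(cmd, shell=True, capture_output=True, text=True)

print(repr(broken.stdout))  # '\n'
print(repr(fixed.stdout))   # "THE TEXT I'M PROCESSING\n"
```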

In your example you also used >> to append the output to a file, so nothing is printed and you can simply wait for the process to finish:

from subprocess import Popen
import shlex

text = "the text I'm processing"
cmd = "echo {} | ./udpipe --tokenize --tag ./path/to/my/model >> filename.txt".format(shlex.quote(text))
proc = Popen(cmd, shell=True)
# Nothing is captured here; just wait for the shell to finish writing the file
proc.wait()

Or you can use the higher-level function subprocess.call():

from subprocess import call
import shlex

text = "the text I'm processing"
cmd = "echo {} | ./udpipe --tokenize --tag ./path/to/my/model >> filename.txt".format(shlex.quote(text))
# call() waits for the command and returns its exit status
call(cmd, shell=True)

If you want the process output inside your code, you can use another higher-level function, subprocess.check_output():

from subprocess import check_output
import shlex

text = "the text I'm processing"
cmd = "echo {} | ./udpipe --tokenize --tag ./path/to/my/model".format(shlex.quote(text))
# check_output() returns the command's stdout, as a str because text=True
output = check_output(cmd, text=True, shell=True)
print(output)
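Unlike call(), check_output() raises subprocess.CalledProcessError when the command exits with a non-zero status, so a failed run does not quietly look like empty output. With shell=True, the status of a pipeline is that of its last command (here udpipe). A minimal sketch, using a hypothetical failing shell command as a stand-in for udpipe:

```python
import subprocess

# Hypothetical stand-in for a failing run: prints a partial line, then exits with status 3
cmd = "echo partial output; exit 3"

status, captured = 0, None
try:
    captured = subprocess.check_output(cmd, shell=True, text=True)
except subprocess.CalledProcessError as err:
    # The exception carries the exit status and whatever reached stdout before the failure
    status, captured = err.returncode, err.output

print(status, repr(captured))  # 3 'partial output\n'
```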

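However, you can also skip the shell and do the piping with Python itself. For example, with Popen() you can pass the input to the process and, if needed, redirect its output straight to a file: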

from subprocess import Popen, PIPE

text = "the text I'm processing"
cmd = "./udpipe", "--tokenize", "--tag", "./path/to/my/model"
proc = Popen(cmd, stdin=PIPE, stdout=PIPE, stderr=PIPE, text=True)
output, _ = proc.communicate(input=text)
print(output)
# OR write to file directly
with open("filename.txt", "a+") as out:
    proc = Popen(cmd, stdin=PIPE, stdout=out, stderr=out, text=True)
    proc.communicate(input=text)

The same with the higher-level check_output():

from subprocess import check_output, STDOUT

text = "the text I'm processing"
cmd = "./udpipe", "--tokenize", "--tag", "./path/to/my/model"
output = check_output(cmd, input=text, stderr=STDOUT, text=True)
print(output)

I would use the last option, but you can pick whichever you like best.
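To get the Python data structure the question asks for, the captured string can then be parsed. A minimal sketch, assuming udpipe writes standard CoNLL-U: one token per line with ten tab-separated columns and a blank line after each sentence (the listing in the question shows the tabs as spaces). Lines without ten columns, such as sentence metadata or the loading message, are skipped. The function name parse_conllu and the dictionary keys are illustrative choices, not part of UDPipe:

```python
from typing import Dict, List

# The ten CoNLL-U columns, in order
FIELDS = ["id", "form", "lemma", "upos", "xpos", "feats", "head", "deprel", "deps", "misc"]


def parse_conllu(output: str) -> List[List[Dict[str, str]]]:
    """Return a list of sentences, each a list of token dicts keyed by FIELDS."""
    sentences: List[List[Dict[str, str]]] = []
    current: List[Dict[str, str]] = []
    for line in output.splitlines():
        if not line.strip():
            # A blank line ends the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        columns = line.split("\t")
        if len(columns) != len(FIELDS):
            # Metadata lines, the loading message, etc. are not token rows
            continue
        current.append(dict(zip(FIELDS, columns)))
    if current:
        # The output may end without a trailing blank line
        sentences.append(current)
    return sentences


sample = (
    "# sent_id = 1\n"
    "# text = прывітанне, сусвет\n"
    "1\tпрывітанне\tпрывітанне\tNOUN\tNN\tAnimacy=Inan|Case=Nom|Gender=Neut|Number=Sing\t_\t_\t_\tSpaceAfter=No\n"
    "2\t,\t,\tPUNCT\tPUNCT\t_\t_\t_\t_\t_\n"
    "3\tсусвет\tсусвет\tNOUN\tNN\tAnimacy=Inan|Case=Nom|Gender=Masc|Number=Sing\t_\t_\t_\tSpacesAfter=\\n\n"
    "\n"
)
sentences = parse_conllu(sample)
print([(t["form"], t["lemma"], t["upos"]) for t in sentences[0]])
```

With the check_output() call from the last option, this would be parse_conllu(output).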
