python subprocess popen: Piping stdout messes up the strings

I am trying to concatenate several files and add a header.

import subprocess
outpath = "output.tab"
with open(outpath, "w") as outf:
    # write a header
    if header is True:
        p1 = subprocess.Popen(["head", "-n1", files[-1]], stdout=outf)
    if type(header) is str:
        p1 = subprocess.Popen(["head", "-n1", header], stdout=outf)
    for fl in files:
        print(fl)
        p1 = subprocess.Popen(["tail", "-n+2", fl], stdout=outf)

For some reason, some of the files (fl) are only partially printed, and the next file starts in the middle of a line from the previous file:

 awk '{print NF}' output.tab | uniq -c
    108 11
      1 14
     69 11
      1 10
     35 11
      1 16
    250 11
      1 16

Is there any way to fix this in Python?


Example of a messed-up line:

$tail -n+108 output.tab | head -n1

CENPA   chr2    27008881.0  2701ABCD3   chr1    94883932.0  94944260.0  0.0316227766017 0.260698861451  0.277741584016  0.302602378581  0.4352790705329718  56  16


$grep -n -A1 'CENPA' file1.tab

109:CENPA   chr2    27008881.0  27017455.0  1.0 0.417081004817  0.0829327365256 0.545205239241  0.7196619496326693  95  3
110-CENPO   chr2    25016174.0  25045245.0  1000.0  0.151090930896  -0.0083671250883    0.50882773122   0.0876177652747541  82  0


$grep -n 'ABCD3' file2.tab
2:ABCD3 chr1    94883932.0  94944260.0  0.0316227766017 0.260698861451  0.277741584016  0.302602378581  0.4352790705329718  56  16

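I think the problem here is that subprocess.Popen() runs asynchronously by default, whereas you seem to want it to run synchronously. Popen() returns as soon as the child process has been started, without waiting for it to finish. So in effect, all of your head and tail commands are running at the same time and writing directly into the output file, which is why their output gets interleaved.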

To fix this, you probably just want to add .wait():

import subprocess
outpath = "output.tab"
with open(outpath, "w") as outf:
    # write a header
    if header is True:
        p1 = subprocess.Popen(["head", "-n1", files[-1]], stdout=outf)
        p1.wait()  # Pauses the script until the command finishes
    if isinstance(header, str):
        p1 = subprocess.Popen(["head", "-n1", header], stdout=outf)
        p1.wait()
    for fl in files:
        print(fl)
        p1 = subprocess.Popen(["tail", "-n+2", fl], stdout=outf)
        p1.wait()  # make sure this tail is done before starting the next one
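
If you are on Python 3.5 or later, subprocess.run() may be a slightly tidier way to get the same synchronous behaviour, since it starts the command and waits for it in one call. The following is only a sketch: the function name concat_with_header and its parameters are made up for illustration, and it assumes head and tail are available on your PATH.

```python
import subprocess


def concat_with_header(files, outpath, header=True):
    """Concatenate files into outpath, keeping a single header line.

    Hypothetical helper, not part of the original script.
    header=True  -> take the header from the last file in `files`
    header=<str> -> take the header from the file at that path
    anything else -> write no header
    """
    with open(outpath, "w") as outf:
        if header is True:
            header = files[-1]
        if isinstance(header, str):
            # run() blocks until head exits, so the header is written first
            subprocess.run(["head", "-n1", header], stdout=outf, check=True)
        for fl in files:
            # each tail finishes before the next one is started
            subprocess.run(["tail", "-n+2", fl], stdout=outf, check=True)
```

With check=True, a failing head or tail raises subprocess.CalledProcessError instead of silently leaving a truncated output file. Calling it would look like concat_with_header(files, "output.tab", header=True).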