Python subprocess Popen: piping stdout messes up the strings
I am trying to concatenate several files together and add a header.
import subprocess

outpath = "output.tab"
with open(outpath, "w") as outf:
    # write a header
    if header is True:
        p1 = subprocess.Popen(["head", "-n1", files[-1]], stdout=outf)
    if type(header) is str:
        p1 = subprocess.Popen(["head", "-n1", header], stdout=outf)
    for fl in files:
        print(fl)
        p1 = subprocess.Popen(["tail", "-n+2", fl], stdout=outf)
For some reason, some files (fl) are only partially printed, and the next file starts in the middle of a line from the previous file:
awk '{print NF}' output.tab | uniq -c
108 11
1 14
69 11
1 10
35 11
1 16
250 11
1 16
Is there any way to fix this in Python?
Example of a garbled line:
$tail -n+108 output.tab | head -n1
CENPA chr2 27008881.0 2701ABCD3 chr1 94883932.0 94944260.0 0.0316227766017 0.260698861451 0.277741584016 0.302602378581 0.4352790705329718 56 16
$grep -n -A1 'CENPA' file1.tab
109:CENPA chr2 27008881.0 27017455.0 1.0 0.417081004817 0.0829327365256 0.545205239241 0.7196619496326693 95 3
110-CENPO chr2 25016174.0 25045245.0 1000.0 0.151090930896 -0.0083671250883 0.50882773122 0.0876177652747541 82 0
$grep -n 'ABCD3' file2.tab
2:ABCD3 chr1 94883932.0 94944260.0 0.0316227766017 0.260698861451 0.277741584016 0.302602378581 0.4352790705329718 56 16
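I think the problem here is that subprocess.Popen() runs asynchronously by default: it starts the command and returns immediately without waiting for it to finish, whereas you seem to want the commands to run one after another. So in effect, all of your head and tail commands are running at the same time and writing directly into the same output file.
To fix this, you probably just want to add .wait():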
import subprocess

outpath = "output.tab"
with open(outpath, "w") as outf:
    # write a header
    if header is True:
        p1 = subprocess.Popen(["head", "-n1", files[-1]], stdout=outf)
        p1.wait()  # Pauses the script until the command finishes
    if type(header) is str:
        p1 = subprocess.Popen(["head", "-n1", header], stdout=outf)
        p1.wait()
    for fl in files:
        print(fl)
        p1 = subprocess.Popen(["tail", "-n+2", fl], stdout=outf)
        p1.wait()
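As a variation on the same idea, here is a minimal sketch that uses subprocess.run() (Python 3.5+), which starts the command and waits for it to exit in a single call. The function name concat_with_header and its parameters are made up for illustration and are not from the question; the sketch also assumes head and tail are available on the PATH.

```python
import subprocess


def concat_with_header(files, outpath, header=True):
    """Write one header line, then lines 2 onward of every file in `files`.

    header=True     -> take the header from the last file in `files`
    header="<path>" -> take the header from that file
    anything else   -> write no header
    (Hypothetical helper, named here only for illustration.)
    """
    with open(outpath, "w") as outf:
        if header is True:
            header_src = files[-1]
        elif isinstance(header, str):
            header_src = header
        else:
            header_src = None

        if header_src is not None:
            # run() blocks until head exits; check=True raises
            # CalledProcessError if the command returns a non-zero status.
            subprocess.run(["head", "-n", "1", header_src], stdout=outf, check=True)

        for fl in files:
            # Each tail finishes before the next one starts,
            # so the outputs cannot interleave.
            subprocess.run(["tail", "-n", "+2", fl], stdout=outf, check=True)
```

It could be called as concat_with_header(files, "output.tab"). Compared with Popen() plus .wait(), check=True also makes a failing head or tail raise an exception instead of silently leaving a truncated output file.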