Python: Extracting data from a .seg file
I have this .seg file, and I need to extract the values in the 3rd and 4th columns from it based on the cluster number, e.g. S0.
;; cluster S0
khatija-ankle 1 0 184 F S U S0
;; cluster S1
khatija-ankle 1 407 291 F S U S1
khatija-ankle 1 790 473 F S U S1
khatija-ankle 1 1314 248 F S U S1
khatija-ankle 1 1663 187 F S U S1
Here is my code so far:
file1 = open('f1.seg', "w")
file2 = open('f2.seg', "w")
with open('ankle.seg', 'r') as f:
    for line in f:
        for word in line.split():
            if word == 'S0':
                file1.write(word)
            elif word == 'S1':
                file2.write(word)
Question: How do I create a file for each cluster and write the 3rd and 4th columns into it?
Instead of comparing individual column values with if word == 'S0':, check which cluster id is in the last column of each data row.
For example:
# Create a list of column values
data = line.rstrip().split()
# Condition: last value in data == cluster id
if data[-1] == 'S0':
    # write to S0 file
    print("file1.write({})".format(data[2:4]))
elif data[-1] == 'S1':
    # write to S1 file
    print("file2.write({})".format(data[2:4]))
Output:
file1.write(['S0'])
file1.write(['0', '184'])
file2.write(['S1'])
file2.write(['407', '291'])
file2.write(['790', '473'])
file2.write(['1314', '248'])
file2.write(['1663', '187'])
Tested with Python: 3.4.2
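Putting that condition to work, here is a minimal sketch of the full loop that actually writes columns 3 and 4 to the two files instead of printing (assumptions: the input is still named ankle.seg, the output names f1.seg/f2.seg are kept from the question, and the ";; cluster" comment lines are skipped so only data rows end up in the output):

with open('f1.seg', 'w') as file1, open('f2.seg', 'w') as file2, \
        open('ankle.seg', 'r') as f:
    for line in f:
        # Skip the ";; cluster ..." comment lines
        if line.startswith(';;'):
            continue
        # Split the row into columns
        data = line.rstrip().split()
        # Route the 3rd and 4th columns to the file for this cluster
        if data[-1] == 'S0':
            file1.write(' '.join(data[2:4]) + '\n')
        elif data[-1] == 'S1':
            file2.write(' '.join(data[2:4]) + '\n')

Using the with statement for all three files also means they are closed automatically, which the original snippet's bare open() calls do not guarantee.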
While this can certainly be done in Python, it nicely illustrates why awk is so well suited to slicing up text files:
#! /usr/bin/awk -f
/^;;/ {
    filename = $3 ".seg"
    next
}
{ print $3, $4 > filename }
Output:
$ tail *.seg
==> S0.seg <==
0 184
==> S1.seg <==
407 291
790 473
1314 248
1663 187
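For comparison, a Python sketch that mirrors the awk logic: it takes the cluster id from each ";; cluster ..." comment line, opens an output file named after it (S0.seg, S1.seg, ...), and writes columns 3 and 4 of the following data rows into that file. This is an assumed translation of the awk script above, not part of either original answer:

files = {}  # cluster id -> open file handle
cluster = None
try:
    with open('ankle.seg', 'r') as f:
        for line in f:
            data = line.rstrip().split()
            if not data:
                continue
            if line.startswith(';;'):
                # ";; cluster S0" -> remember the current cluster and open its file
                cluster = data[2]
                if cluster not in files:
                    files[cluster] = open(cluster + '.seg', 'w')
                continue
            # Data row: write columns 3 and 4 to the current cluster's file
            files[cluster].write('{} {}\n'.format(data[2], data[3]))
finally:
    for handle in files.values():
        handle.close()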