Extract rows based on values from text file using Python
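I have a list of records in file A that I want to extract based on the numbers in file B. Given the values 4 and 5, every row of file A whose 4th column is 4 or 5 should be extracted. How can I do this in Python? Can anyone help me? The code below only extracts based on the index where the value is 4.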
with open("B.txt", "rt") as f:
    classes = [int(line) for line in f.readlines()]
with open("A.txt", "rt") as f:
    lines = [line for index, line in enumerate(f.readlines()) if classes[index] == 4]
lines_all = "".join(lines)
with open("C.txt", "w") as f:
    f.write(lines_all)
A.txt
hg17_ct_ER_ER_1003 36 42 1
hg17_ct_ER_ER_1003 109 129 2
hg17_ct_ER_ER_1003 110 130 2
hg17_ct_ER_ER_1003 129 149 2
hg17_ct_ER_ER_1003 130 150 2
hg17_ct_ER_ER_1003 157 163 3
hg17_ct_ER_ER_1003 157 165 3
hg17_ct_ER_ER_1003 179 185 4
hg17_ct_ER_ER_1003 197 217 5
hg17_ct_ER_ER_1003 220 226 6
B.txt
4
5
Expected output
hg17_ct_ER_ER_1003 179 185 4
hg17_ct_ER_ER_1003 197 217 5
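Create a set of the numbers (one per line) in B.txt and compare the last element of each line in A.txt against that set:
import csv

with open("A.txt") as f, open("B.txt") as f2:
    # Set of wanted labels from B.txt, e.g. {"4", "5"}
    st = set(line.rstrip() for line in f2)
    r = csv.reader(f, delimiter=" ")
    # Keep only rows whose last column is one of the wanted labels
    data = [row for row in r if row[-1] in st]
print(data)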
[['hg17_ct_ER_ER_1003', '179', '185', '4'], ['hg17_ct_ER_ER_1003', '197', '217', '5']]
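Set delimiter= to whatever separator your file actually uses, or omit it entirely if your file is comma-separated (csv.reader defaults to a comma).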
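To illustrate the delimiter point, here is a small sketch that wraps the same last-column test around csv.reader and feeds it comma- and tab-separated data. The helper name rows_with_labels is hypothetical, and io.StringIO stands in for real files; it only shows that the filtering logic stays the same while the delimiter changes.

```python
import csv
import io


def rows_with_labels(fileobj, wanted, delimiter=","):
    """Yield csv rows whose last field is one of the labels in `wanted`.

    Hypothetical helper for illustration; `delimiter` is passed straight
    to csv.reader, which uses "," when no delimiter is given.
    """
    for row in csv.reader(fileobj, delimiter=delimiter):
        # csv.reader yields [] for blank lines, so guard before row[-1].
        if row and row[-1].strip() in wanted:
            yield row


wanted = {"4", "5"}

# Comma-separated input: the default delimiter is enough.
comma_data = io.StringIO("hg17_ct_ER_ER_1003,179,185,4\n"
                         "hg17_ct_ER_ER_1003,220,226,6\n")
print(list(rows_with_labels(comma_data, wanted)))
# [['hg17_ct_ER_ER_1003', '179', '185', '4']]

# Tab-separated input: pass the tab explicitly.
tab_data = io.StringIO("hg17_ct_ER_ER_1003\t197\t217\t5\n")
print(list(rows_with_labels(tab_data, wanted, delimiter="\t")))
# [['hg17_ct_ER_ER_1003', '197', '217', '5']]
```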
Or:
with open("A.txt") as f, open("B.txt") as f2:
    st = set(line.rstrip() for line in f2)
    # rsplit(None, 1)[1] is the last whitespace-separated field of the line
    data = [line.rstrip() for line in f if line.rsplit(None, 1)[1] in st]
print(data)
['hg17_ct_ER_ER_1003 179 185 4', 'hg17_ct_ER_ER_1003 197 217 5']
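The question also wanted the matching lines written to C.txt, and it refers to the 4th column specifically rather than the last one. Below is a minimal sketch combining both, assuming whitespace-separated columns in A.txt and one label per line in B.txt. The function name extract_rows and its col parameter are hypothetical: col=-1 reproduces the last-column behaviour above, while col=3 would keep working if A.txt ever gained extra trailing columns (an assumption, not something the sample data shows).

```python
import os
import tempfile


def extract_rows(a_path, b_path, out_path, col=-1):
    """Copy lines of a_path whose column `col` appears in b_path to out_path.

    Hypothetical helper: assumes whitespace-separated columns and one label
    per line in b_path. col=-1 is the last column; col=3 is the 4th column.
    Returns the number of lines written.
    """
    # Build the lookup set once; blank lines in B are ignored.
    with open(b_path) as fb:
        wanted = {line.strip() for line in fb if line.strip()}

    count = 0
    with open(a_path) as fa, open(out_path, "w") as out:
        for line in fa:
            fields = line.split()  # split() tolerates runs of spaces or tabs
            try:
                value = fields[col]
            except IndexError:
                continue  # blank line, or too few columns
            if value in wanted:
                out.write(line.rstrip("\n") + "\n")
                count += 1
    return count


if __name__ == "__main__":
    # Recreate part of the sample data in a temporary directory.
    tmp = tempfile.mkdtemp()
    a, b, c = (os.path.join(tmp, name) for name in ("A.txt", "B.txt", "C.txt"))
    with open(a, "w") as f:
        f.write("hg17_ct_ER_ER_1003 157 165 3\n"
                "hg17_ct_ER_ER_1003 179 185 4\n"
                "hg17_ct_ER_ER_1003 197 217 5\n"
                "hg17_ct_ER_ER_1003 220 226 6\n")
    with open(b, "w") as f:
        f.write("4\n5\n")
    print(extract_rows(a, b, c))  # 2
    with open(c) as f:
        print(f.read(), end="")
```

Using a set for the labels keeps each membership test constant-time on average, which matters only if B.txt is large.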
with open("B.txt", "r") as target_file:
    target = [i.strip() for i in target_file]
with open("A.txt", "r") as data_file:
    r = filter(lambda x: x.strip().rsplit(None, 1)[1] in target, data_file)
    # In Python 3, filter() is lazy, so consume it while the file is still open
    print("".join(r))
Output:
hg17_ct_ER_ER_1003 179 185 4
hg17_ct_ER_ER_1003 197 217 5
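A note on this answer under Python 3: filter() returns a lazy iterator there (in Python 2 it returned a list), so the file is read only when the result is consumed. If "".join(r) runs after the with block has closed the file, it fails. The sketch below demonstrates this with io.StringIO as a stand-in for A.txt; the names SAMPLE and has_label are hypothetical.

```python
import io

# Hypothetical in-memory stand-in for A.txt.
SAMPLE = "hg17_ct_ER_ER_1003 179 185 4\nhg17_ct_ER_ER_1003 220 226 6\n"


def has_label(line, labels=("4", "5")):
    """True if the last whitespace-separated field is one of `labels`."""
    return line.rsplit(None, 1)[1] in labels


# Lazy: nothing is read yet when the with block closes the "file".
with io.StringIO(SAMPLE) as f:
    lazy = filter(has_label, f)
try:
    "".join(lazy)  # reading now hits a closed file
    lazy_failed = False
except ValueError:  # "I/O operation on closed file"
    lazy_failed = True

# Eager: consume the iterator while the file is still open.
with io.StringIO(SAMPLE) as f:
    eager = "".join(filter(has_label, f))

print(lazy_failed)    # True
print(eager, end="")  # hg17_ct_ER_ER_1003 179 185 4
```

Wrapping the call in list(...) inside the with block works equally well if you need the matching lines more than once.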
As @Padraic suggested, I changed split()[-1] to rsplit(None, 1)[1].
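For a normal line both expressions return the same last field; rsplit(None, 1) simply stops after a single split from the right instead of breaking the whole line apart. The check below shows this, plus a hypothetical last_field helper that also copes with blank or single-field lines, where rsplit(None, 1)[1] would raise IndexError.

```python
line = "hg17_ct_ER_ER_1003 179 185 4\n"

# split() breaks the whole line into fields; [-1] takes the last one.
print(line.split())          # ['hg17_ct_ER_ER_1003', '179', '185', '4']
# rsplit(None, 1) makes at most one split, starting from the right.
print(line.rsplit(None, 1))  # ['hg17_ct_ER_ER_1003 179 185', '4']


def last_field(text):
    """Return the last whitespace-separated field, or None for a blank line.

    Hypothetical helper: indexing with [-1] instead of [1] also handles
    lines that contain only one field.
    """
    parts = text.rsplit(None, 1)
    return parts[-1] if parts else None


print(last_field("4\n"))  # 4
print(last_field("\n"))   # None
```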