Finding names in a text file (Text A) using a list in another text file (Text B) and assigning values next to the names in Text A (Python)
I am new to Python and I need your help.
I have 2 different text files. Let's say they are Text_A.txt and Text_B.txt.
Text_A.txt contains a list of names like the following (they are tab-separated):
Sequence_1 Sequence_2 Sequence_3 Sequence_4
Sequence_5 Sequence_6 Sequence_7 Sequence_8
and Text_B.txt contains a list of names like the following (one sequence name per line):
Sequence_1
Sequence_2
Sequence_3
Sequence_4
Sequence_5
Sequence_6
Sequence_7
Sequence_8
Sequence_9
Sequence_10
Sequence_11
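What I want to do is assign "1" next to a sequence name in Text_B.txt if that name is in Text_A.txt, and "0" if it is not.
So, using the example above, the expected output looks like this (each name and its value written on its own line):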
Sequence_1;1
Sequence_2;1
Sequence_3;1
Sequence_4;1
Sequence_5;1
Sequence_6;1
Sequence_7;1
Sequence_8;1
Sequence_9;0
Sequence_10;0
Sequence_11;0
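I want the output in .txt format.
How should I do this in Python?
I really need your help here, because Text_A.txt and Text_B.txt contain more than 3000 and 6000 names respectively.
Thank you very much!
You can do the following: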
# read every line of Text_A.txt and split each one on tabs;
# since you want to use seqA as a lookup, collect the names in a set
seqA = set()
with open('Text_A.txt', 'r') as f:
    for line in f:
        seqA.update(line.strip().split('\t'))
# Text_B.txt has one name per line: strip the end-of-line character
# and skip blank lines
with open('Text_B.txt', 'r') as f:
    seqB = [line.strip() for line in f if line.strip()]
# now iterate over each item in seqB and check if it is present in seqA
# store result in a list
out = []
for item in seqB:
    is_present = 1 if item in seqA else 0
    out.append('{item};{is_present}'.format(item=item, is_present=is_present))
# write result to file, one "name;value" pair per line
with open('output.txt', 'w') as f:
    f.write('\n'.join(out) + '\n')
If your sequences contain millions of entries, you should consider a more advanced approach.
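For larger inputs, one possible direction is to avoid holding Text_B.txt in memory and write each result as soon as it is read. The following is only a minimal sketch under some assumptions: the helper names `load_names` and `flag_names` are made up for illustration, names are assumed to contain no spaces (so splitting on any whitespace is safe), and the lookup set built from Text_A.txt is assumed to fit in memory. The demonstration at the end runs on small temporary files with invented contents.

```python
import os
import tempfile


def load_names(path):
    """Collect every whitespace-separated name in the file into a set."""
    names = set()
    with open(path, "r") as f:
        for line in f:
            names.update(line.split())
    return names


def flag_names(path_a, path_b, path_out):
    """For each name in path_b, write 'name;1' or 'name;0' to path_out."""
    lookup = load_names(path_a)
    with open(path_b, "r") as src, open(path_out, "w") as dst:
        for line in src:
            name = line.strip()
            if not name:
                continue  # skip blank lines
            dst.write("{};{}\n".format(name, 1 if name in lookup else 0))


# Small demonstration on temporary files (contents are made up).
with tempfile.TemporaryDirectory() as tmp:
    path_a = os.path.join(tmp, "Text_A.txt")
    path_b = os.path.join(tmp, "Text_B.txt")
    path_out = os.path.join(tmp, "output.txt")
    with open(path_a, "w") as f:
        f.write("Sequence_1\tSequence_2\nSequence_3\n")
    with open(path_b, "w") as f:
        f.write("Sequence_1\nSequence_3\nSequence_9\n")
    flag_names(path_a, path_b, path_out)
    with open(path_out, "r") as f:
        print(f.read(), end="")
```

With your real files, the call would be `flag_names("Text_A.txt", "Text_B.txt", "output.txt")`. Because only the lookup set is kept in memory, the size of Text_B.txt matters much less in this variant.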