Finding names in a text file (Text A) using a list in another text file (Text B) and assigning values next to the names in Text A (Python)

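I am new to Python and I need your help.

I have 2 different text files. Let's say they are Text_A.txt and Text_B.txt.

Text_A.txt contains a list of names like the following (they are separated by tabs):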

Sequence_1 Sequence_2 Sequence_3 Sequence_4 Sequence_5 Sequence_6 Sequence_7 Sequence_8

And Text_B.txt contains a list of names like the following (one sequence name per line):

Sequence_1
Sequence_2
Sequence_3
Sequence_4
Sequence_5
Sequence_6
Sequence_7
Sequence_8
Sequence_9
Sequence_10
Sequence_11

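What I want to do is assign "1" next to a sequence name in Text_B.txt if that name is in Text_A.txt, and assign "0" next to it if the name is not in Text_A.txt.

So, using the example above, the expected output looks like this (each name and its corresponding value should be written on its own line):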

Sequence_1;1
Sequence_2;1
Sequence_3;1
Sequence_4;1
Sequence_5;1
Sequence_6;1
Sequence_7;1
Sequence_8;1
Sequence_9;0
Sequence_10;0
Sequence_11;0

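I want the output in .txt format.

How should I do this in Python?

I really need your help here, because I have more than 3000 and 6000 names in my Text_A.txt and Text_B.txt files, respectively.

Thank you very much!

You can do the following: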

# read Text_A.txt, assuming the tab-separated names
# are all on the first line
with open('Text_A.txt', 'r') as f:
    seqA = f.readline()

# read Text_B.txt, which holds one name per line
with open('Text_B.txt', 'r') as f:
    seqB = f.read()

# remove end-of-line character
seqA = seqA.strip('\n')

# so far, seqA and seqB are strings. split seqA on tabs
# and seqB into lines, skipping blank lines
seqA = seqA.split('\t')
seqB = [line.strip() for line in seqB.splitlines() if line.strip()]

# now, seqA and seqB are lists of strings
# since you want to use seqA as a lookup, you should make a set out of seqA
seqA = set( seqA )

# now iterate over each item in seqB and check if it is present in seqA
# store result in a list
out = []
for item in seqB:
    is_present = 1 if item in seqA else 0
    # use ';' as the separator to match the expected output
    out.append('{item};{is_present}\n'.format(item=item, is_present=is_present))

# write result to file; each entry already ends with a newline,
# so join without any extra separator
with open('output.txt', 'w') as f:
    f.write(''.join(out))

If your sequences contain millions of entries, you should consider a more advanced approach.
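The answer does not say what a more advanced approach would look like. One possible reading is to avoid holding all of Text_B.txt and all of the output in memory at once. Below is a minimal sketch of that idea. It assumes the names in Text_A.txt are separated by any whitespace (tabs or newlines) and that Text_B.txt has one name per line. The function names `load_names` and `flag_names` are hypothetical and introduced only for illustration.

```python
def load_names(path):
    """Return the set of whitespace-separated names found in the file at path."""
    names = set()
    with open(path, 'r') as f:
        for line in f:
            # split() with no argument splits on tabs, spaces and newlines
            names.update(line.split())
    return names


def flag_names(lookup_path, names_path, out_path):
    """For each name in names_path, write 'name;1' if it occurs in
    lookup_path and 'name;0' otherwise, one pair per line."""
    lookup = load_names(lookup_path)
    with open(names_path, 'r') as src, open(out_path, 'w') as dst:
        # iterate line by line so only one name is in memory at a time
        for line in src:
            name = line.strip()
            if not name:
                continue  # skip blank lines
            dst.write('{};{}\n'.format(name, 1 if name in lookup else 0))
```

With the file names from the question, the call would be `flag_names('Text_A.txt', 'Text_B.txt', 'output.txt')`. The set built from Text_A.txt still has to fit in memory. For a few thousand names, as in the question, the simpler version above is likely sufficient.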