Python - Comparison of string input from txt file

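I seem to be having a small problem comparing strings in Python. I'm reading in from a text file and then comparing three characters at a time. It always seems to treat the first "if" statement as true, which confuses me. (Note that the input is printed inside the loop as a test, and it shows the correct strings being compared.) Thanks for any help/advice :)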

Text file input:

ACATTTGCTTCTGACACAACTGTGTTCACTAGCAACCTCAAACAGACACCATGGTGCATCTGACTCCTGa
GGAGAAGTCTGCCGTTACTGCCCTGTGGGGCAAGGTGAACGTGGATGAAGTTGGTGGTGAGGCCCTGGGC
AGGCTGCTGGTGGTCTACCCTTGGACCCAGAGGTTCTTTGAGTCCTTTGGGGATCTGTCCACTCCTGATG
CTGTTATGGGCAACCCTAAGGTGAAGGCTCATGGCAAGAAAGTGCTCGGTGCCTTTAGTGATGGCCTGGC
TCACCTGGACAACCTCAAGGGCACCTTTGCCACACTGAGTGAGCTGCACTGTGACAAGCTGCACGTGGAT
CCTGAGAACTTCAGGCTCCTGGGCAACGTGCTGGTCTGTGTGCTGGCCCATCACTTTGGCAAAGAATTCA
CCCCACCAGTGCAGGCTGCCTATCAGAAAGTGGTGGCTGGTGTGGCTAATGCCCTGGCCCACAAGTATCA
CTAAGCTCGCTTTCTTGCTGTCCAATTTCTATTAAAGGTTCCTTTGTTCCCTAAGTCCAACTACTAAACT
GGGGGATATTATGAAGGGCCTTGAGCATCTGGATTCTGCCTAATAAAAAACATTTATTTTCATTGC

infile = open('DNA.txt', 'r')

while True:
    line = infile.readline()
    if not line: break
    a = []
    for i in range (0, len(line), 3):
        DNA = line[i:i+3]
        print DNA

        if DNA == 'ATT' or 'ATC' or 'ATA':
            a.append('I')

        elif DNA == 'CTT' or 'CTC' or 'CTA' or 'CTG' or 'TTA' or 'TTG':
            a.append('L')

        elif DNA == 'GTT' or 'GTC' or 'GTA' or 'GTG':
            a.append('V')

        elif DNA == 'TTT' or 'TTC':
            a.append('F')

        elif DNA == 'ATG':
            a.append('M')

        else:
            a.append('X')

    print str(a)

Output:

ACA
TTT
GCT
TCT
GAC
ACA
ACT
GTG
TTC
ACT
AGC
AAC
CTC
AAA
CAG
ACA
CCA
TGG
TGC
ATC
TGA
CTC
CTG
a

['I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I']

GGA
GAA
GTC
TGC
CGT
TAC
TGC
CCT
GTG
GGG
CAA
GGT
GAA
CGT
GGA
TGA
AGT
TGG
TGG
TGA
GGC
CCT
GGG
C

['I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I']

AGG
CTG
CTG
GTG
GTC
TAC
CCT
TGG
ACC
CAG
AGG
TTC
TTT
GAG
TCC
TTT
GGG
GAT
CTG
TCC
ACT
CCT
GAT
G

['I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I']

CTG
TTA
TGG
GCA
ACC
CTA
AGG
TGA
AGG
CTC
ATG
GCA
AGA
AAG
TGC
TCG
GTG
CCT
TTA
GTG
ATG
GCC
TGG
C

['I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I']

TCA
CCT
GGA
CAA
CCT
CAA
GGG
CAC
CTT
TGC
CAC
ACT
GAG
TGA
GCT
GCA
CTG
TGA
CAA
GCT
GCA
CGT
GGA
T

['I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I']

CCT
GAG
AAC
TTC
AGG
CTC
CTG
GGC
AAC
GTG
CTG
GTC
TGT
GTG
CTG
GCC
CAT
CAC
TTT
GGC
AAA
GAA
TTC
A

['I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I']

CCC
CAC
CAG
TGC
AGG
CTG
CCT
ATC
AGA
AAG
TGG
TGG
CTG
GTG
TGG
CTA
ATG
CCC
TGG
CCC
ACA
AGT
ATC
A

['I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I']

CTA
AGC
TCG
CTT
TCT
TGC
TGT
CCA
ATT
TCT
ATT
AAA
GGT
TCC
TTT
GTT
CCC
TAA
GTC
CAA
CTA
CTA
AAC
T

['I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I']

GGG
GGA
TAT
TAT
GAA
GGG
CCT
TGA
GCA
TCT
GGA
TTC
TGC
CTA
ATA
AAA
AAC
ATT
TAT
TTT
CAT
TGC
['I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I', 'I']

It always evaluates to I because

if DNA == 'ATT' or 'ATC' or 'ATA':

always evaluates to True.

It is equivalent to:

if (DNA == 'ATT') or ('ATC') or ('ATA'):

'ATC' is a non-empty string, so its truth value is always True, and the first branch is taken every time.
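A quick way to convince yourself is to evaluate both forms in an interpreter. This is a small sketch in Python 3 syntax; the codon 'ACA' and the names `buggy` and `fixed` are chosen only for illustration.

```python
# `or` returns its first truthy operand, and any non-empty string is truthy.
DNA = 'ACA'  # not one of the three codons being tested

# Parsed as (DNA == 'ATT') or 'ATC' or 'ATA'
buggy = DNA == 'ATT' or 'ATC' or 'ATA'

# A membership test compares DNA against each item in turn
fixed = DNA in ('ATT', 'ATC', 'ATA')

print(repr(buggy))   # 'ATC'
print(bool(buggy))   # True
print(fixed)         # False
```

Note that the buggy expression does not even produce a boolean: it yields the string 'ATC', which the `if` then treats as true.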

You can check it like this instead:

if DNA in ['ATT', 'ATC', 'ATA']:

The same applies to the other if clauses.


Also, note that all of this logic:

infile = open('DNA.txt', 'r')

while True:
    line = infile.readline()
    if not line: break

can be replaced with

with open('DNA.txt', 'r') as infile:
    for line in infile:
        # loop body as before
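To show that pattern end to end, here is a minimal sketch in Python 3. The helper name `read_sequence_lines` and the temporary demo file are assumptions made for the example, not part of the original code. It also strips the trailing newline, which iterating over a file keeps on each line.

```python
import os
import tempfile


def read_sequence_lines(path):
    """Return the non-empty lines of a text file without trailing newlines."""
    lines = []
    with open(path, 'r') as infile:   # the file is closed automatically on exit
        for line in infile:           # iteration ends by itself at end of file
            line = line.rstrip('\n')
            if line:
                lines.append(line)
    return lines


# Demo on a temporary file holding the first six letters of two input lines.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('ACATTT\nGGAGAA\n')
    demo_path = tmp.name

demo_lines = read_sequence_lines(demo_path)
os.remove(demo_path)
print(demo_lines)  # ['ACATTT', 'GGAGAA']
```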

Alternatively, you can use a dictionary as a lookup table. That lets you replace the whole if/elif chain. Example:

dna_dict = {
    'ATT': 'I',
    'ATC': 'I',
    'ATA': 'I',
    ....
}

Then:

a.append(dna_dict.get(DNA, 'X'))
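For reference, a runnable sketch of the dictionary approach in Python 3, restricted to the codons that appear in the question's if/elif chain. `CODON_GROUPS` and `translate_line` are hypothetical names introduced here, and this is not a complete codon table.

```python
# Codon groups exactly as listed in the question; anything else maps to 'X'.
CODON_GROUPS = {
    'I': ('ATT', 'ATC', 'ATA'),
    'L': ('CTT', 'CTC', 'CTA', 'CTG', 'TTA', 'TTG'),
    'V': ('GTT', 'GTC', 'GTA', 'GTG'),
    'F': ('TTT', 'TTC'),
    'M': ('ATG',),
}

# Invert the groups into a flat codon -> letter lookup table.
dna_dict = {codon: letter
            for letter, codons in CODON_GROUPS.items()
            for codon in codons}


def translate_line(line):
    """Map each 3-character chunk of `line` to its letter, defaulting to 'X'."""
    return [dna_dict.get(line[i:i + 3], 'X') for i in range(0, len(line), 3)]


# First 24 letters of the input file: ACA TTT GCT TCT GAC ACA ACT GTG
print(translate_line('ACATTTGCTTCTGACACAACTGTG'))
# ['X', 'F', 'X', 'X', 'X', 'X', 'X', 'V']
```

Grouping the codons by letter keeps the table short to write, while the inverted `dna_dict` keeps each lookup a single `get` call.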

This version is more readable:

with open('file.txt') as f:
    data = f.readlines()

for line in data:
    # readlines() keeps the trailing newline, so strip it before testing
    line = line.strip()
    if not line:
        continue
    a = []
    # split the line into 3-character chunks
    segment = [line[i:i+3] for i in range(0, len(line), 3)]
    for dna in segment:
        if dna in ['ATT', 'ATC', 'ATA']:
            a.append('I')
        elif dna in ['CTT', 'CTC', 'CTA', 'CTG', 'TTA', 'TTG']:
            a.append('L')
        elif dna in ['GTT', 'GTC', 'GTA', 'GTG']:
            a.append('V')
        elif dna in ['TTT', 'TTC']:
            a.append('F')
        elif dna in ['ATG']:
            a.append('M')
        else:
            a.append('X')
    print(a)
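One more thing visible in the question's printed output: most lines end with a single leftover character ('a', 'C', 'G', ...), because each line is split into triplets on its own. If the file is meant to be one continuous sequence, which is an assumption the question does not state, the lines could be joined before splitting. A minimal sketch in Python 3, with the hypothetical helper name `split_codons`:

```python
def split_codons(lines):
    """Join lines into one sequence and split it into 3-character chunks.

    Assumes the lines form one continuous sequence. The last chunk may be
    shorter than 3 characters if the total length is not a multiple of 3.
    """
    sequence = ''.join(line.strip() for line in lines)
    return [sequence[i:i + 3] for i in range(0, len(sequence), 3)]


# Two short lines whose lengths are not multiples of 3.
demo = ['ACAT\n', 'TTGC\n']
print(split_codons(demo))  # ['ACA', 'TTT', 'GC']
```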