Creating a dictionary from FASTA file

Question

I have a file that looks like this:

%Labelinfo

string1

string2

%Labelinfo2

string3

string4

string5

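I want to create a dictionary whose keys are the %Labelinfo strings and whose values are the concatenation of the strings between one Labelinfo line and the next. Basically something like this:

{%Labelinfo: string1+string2, %Labelinfo: string2+string3+string4}

The problem is that there can be any number of lines between two "Labelinfo" lines. For example, there may be 5 lines between %Labelinfo and %Labelinfo2, and then, say, 4 lines between %Labelinfo2 and %Labelinfo3.

However, the lines containing "Labelinfo" always start with the same character, for example %.

How can I solve this?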

Answer 1

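I would write it like this:

The program iterates over each line of the file. It checks whether the line is empty and, if so, skips it. Otherwise the line is processed. Any line starting with % is a label, so we add it to the dictionary as a new key and remember it in the variable current. Every following line is then appended to the value stored under current, until the next % line is reached.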

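di = {}
with open("fasta.txt", "r") as f:
    current = ""  # label whose value is currently being built
    for line in f:
        line = line.strip()
        if line == "":
            continue  # skip blank lines
        if line[0] == "%":
            # label line: start a new, empty entry
            di[line] = ""
            current = line
        else:
            # data line: append to the current entry, joined with "+"
            if di[current] == "":
                di[current] = line
            else:
                di[current] += "+" + line
print(di)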

Output:

{'%Labelinfo2': 'string3+string4+string5', '%Labelinfo': 'string1+string2'}

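Note: in Python versions before 3.7, dictionaries do not guarantee any order, so the entries may print out of order; they can still be accessed by key in exactly the same way. Also note that your example output is slightly wrong: you forgot to put the 2 after one of the %Labelinfo keys.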
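
To illustrate the ordering note, here is a minimal sketch. It assumes you want the labels back in file order on any Python version, which is what collections.OrderedDict provides; from Python 3.7 onward a plain dict behaves the same way. The values are simply the ones from the output above.

```python
from collections import OrderedDict

# OrderedDict remembers insertion order on every Python version;
# a plain dict only guarantees it from Python 3.7 onward.
di = OrderedDict()
di["%Labelinfo"] = "string1+string2"
di["%Labelinfo2"] = "string3+string4+string5"

keys_in_order = list(di)      # labels in the order they were inserted
second = di["%Labelinfo2"]    # lookup by key does not depend on ordering
print(keys_in_order, second)
```

Either way, di[label] returns the same value; only the iteration and printing order differs.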

Answer 2

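import re

d = {}

# read the whole file at once
with open('fasta.txt') as f:
    text = f.read()

# split on any run of whitespace and drop empty pieces
for el in [x for x in re.split(r'\s+', text) if x]:
    if el.startswith('%'):
        # label: start a new entry
        key = el
        d[key] = ''
    else:
        # data: append directly to the current entry (no separator)
        value = d[key] + el
        d[key] = value

print(d)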

{'%Labelinfo': 'string1string2', '%Labelinfo2': 'string3string4string5'}
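
One caveat worth noting: splitting on \s+ treats every run of whitespace as a separator, so this approach assumes that neither the labels nor the strings contain spaces. A minimal sketch of the difference, using a made-up label with a space in it (not taken from the question's file):

```python
import re

# Hypothetical input: the label itself contains a space.
text = "%Label info\nstring1\nstring2\n"

# Whitespace splitting breaks the label into two separate tokens.
tokens = [x for x in re.split(r"\s+", text) if x]

# Splitting by line keeps the label intact.
lines = [ln.strip() for ln in text.splitlines() if ln.strip()]

print(tokens)
print(lines)
```

If your labels can contain spaces, iterating line by line is probably the safer choice.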

Answer 3

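#!/usr/bin/env python
# coding:utf-8
'''黄哥Python'''

d = {}

with open('Labelinfo.txt') as f:
    for line in f:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith('%'):
            # label line: start a new entry
            key = line
            d[key] = ""
        else:
            # data line: append it followed by a "+" separator
            d[key] += line + "+"

# drop the trailing "+" left after the last string of each entry
d = {key: d[key][:-1] for key in d}
print(d)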

{'%Labelinfo2': 'string3+string4+string5', '%Labelinfo': 'string1+string2'}
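
The same idea can also be wrapped in a small function that collects the strings in a list and joins them once at the end, which avoids trimming a trailing "+" afterwards. This is only a sketch: parse_labeled, marker and sep are names made up here for illustration, and the sample text mirrors the file from the question.

```python
def parse_labeled(lines, marker="%", sep="+"):
    """Group lines under the most recent label line (hypothetical helper)."""
    parts = {}        # label -> list of strings seen under it
    current = None    # label currently being filled
    for raw in lines:
        line = raw.strip()
        if not line:
            continue                  # skip blank lines
        if line.startswith(marker):
            current = line            # new label: start a fresh list
            parts[current] = []
        elif current is not None:     # ignore data before the first label
            parts[current].append(line)
    # Join once per label instead of concatenating inside the loop.
    return {label: sep.join(chunks) for label, chunks in parts.items()}


sample = """%Labelinfo

string1

string2

%Labelinfo2

string3

string4

string5
"""
result = parse_labeled(sample.splitlines())
print(result)
```

Because it accepts any iterable of lines, you can pass an open file object directly, e.g. parse_labeled(f), and use sep="" if you want the strings joined without a separator.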
