Compile separate files

Suppose I have the following files:

File1
    >TAIR:175_a
     ALSKDJFLKAHGLKASJDFLAKJSDLKGHALKSDHGALKALKSJDF
    >TAIR:175_b
     ZZZLAALSKDJFALKSDJFL;KJEIURALKDJFNVALKSDJFKZZZ
    >TAIR:175_c
     ALSKDJFLKAHGLKASJDFLAKJSDLKGHALKSDHGALKALKSJDF

File2
    >TAIR:674_a
     ASLALKSDGHLA;KSJDFIEURALKSDHGLANVALKSDJGHKLJA
    >TAIR:674_b
     ASLALKSDGHDJGDGSDDFIEURALKSDHGLANVALKSDJGHKLJA

File3
    >TAIR:812_a
     KLJALSKDHGLAKSDHJFIEUROWASDLKGNIEASDFJKWERLJKJ
    >TAIR:812_c
     ASLALKSDGHLA;KSJDFIEURALKSDHGLANVALKSDJGHKLJA

File4
    >TAIR:975_b
     KLJALSKDHGLAKSDHJFIEUROWASDLKGNIEASDFJKWERLJKJ

File5
    >TAIR:444_b
     QQALKSDJFWOIAOQIWUERTOIUQTOIUOQIWEURLASKDJFA
    >TAIR:444_c
     QQALKSDJFWOIAOQIWUERTOIUQTOIUOQIWEURLASKDJFA

I wrote this code to extract the names of all the sequences in the directory:

#!/usr/bin/env python
from Bio import SeqIO
filenames = ["file1","file2","file3"]
ids = []

for record in filenames:
    f = SeqIO.parse(record, 'fasta')
    ids.append(f.id)

print ids

The output looks like this:

 python search_list.py 
[<generator object parse at 0x7f32836018c0>, <generator object parse at 0x7f3283601910>, <generator object parse at 0x7f3283601960>]

The output I expect is:

file_a
    >TAIR:175_a
     ALSKDJFLKAHGLKASJDFLAKJSDLKGHALKSDHGALKALKSJDF
    >TAIR:674_a
     ASLALKSDGHLA;KSJDFIEURALKSDHGLANVALKSDJGHKLJA

file_b
    >TAIR:175_b
     ZZZLAALSKDJFALKSDJFL;KJEIURALKDJFNVALKSDJFKZZZ
    >TAIR:674_b
     ASLALKSDGHDJGDGSDDFIEURALKSDHGLANVALKSDJGHKLJA
    >TAIR:975_b
     KLJALSKDHGLAKSDHJFIEUROWASDLKGNIEASDFJKWERLJKJ
    >TAIR:444_b
     QQALKSDJFWOIAOQIWUERTOIUQTOIUOQIWEURLASKDJFA

file_c
    >TAIR:175_c
     ALSKDJFLKAHGLKASJDFLAKJSDLKGHALKSDHGALKALKSJDF
    >TAIR:812_c
     ASLALKSDGHLA;KSJDFIEURALKSDHGLANVALKSDJGHKLJA
    >TAIR:444_c
     QQALKSDJFWOIAOQIWUERTOIUQTOIUOQIWEURLASKDJFA

Any suggestions on how to solve this, that is, how to open the files in the list "ids" and compile them?

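You get that output because you asked Python to print generator objects; by default it prints only each object's type and memory address, not its contents. You are better off using the standard Python open() function and looping over the list of files you want to check. You can then iterate over each line in a file and add it to a list, or do whatever else you like with it. Let me know if an example would help.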
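A minimal sketch of the plain-Python approach described above, assuming each FASTA header line starts with ">" (possibly indented) and that the ID is the first word after it. The helper names read_fasta_ids and collect_ids are made up for illustration, as are the file names in the comment at the end.

```python
def read_fasta_ids(path):
    """Return the IDs found in one FASTA file.

    Assumption: the ID is the first whitespace-separated word after '>'.
    """
    ids = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()  # also drops any leading indentation
            if line.startswith(">"):
                fields = line[1:].split()
                if fields:  # skip a bare '>' with no ID after it
                    ids.append(fields[0])
    return ids


def collect_ids(filenames):
    """Return the IDs of all files, in the order the files are given."""
    ids = []
    for filename in filenames:
        ids.extend(read_fasta_ids(filename))
    return ids


# Example call (hypothetical file names):
# print(collect_ids(["file1.fasta", "file2.fasta", "file3.fasta"]))
```

This only reads the header lines; the sequence lines are skipped, so it is enough for listing names but not for rewriting the records.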

(Ignoring the print-parentheses issue,) your code breaks on my system (Python 3.6.0; Biopython 1.69):

AttributeError: 'generator' object has no attribute 'id'

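because SeqIO.parse() returns a generator, not a single record. Also, your "output that I expect" does not match what this code does. Given this code, what you should expect is: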

['TAIR:175_a', 'TAIR:674_a', 'TAIR:812_a', 'TAIR:975_b', 'TAIR:175_b', 'TAIR:444_b', 'TAIR:175_c', 'TAIR:444_c']

In my environment, the following code gets you that:

from Bio import SeqIO

filenames = ["file1.fasta", "file2.fasta", "file3.fasta"]

ids = []

for filename in filenames:
    records = SeqIO.parse(filename, 'fasta')

    for record in records:
        ids.append(record.id)

print(ids)
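
To get from a flat list of IDs to the file_a / file_b / file_c layout in the question, one option is to group by the part of each ID after the last underscore. This is only a sketch: group_by_suffix is a hypothetical helper, the short ids list is taken from the sample files in the question, and the rule "suffix = text after the last underscore" is an assumption about how the IDs are built.

```python
from collections import defaultdict


def group_by_suffix(items, get_id=str):
    """Group items by the text after the last '_' in their ID.

    get_id turns an item into its ID string; the default works for
    plain strings. An ID without '_' ends up in a group named after
    the whole ID.
    """
    groups = defaultdict(list)
    for item in items:
        suffix = get_id(item).rsplit("_", 1)[-1]
        groups[suffix].append(item)
    return dict(groups)


# IDs taken from the sample files in the question
ids = ["TAIR:175_a", "TAIR:175_b", "TAIR:175_c", "TAIR:674_a", "TAIR:674_b"]

groups = group_by_suffix(ids)
for suffix in sorted(groups):
    print("file_" + suffix, groups[suffix])
```

To write the sequences rather than just list the names, the same helper could be fed the record objects from SeqIO.parse() with get_id=lambda record: record.id, and each group could then be passed to SeqIO.write(groups[suffix], "file_" + suffix + ".fasta", "fasta"). The output file names here are made up.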