Compile separate files
Suppose I have several files like these:
File1
>TAIR:175_a
ALSKDJFLKAHGLKASJDFLAKJSDLKGHALKSDHGALKALKSJDF
>TAIR:175_b
ZZZLAALSKDJFALKSDJFL;KJEIURALKDJFNVALKSDJFKZZZ
>TAIR:175_c
ALSKDJFLKAHGLKASJDFLAKJSDLKGHALKSDHGALKALKSJDF
File2
>TAIR:674_a
ASLALKSDGHLA;KSJDFIEURALKSDHGLANVALKSDJGHKLJA
>TAIR:674_b
ASLALKSDGHDJGDGSDDFIEURALKSDHGLANVALKSDJGHKLJA
File3
>TAIR:812_a
KLJALSKDHGLAKSDHJFIEUROWASDLKGNIEASDFJKWERLJKJ
>TAIR:812_c
ASLALKSDGHLA;KSJDFIEURALKSDHGLANVALKSDJGHKLJA
File4
>TAIR:975_b
KLJALSKDHGLAKSDHJFIEUROWASDLKGNIEASDFJKWERLJKJ
File5
>TAIR:444_b
QQALKSDJFWOIAOQIWUERTOIUQTOIUOQIWEURLASKDJFA
>TAIR:444_c
QQALKSDJFWOIAOQIWUERTOIUQTOIUOQIWEURLASKDJFA
I wrote this code to extract the names of all the sequences in the directory:
#!/usr/bin/env python
from Bio import SeqIO
filenames = ["file1","file2","file3"]
ids = []
for record in filenames:
    f = SeqIO.parse(record, 'fasta')
    ids.append(f.id)
print ids
The output looks like this:
python search_list.py
[<generator object parse at 0x7f32836018c0>, <generator object parse at 0x7f3283601910>, <generator object parse at 0x7f3283601960>]
The output I expect is:
file_a
>TAIR:175_a
ALSKDJFLKAHGLKASJDFLAKJSDLKGHALKSDHGALKALKSJDF
>TAIR:674_a
ASLALKSDGHLA;KSJDFIEURALKSDHGLANVALKSDJGHKLJA
file_b
>TAIR:175_b
ZZZLAALSKDJFALKSDJFL;KJEIURALKDJFNVALKSDJFKZZZ
>TAIR:674_b
ASLALKSDGHDJGDGSDDFIEURALKSDHGLANVALKSDJGHKLJA
>TAIR:975_b
KLJALSKDHGLAKSDHJFIEUROWASDLKGNIEASDFJKWERLJKJ
>TAIR:444_b
QQALKSDJFWOIAOQIWUERTOIUQTOIUOQIWEURLASKDJFA
file_c
>TAIR:175_c
ALSKDJFLKAHGLKASJDFLAKJSDLKGHALKSDHGALKALKSJDF
>TAIR:812_c
ASLALKSDGHLA;KSJDFIEURALKSDHGLANVALKSDJGHKLJA
>TAIR:444_c
QQALKSDJFWOIAOQIWUERTOIUQTOIUOQIWEURLASKDJFA
Any suggestions on how to solve this by opening the files in the list "ids" and compiling them?
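You get that output because you asked Python to print an object, so by default it prints the object's representation (its type and memory address) rather than its contents.
You are better off just using the standard Python open method (looping over the list of files you want to check). You can then iterate over each line in a file and add it to a list or whatever you like. Let me know if an example would help.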
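Here is a minimal sketch of that plain-open approach. It assumes the files are FASTA text in which header lines start with ">" and the ID is the first whitespace-delimited word of the header. The function name read_fasta_ids and the demo file contents are made up for illustration.

```python
import os
import tempfile


def read_fasta_ids(filenames):
    """Collect the ID from every '>' header line in the given files."""
    ids = []
    for filename in filenames:
        with open(filename) as handle:
            for line in handle:
                if line.startswith(">"):
                    # Drop the '>' and keep the first word of the header.
                    fields = line[1:].split()
                    if fields:
                        ids.append(fields[0])
    return ids


if __name__ == "__main__":
    # Hypothetical demo data written to a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "file2")
        with open(path, "w") as out:
            out.write(">TAIR:674_a\nASLALKSDGHLA\n>TAIR:674_b\nASLALKSDGHDJ\n")
        print(read_fasta_ids([path]))
```

Because the function returns a plain list of strings, printing it shows the IDs themselves rather than an object address.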
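(Ignoring the print parentheses issue,) your code breaks on my system (Python 3.6.0; Biopython 1.69) with:
AttributeError: 'generator' object has no attribute 'id'
because SeqIO.parse() returns a generator, not a single record. Also, your "output that I expect" is completely wrong for this code. Given this code, what you should expect is: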
['TAIR:175_a', 'TAIR:674_a', 'TAIR:812_a', 'TAIR:975_b', 'TAIR:175_b', 'TAIR:444_b', 'TAIR:175_c', 'TAIR:444_c']
In my environment, the following code gets that for you:
from Bio import SeqIO
filenames = ["file1.fasta", "file2.fasta", "file3.fasta"]
ids = []
for filename in filenames:
    records = SeqIO.parse(filename, 'fasta')
    for record in records:
        ids.append(record.id)
print(ids)
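To get closer to the output described in the question (one file per suffix _a, _b, _c), the records could be grouped by the last part of their ID. The sketch below is one possible way to do that, not a definitive solution. It uses only the standard library rather than SeqIO, assumes the suffix is whatever follows the last underscore in the ID, and the names parse_fasta, group_by_suffix, write_groups and the "file_" output prefix are all illustrative.

```python
from collections import defaultdict


def parse_fasta(path):
    """Yield (header, sequence) pairs from a FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A new header closes the previous record, if any.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif line and header is not None:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def group_by_suffix(filenames):
    """Map each ID suffix (text after the last '_') to its records."""
    groups = defaultdict(list)
    for filename in filenames:
        for header, sequence in parse_fasta(filename):
            fields = header.split()
            if not fields:
                continue  # skip records with an empty header
            # Assumption: IDs look like 'TAIR:175_a'; the suffix is 'a'.
            suffix = fields[0].rsplit("_", 1)[-1]
            groups[suffix].append((header, sequence))
    return groups


def write_groups(groups, prefix="file_"):
    """Write one FASTA file per suffix, e.g. file_a, file_b, file_c."""
    for suffix, records in groups.items():
        with open(prefix + suffix, "w") as out:
            for header, sequence in records:
                out.write(">" + header + "\n" + sequence + "\n")
```

Calling write_groups(group_by_suffix(["file1", "file2", "file3", "file4", "file5"])) would write file_a, file_b and file_c to the current directory, with the records in each file appearing in the order the input files are listed.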