Entrez and SeqIO "No records found in handle"
My code looks like this:
import re
from Bio import SeqIO
from Bio import Entrez
Entrez.email = "...@..." # My e-mail address
handle1 = Entrez.efetch(db="pubmed", id=pmid_list_2010, rettype="gb", retmode="text")
data1 = handle1.read()
handle1.close()
handle2 = Entrez.efetch(db="pubmed", id=pmid_list_2011, rettype="gb", retmode="text")
data2 = handle2.read()
handle2.close()
handle3 = Entrez.efetch(db="pubmed", id=pmid_list_2012, rettype="gb", retmode="text")
data3 = handle3.read()
handle3.close()
handle4 = Entrez.efetch(db="pubmed", id=pmid_list_2013, rettype="gb", retmode="text")
data4 = handle4.read()
handle4.close()
handle5 = Entrez.efetch(db="pubmed", id=pmid_list_2014, rettype="gb", retmode="text")
data5 = handle5.read()
handle5.close()
handle6 = Entrez.efetch(db="pubmed", id=pmid_list_2015, rettype="gb", retmode="text")
data6 = handle6.read()
handle6.close()
out_handle = open("test2.gb", "w")
out_handle.write(data1)
out_handle.write(data2)
out_handle.write(data3)
out_handle.write(data4)
out_handle.write(data5)
out_handle.write(data6)
out_handle.close()
in_handle = open("test2.gb", "r")
record = SeqIO.read(in_handle,"genbank")
in_handle.close()
The second-to-last line gives me this error:
ValueError: No records found in handle
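My file looks fine: it isn't empty or anything. It contains many records and, as far as I can tell, it is formatted correctly. What exactly am I doing wrong?
I noticed that this works for other databases, for example "nucleotide". Is this a problem with PubMed? Does it need a different format? Thanks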
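You are trying to parse the wrong format. When you query the "pubmed" database, the only rettypes you can get back are medline, uilist, or abstract. You are asking for the GenBank rettype ("gb"), which makes no sense here: the file you wrote contains no GenBank records, so SeqIO finds nothing to parse.
Instead, you can use the Medline parser: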
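from Bio import Entrez, Medline

Entrez.email = "...@..."  # your e-mail address

# PubMed returns MEDLINE text, not GenBank, so request rettype="medline"
h1 = Entrez.efetch(db="pubmed",
                   id=["26837606"],
                   rettype="medline",
                   retmode="text")
for record in Medline.parse(h1):
    print(record["TI"])  # "TI" is the title field
h1.close()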
Output:
Exploiting the CRISPR/Cas9 System for Targeted Genome Mutagenesis in Petunia.
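If you want to check what you actually downloaded before choosing a parser, a quick look at the first non-blank line is usually enough. The sketch below assumes that GenBank flat files start each record with a LOCUS line and that MEDLINE text starts each record with a PMID- line. The helper name guess_record_format is made up for illustration and is not part of Biopython.

```python
def guess_record_format(text):
    """Guess the flat-file format from the first non-blank line.

    Hypothetical helper for illustration only; not a Biopython function.
    """
    for line in text.splitlines():
        if not line.strip():
            continue  # skip leading blank lines
        if line.startswith("LOCUS"):
            return "genbank"  # GenBank records open with a LOCUS line
        if line.startswith("PMID-"):
            return "medline"  # MEDLINE records open with a PMID- field
        return "unknown"      # first real line matches neither format
    return "empty"            # no non-blank lines at all


# Sample text shaped like a MEDLINE record (only two fields shown).
sample = (
    "\n"
    "PMID- 26837606\n"
    "TI  - Exploiting the CRISPR/Cas9 System for Targeted Genome "
    "Mutagenesis in Petunia.\n"
)
print(guess_record_format(sample))  # prints "medline"
```

Running something like this on the contents of test2.gb should report something other than "genbank", which would be consistent with SeqIO finding no records in a file that is not empty.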