Make Biopython Entrez.esearch loop through parameters
I'm trying to adapt a script (found here: https://gist.github.com/bonzanini/5a4c39e4c02502a8451d) to search for and retrieve data from PubMed.
This is what I have so far:
```python
#!/usr/bin/env python
from Bio import Entrez
import datetime
import json

# Create dictionary of journals (the official abbreviations are not used here...)
GroupA = ["Nature", "Science", "PNAS", "JACS"]
GroupB = ["E-life", "Mol Cell", "Plos Computational", "Nature communication", "Cell"]
GroupC = ["Nature Biotech", "Nature Chem Bio", "Nature Str Bio", "Nature Methods"]
Journals = {}
for item in GroupA:
    Journals.setdefault("A", []).append(item)
for item in GroupB:
    Journals.setdefault("B", []).append(item)
for item in GroupC:
    Journals.setdefault("C", []).append(item)

# Set dates for search
today = datetime.datetime.today()
numdays = 15
dateList = []
for x in range(0, numdays):
    dateList.append(today - datetime.timedelta(days=x))
# Keep only the first (today) and last (numdays - 1 days ago) dates
dateList[1:numdays-1] = []
today = dateList[0].strftime("%Y/%m/%d")
lastdate = dateList[1].strftime("%Y/%m/%d")
print('Retrieving data from %s to %s' % (lastdate, today))

Entrez.email = "email"  # NCBI asks for a real contact address here
for value in Journals['A']:
    handle = Entrez.esearch(db="pubmed", term="gpcr[TI] AND value[TA]",
                            sort="pubdate", retmax="10", retmode="xml",
                            datetype="pdat", mindate=lastdate, maxdate=today)
    record = Entrez.read(handle)
    handle.close()
    print(record["IdList"])
```
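The `dateList` slicing above keeps only the first entry (today) and the last one (`numdays - 1` days earlier). A minimal sketch that computes the same window directly, assuming the question's `numdays = 15`; the function name `search_window` is hypothetical:

```python
import datetime


def search_window(numdays=15, today=None):
    """Return (mindate, maxdate) as YYYY/MM/DD strings for Entrez.esearch.

    The window starts numdays - 1 days before `today`, which is what the
    dateList slicing in the question leaves behind for numdays = 15.
    """
    if today is None:
        today = datetime.datetime.today()
    start = today - datetime.timedelta(days=numdays - 1)
    return start.strftime("%Y/%m/%d"), today.strftime("%Y/%m/%d")


# Same variable names as the question's script
lastdate, today = search_window()
print("Retrieving data from %s to %s" % (lastdate, today))
```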
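I want to use each `value` from the for loop (in this case, a journal title) as a parameter of the `Entrez.esearch` function. There is no dedicated parameter for the journal, so it has to go inside the `term` argument, but as written it doesn't work.
Once I have the list of IDs, I will use `Entrez.efetch` to retrieve and print the data I want, but that's a separate question...
I hope this is clear enough; it's my first question! Thanks!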
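For the `Entrez.efetch` step mentioned above, one way to structure the loop is to merge the `IdList` from every journal into a single list first. This is a sketch, not the gist's method: `collect_ids` and `entrez_search` are hypothetical names, and the search function is passed in so the merging logic can be tried without network access or Biopython installed.

```python
def collect_ids(journals, search, topic="gpcr"):
    """Run one search per journal and merge the resulting PubMed ID lists.

    `search` is any callable that takes a query string and returns a list
    of PMIDs. Duplicate PMIDs are dropped while keeping first-seen order.
    """
    seen = set()
    ids = []
    for journal in journals:
        # Interpolate the loop variable, not the literal word "value"
        term = "{}[TI] AND {}[TA]".format(topic, journal)
        for pmid in search(term):
            if pmid not in seen:
                seen.add(pmid)
                ids.append(pmid)
    return ids


def entrez_search(term, mindate, maxdate):
    """Hypothetical wrapper around Entrez.esearch with the question's parameters."""
    from Bio import Entrez  # imported here so collect_ids works without Biopython
    Entrez.email = "email"  # placeholder from the question; use a real address
    handle = Entrez.esearch(db="pubmed", term=term, sort="pubdate", retmax="10",
                            retmode="xml", datetype="pdat",
                            mindate=mindate, maxdate=maxdate)
    record = Entrez.read(handle)
    handle.close()
    return record["IdList"]
```

With Biopython available, usage would look like `ids = collect_ids(Journals['A'], lambda t: entrez_search(t, lastdate, today))`, after which `Entrez.efetch(db="pubmed", id=",".join(ids), rettype="medline", retmode="text")` fetches all records in one request.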
If I understand you correctly, I think this is what you're looking for:

```python
term="gpcr[TI] AND {}[TA]".format(value)
```
With this, each `term` will be:

```
"gpcr[TI] AND Nature[TA]"
"gpcr[TI] AND Science[TA]"
"gpcr[TI] AND PNAS[TA]"
"gpcr[TI] AND JACS[TA]"
```
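Two small variations on the same idea, as a sketch (the helper names `journal_term` and `combined_term` are hypothetical): quoting the journal name keeps multi-word titles such as "Mol Cell" together as one phrase, and OR-ing every journal of a group into one query needs only one `esearch` call per group. Whether a name matches still depends on how PubMed indexes journals in the `[TA]` field; as the question's own comment notes, these are not the official abbreviations.

```python
def journal_term(journal, topic="gpcr"):
    """One query per journal; quotes keep multi-word titles like "Mol Cell" together."""
    return '{}[TI] AND "{}"[TA]'.format(topic, journal)


def combined_term(journals, topic="gpcr"):
    """One query for a whole group: the journal clauses are OR-ed inside parentheses."""
    clause = " OR ".join('"{}"[TA]'.format(j) for j in journals)
    return "{}[TI] AND ({})".format(topic, clause)


print(journal_term("Mol Cell"))
# gpcr[TI] AND "Mol Cell"[TA]
print(combined_term(["Nature", "Science", "PNAS", "JACS"]))
# gpcr[TI] AND ("Nature"[TA] OR "Science"[TA] OR "PNAS"[TA] OR "JACS"[TA])
```

The trade-off with the combined query is that `retmax` then caps the total across the whole group rather than per journal, so with `retmax="10"` a prolific journal could fill all ten slots.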