rentrez error HTTP failure: 400 when downloading >1/3 of records
I have a strange situation. I am using rentrez to mine PubMed data. When I run entrez_search(), then entrez_summary(), then entrez_fetch(), I get this error message (full code at the bottom of the post):
Error: HTTP failure: 400
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE eEfetchResult PUBLIC "-//NLM//DTD efetch 20131226//EN" "https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20131226/efetch.dtd">
<eFetchResult>
<ERROR>Cannot retrieve history data. query_key: 1, WebEnv: NCID_1_51629226_130.14.18.34_9001_1531773486_1795859931_0MetA0_S_MegaStore, retstart: 0, retmax: 552</ERROR>
<ERROR>Can't fetch uids from history because of: NCBI C++ Exception:
Error: UNK_MODULE(CException::eInvalid) "UNK_FILE", line 18446744073709551615: UNK_FUNC ---
</ERROR>
</eFetchResult>
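After searching around, I thought I had found the solution in query size, based on this discussion. When I reduced retmax_set from 500 to 10, the code worked. I then iteratively determined the largest retmax_set value that does not raise the error, and found behavior that seems very strange to me.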
The search term_set = "transcription AND enhancer AND promoter AND 2017:2018[PDAT]" yields 552 records. When running my code with different retmax_set values:
- setting retmax_set <= 183 works
- setting retmax_set >= 184 gives the error above
The modified search term_set = "transcription AND enhancer AND promoter AND 2018[PDAT]" yields 186 records. When running my code on this search with different retmax_set values:
- setting retmax_set <= 61 works
- setting retmax_set >= 62 gives the error above
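The search term_set = "transcription AND enhancer AND promoter AND 2017[PDAT]" yields 395 records (for some reason, PubMed flags 29 records as published in both 2017 and 2018). When running my code on this search term with different retmax_set values:
- setting retmax_set <= 131 works
- setting retmax_set >= 132 gives the error above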
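Interestingly, the error appears once retmax_set reaches about one third of the total number of records (552 / 3 = 184, 186 / 3 = 62, 395 / 3 = 131.67). I will modify my code to compute retmax_set from the number of results returned by entrez_search, but I don't know why rentrez or NCBI behaves this way. Any ideas?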
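The one-third pattern can be checked with a little arithmetic. The sketch below uses only the record counts and thresholds reported above; `n_batches` is a hypothetical helper written for this illustration, not part of rentrez. It shows that each failing threshold is where the loop's `seq(0, count - 1, retmax_set)` drops from four offsets to three. It does not by itself explain why the request fails.

```r
# Record counts reported for the three searches in the question.
counts <- c(`2017:2018` = 552, `2018` = 186, `2017` = 395)

# Hypothetical helper: number of retstart offsets produced by the
# question's loop, seq(0, count - 1, retmax).
n_batches <- function(count, retmax) length(seq(0, count - 1, by = retmax))

# The smallest failing retmax_set reported for each search equals
# ceiling(count / 3).
threshold <- ceiling(counts / 3)

tbl <- data.frame(
  count                = counts,
  first_failing_retmax = threshold,
  batches_at_threshold = mapply(n_batches, counts, threshold),
  batches_just_below   = mapply(n_batches, counts, threshold - 1)
)
print(tbl)
```

For all three searches the loop issues three requests at the first failing value and four at the last working value.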
> ## set search term
> term_set = "transcription AND enhancer AND promoter AND 2017:2018[PDAT]"
> ## load package
> library(rentrez)
> ## set maximum records batch
> retmax_set = 182
> ## search pubmed using web history
> search <- entrez_search(
+ db = "pubmed",
+ term = term_set,
+ use_history = T
+ )
> ## get summaries of search hits
> summary <- list(); for (seq_start in seq(0, search$count - 1, retmax_set)) {
+ summary1 <- entrez_summary(
+ db = "pubmed",
+ web_history = search$web_history,
+ retmax = retmax_set,
+ retstart = seq_start
+ )
+ summary <- c(summary, summary1)
+ }
> ## download full XML refs for hits
> XML_refs <- entrez_fetch(
+ db = "pubmed",
+ web_history = search$web_history,
+ rettype = "xml",
+ parsed = TRUE
+ )
>
>
> ## set search term
> term_set = "transcription AND enhancer AND promoter AND 2017:2018[PDAT]"
> ## load package
> library(rentrez)
> ## set maximum records batch
> retmax_set = 183
> ## search pubmed using web history
> search <- entrez_search(
+ db = "pubmed",
+ term = term_set,
+ use_history = T
+ )
> ## get summaries of search hits
> summary <- list(); for (seq_start in seq(0, search$count - 1, retmax_set)) {
+ summary1 <- entrez_summary(
+ db = "pubmed",
+ web_history = search$web_history,
+ retmax = retmax_set,
+ retstart = seq_start
+ )
+ summary <- c(summary, summary1)
+ }
> ## download full XML refs for hits
> XML_refs <- entrez_fetch(
+ db = "pubmed",
+ web_history = search$web_history,
+ rettype = "xml",
+ parsed = TRUE
+ )
>
>
> ## set search term
> term_set = "transcription AND enhancer AND promoter AND 2017:2018[PDAT]"
> ## load package
> library(rentrez)
> ## set maximum records batch
> retmax_set = 184
> ## search pubmed using web history
> search <- entrez_search(
+ db = "pubmed",
+ term = term_set,
+ use_history = T
+ )
> ## get summaries of search hits
> summary <- list(); for (seq_start in seq(0, search$count - 1, retmax_set)) {
+ summary1 <- entrez_summary(
+ db = "pubmed",
+ web_history = search$web_history,
+ retmax = retmax_set,
+ retstart = seq_start
+ )
+ summary <- c(summary, summary1)
+ }
> ## download full XML refs for hits
> XML_refs <- entrez_fetch(
+ db = "pubmed",
+ web_history = search$web_history,
+ rettype = "xml",
+ parsed = TRUE
+ )
Error: HTTP failure: 400
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE eEfetchResult PUBLIC "-//NLM//DTD efetch 20131226//EN" "https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20131226/efetch.dtd">
<eFetchResult>
<ERROR>Cannot retrieve history data. query_key: 1, WebEnv: NCID_1_51629226_130.14.18.34_9001_1531773486_1795859931_0MetA0_S_MegaStore, retstart: 0, retmax: 552</ERROR>
<ERROR>Can't fetch uids from history because of: NCBI C++ Exception:
Error: UNK_MODULE(CException::eInvalid) "UNK_FILE", line 18446744073709551615: UNK_FUNC ---
</ERROR>
</eFetchResult>
>
>
> ## set search term
> term_set = "transcription AND enhancer AND promoter AND 2017:2018[PDAT]"
> ## load package
> library(rentrez)
> ## set maximum records batch
> retmax_set = 185
> ## search pubmed using web history
> search <- entrez_search(
+ db = "pubmed",
+ term = term_set,
+ use_history = T
+ )
> ## get summaries of search hits
> summary <- list(); for (seq_start in seq(0, search$count - 1, retmax_set)) {
+ summary1 <- entrez_summary(
+ db = "pubmed",
+ web_history = search$web_history,
+ retmax = retmax_set,
+ retstart = seq_start
+ )
+ summary <- c(summary, summary1)
+ }
> ## download full XML refs for hits
> XML_refs <- entrez_fetch(
+ db = "pubmed",
+ web_history = search$web_history,
+ rettype = "xml",
+ parsed = TRUE
+ )
Error: HTTP failure: 400
<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE eEfetchResult PUBLIC "-//NLM//DTD efetch 20131226//EN" "https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20131226/efetch.dtd">
<eFetchResult>
<ERROR>Cannot retrieve history data. query_key: 1, WebEnv: NCID_1_52654089_130.14.22.215_9001_1531773493_484860305_0MetA0_S_MegaStore, retstart: 0, retmax: 552</ERROR>
<ERROR>Can't fetch uids from history because of: NCBI C++ Exception:
Error: UNK_MODULE(CException::eInvalid) "UNK_FILE", line 18446744073709551615: UNK_FUNC ---
</ERROR>
</eFetchResult>
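It turns out that rentrez uses 0-based counting. So 552 records correspond to retstart values 0 through 551. Because my code was looking for values 1 through 552, it missed the first record (#0) and then threw an error when looking for the non-existent record #552.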
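The off-by-one described here can be illustrated without contacting NCBI. This is a minimal sketch that assumes only what the answer states (record indices run from 0 to count - 1); `batch_starts` is a hypothetical helper, not a rentrez function.

```r
count  <- 552   # records returned by the 2017:2018 search
retmax <- 183   # a batch size reported to work in the question

# Indices a 0-based service exposes versus indices requested 1-based.
zero_based <- 0:(count - 1)
one_based  <- 1:count

missed      <- setdiff(zero_based, one_based)  # record never requested: 0
nonexistent <- setdiff(one_based, zero_based)  # index past the end: 552

# Hypothetical helper: 0-based retstart offsets that cover every record.
batch_starts <- function(count, retmax) seq(0, count - 1, by = retmax)

starts <- batch_starts(count, retmax)
# Last 0-based index each batch can reach, capped at the final record.
ends <- pmin(starts + retmax, count) - 1

print(data.frame(retstart = starts, last_index = ends))
```

Offsets built this way start at 0 and the last batch stops at index 551, so they can be passed as retstart together with retmax, as the entrez_summary loop in the question does.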