How to track which protein ID is linked to which gene ID with rentrez
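I have a bunch of protein IDs, and I want to get the corresponding coding sequences (CDS) without losing track of which protein ID each one belongs to. I have managed to download the corresponding CDS, but unfortunately the CDS IDs in NCBI are quite different from the protein IDs.
I have the following R code: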
library(rentrez)
Prot_ids <- c("XP_012370245.1","XP_004866438.1","XP_013359583.1")
links <- entrez_link(dbfrom="protein", db="nuccore", id=Prot_ids, by_id = TRUE)
Then I used this command to "match" the protein IDs to their CDS IDs:
lapply(links, function(x) x$links$protein_nuccore_mrna)
[[1]]
[1] "820968283"
[[2]]
[1] "861491027"
[[3]]
[1] "918634580"
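However, as you can see, with by_id = TRUE I just get a list of three elink objects, and the protein IDs are lost.
What I would like is something like this:
Protein ID  XP_012370245.1  XP_004866438.1  XP_013359583.1
CDS ID      XM_004866381.2  XM_012514791.1  XM_013504129.1
Any suggestions are very welcome, thanks!!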
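A minimal way to keep the protein IDs attached is to name the list before extracting the links. This assumes that by_id = TRUE returns one elink object per input ID, in the same order as Prot_ids (the answer below relies on the same ordering). So that the sketch runs offline, the elink list is a hand-built stand-in with the same x$links$protein_nuccore_mrna structure, filled with the UIDs shown in the output above; with real data you would use the entrez_link() result instead.

```r
# Stand-in for entrez_link(..., by_id = TRUE): one element per protein, each
# holding $links$protein_nuccore_mrna. Built by hand from the UIDs shown above,
# NOT fetched from NCBI.
Prot_ids <- c("XP_012370245.1", "XP_004866438.1", "XP_013359583.1")
links <- list(
  list(links = list(protein_nuccore_mrna = "820968283")),
  list(links = list(protein_nuccore_mrna = "861491027")),
  list(links = list(protein_nuccore_mrna = "918634580"))
)

# Assumes the elements are in input order, so the protein IDs can be used as names
names(links) <- Prot_ids

# lapply() keeps the list names, so each mRNA UID stays labelled with its protein ID
mrna_uids <- lapply(links, function(x) x$links$protein_nuccore_mrna)
print(mrna_uids)
```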
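Since by_id = TRUE returns the elink objects in the same order as Prot_ids, you can repeat each protein ID once per linked nuccore record, then look up the accession of each linked record with entrez_summary():
library(rentrez)
Prot_ids <- c("XP_012370245.1","XP_004866438.1","XP_013359583.1")
# by_id = TRUE gives one elink object per protein ID, in input order
links <- entrez_link(dbfrom="protein", db="nuccore", id=Prot_ids, by_id = TRUE)
# Linked nuccore (mRNA) UIDs, flattened into one character vector
linkids <- unlist(lapply(links, function(x) x$links$protein_nuccore_mrna))
# Number of linked records per protein, used to repeat each protein ID
n_links <- sapply(links, function(x) length(x$links$protein_nuccore_mrna))
##Get the summary for the gi records
linkNuc <- entrez_summary(id = linkids, db = "nuccore")
df <- data.frame(ProtIDs = rep(Prot_ids, times = n_links),
                 linkids,
                 # "extra" typically looks like "gi|<gi>|ref|<accession>|", so field 4 is the accession
                 NucID = sapply(strsplit(sapply(linkNuc, "[[", "extra"), split = "\\|"), "[", 4))
df
#                  ProtIDs   linkids          NucID
# 820968283 XP_012370245.1 820968283 XM_012514791.1
# 861491027 XP_004866438.1 861491027 XM_004866381.2
# 918634580 XP_013359583.1 918634580 XM_013504129.1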
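The approach above assumes every protein has exactly one linked mRNA record; if some protein had none or several, sapply() over the links would return a list rather than a flat vector. The slightly more defensive variant sketched below builds the protein-to-UID table from the per-protein link vectors, so proteins with zero links drop out and proteins with several links get one row each. It then parses the accession out of the extra summary field and reshapes the result into the two-row layout asked for in the question. The helper names link_table and accession_from_extra are hypothetical, and both the link list and the extra strings are hand-built stand-ins (the "gi|<gi>|ref|<accession>|" format is an assumption matching the field-4 parsing above), so the sketch runs without contacting NCBI. With real data you would pass the entrez_link() result and sapply(linkNuc, "[[", "extra") instead.

```r
# Hypothetical helper: one row per (protein, linked nuccore UID) pair.
# Assumes `links` is in the same order as `prot_ids` and that each element holds
# $links$protein_nuccore_mrna, as with entrez_link(by_id = TRUE).
link_table <- function(links, prot_ids) {
  uids <- lapply(links, function(x) as.character(x$links$protein_nuccore_mrna))
  n <- lengths(uids)  # 0 for proteins without a linked mRNA record
  data.frame(ProtIDs = rep(prot_ids, times = n),
             linkids = as.character(unlist(uids, use.names = FALSE)),
             stringsAsFactors = FALSE)
}

# Hypothetical helper: assumes "extra" strings of the form "gi|<gi>|ref|<accession>|",
# so the accession is the 4th "|"-separated field.
accession_from_extra <- function(extra) {
  vapply(strsplit(extra, "|", fixed = TRUE), `[`, character(1), 4)
}

# Hand-built stand-ins, using the IDs from the output above
prot_ids <- c("XP_012370245.1", "XP_004866438.1", "XP_013359583.1")
links <- list(
  list(links = list(protein_nuccore_mrna = "820968283")),
  list(links = list(protein_nuccore_mrna = "861491027")),
  list(links = list(protein_nuccore_mrna = "918634580"))
)
df <- link_table(links, prot_ids)

# Stand-in for sapply(linkNuc, "[[", "extra"), in the same order as df$linkids
extra <- c("gi|820968283|ref|XM_012514791.1|",
           "gi|861491027|ref|XM_004866381.2|",
           "gi|918634580|ref|XM_013504129.1|")
df$NucID <- accession_from_extra(extra)

# Two-row layout from the question: protein IDs on top, mRNA (XM_) accessions below
wide <- rbind(`Protein ID` = df$ProtIDs, `CDS ID` = df$NucID)
print(wide)
```

Keeping the long table (one row per link) as the primary result and only reshaping at the end means a protein with several linked records does not silently overwrite or misalign its neighbours.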