Is there a way to access the coding sequence of a specific gene of interest from Ensembl in R?
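I am trying to find a way to retrieve the coding sequence (CDS) of a specific gene of interest and load it into R. I tried my luck with the biomaRt package, but I could not see how to specify which gene I am interested in.
Any help is appreciated!
Best,
黑子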
This should work:
library(biomaRt)
library(Biostrings)
mart <- useMart("ensembl", dataset="hsapiens_gene_ensembl")
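# seqType = "coding" returns the CDS in a column named `coding`;
# "cdna" would return the full transcript, UTRs included, in a `cdna` column
cds_seq <- getSequence(id = "APOE",
                       type = "hgnc_symbol",
                       seqType = "coding",
                       mart = mart)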
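We can translate the CDS:
AAs <- lapply(cds_seq$coding, function(s) {
  # biomaRt returns the literal string "Sequence unavailable" for transcripts without a CDS
  if (s == "Sequence unavailable") NA else translate(DNAString(s))
})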
Get the peptide sequences:
pep_seq <- getSequence(id = "APOE",
                       type = "hgnc_symbol",
                       seqType = "peptide",
                       mart = mart)
And check that they match the translated CDS:
lapply(which(pep_seq$peptide != "Sequence unavailable"), function(i) {
  pep_seq$peptide[i] == as.character(AAs[[i]])
})
[[1]]
[1] TRUE
[[2]]
[1] TRUE
[[3]]
[1] TRUE
[[4]]
[1] TRUE
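One caveat: the comparison above assumes that both getSequence() calls return the transcripts in the same row order, which is not guaranteed when querying by gene symbol. A more defensive sketch, assuming you rerun both queries with type = "ensembl_transcript_id" so that every row carries a transcript ID, is to merge on that ID before comparing. The helper name compare_by_transcript and the toy data frames are hypothetical and only there so the example runs without a network connection; the `translated` column is assumed to hold the translated CDS as plain character strings (NA where no CDS was available).

```r
# Hypothetical helper: align translated CDS and peptide rows by transcript ID,
# then compare them pairwise.
compare_by_transcript <- function(translated, peptides,
                                  id_col = "ensembl_transcript_id") {
  # Drop rows that have no usable sequence
  translated <- translated[!is.na(translated$translated), ]
  peptides   <- peptides[peptides$peptide != "Sequence unavailable", ]
  # Inner join on the transcript ID so row order no longer matters
  both <- merge(translated, peptides, by = id_col)
  both$identical <- both$translated == both$peptide
  both[, c(id_col, "identical")]
}

# Toy stand-ins for getSequence() output (made-up IDs and sequences)
translated <- data.frame(
  ensembl_transcript_id = c("T2", "T1", "T3"),
  translated = c("MKV*", "MA*", NA),
  stringsAsFactors = FALSE
)
peptides <- data.frame(
  ensembl_transcript_id = c("T1", "T2", "T3"),
  peptide = c("MA*", "MKV*", "Sequence unavailable"),
  stringsAsFactors = FALSE
)

result <- compare_by_transcript(translated, peptides)
print(result)
```

With real data you would build `translated` from the `coding` column (after translate() and as.character()) and pass the peptide query result as `peptides`.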
If you want to query by RefSeq ID instead, use:
cds_seq <- getSequence(id = "NM_000041",
                       type = "refseq_mrna",
                       seqType = "coding",
                       mart = mart)