Is there a way to access a specific coding sequence of a gene of interest on ENSEMBL in R?

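I'm trying to find a way to retrieve the coding sequence (CDS) of a specific gene of interest and load it into R. I tried my luck with the biomaRt package, but I couldn't find a way to specify which gene I'm interested in.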

Any help is appreciated!

Best, 黑子

This should work:

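library(biomaRt)
library(Biostrings)
# Connect to the Ensembl human gene dataset
mart <- useMart("ensembl", dataset="hsapiens_gene_ensembl")
# seqType = "coding" returns the CDS; "cdna" would also include the UTRs
cds_seq = getSequence(id = "APOE",
                   type = "hgnc_symbol",
                   seqType = "coding",
                   mart = mart)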
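
A gene usually has several transcripts, and some rows can come back as "Sequence unavailable" rather than an actual sequence. As a minimal sketch of how those rows could be dropped before further work, the snippet below uses a toy data frame (`cds_seq_demo`, a hypothetical stand-in with made-up sequences, not real APOE data) shaped like the `cds_seq` result above:

```r
# Toy stand-in for the data frame returned by getSequence().
# The sequences are invented for illustration only.
cds_seq_demo <- data.frame(
  coding = c("ATGAAGTAA", "Sequence unavailable", "ATGGCCTGA"),
  hgnc_symbol = c("APOE", "APOE", "APOE"),
  stringsAsFactors = FALSE
)

# Keep only the rows that carry an actual sequence
has_seq <- cds_seq_demo$coding != "Sequence unavailable"
cds_only <- cds_seq_demo[has_seq, , drop = FALSE]

# Sanity check: a complete CDS is made of whole codons,
# so its length should be divisible by three
whole_codons <- nchar(cds_only$coding) %% 3 == 0

print(cds_only)
print(whole_codons)
```

The same filter should apply to the real `cds_seq` object by swapping in its name. Note that a transcript annotated as incomplete may legitimately fail the divisible-by-three check.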

We can translate the CDS:

# Translate each available CDS; rows without a sequence become NA
AAs = sapply(cds_seq$coding, function(i) {
  if (i == "Sequence unavailable") NA else translate(DNAString(i))
})

Get the peptide sequences:

pep_seq = getSequence(id = "APOE", 
                   type = "hgnc_symbol", 
                   seqType = "peptide", 
                   mart = mart)

And check that they are identical:

lapply(which(pep_seq$peptide != "Sequence unavailable"), function(i) {
  pep_seq$peptide[i] == as.character(AAs[[i]])
})

[[1]]
[1] TRUE

[[2]]
[1] TRUE

[[3]]
[1] TRUE

[[4]]
[1] TRUE
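
The index-based check above assumes that both queries return their rows in the same order, which I am not certain biomaRt guarantees. A more defensive variant is to ask whether each translated sequence appears anywhere among the peptides. The sketch below shows the idea in base R on toy strings; `matches_any_peptide` is a hypothetical helper name, not a biomaRt or Biostrings function, and the sequences are made up:

```r
# Hypothetical helper: for each translated sequence, report whether it
# occurs among the retrieved peptides, regardless of row order.
matches_any_peptide <- function(translated, peptides,
                                missing = "Sequence unavailable") {
  peptides <- peptides[peptides != missing]   # drop placeholder rows
  translated <- translated[!is.na(translated)] # drop untranslated rows
  translated %in% peptides
}

# Toy inputs in deliberately different orders
translated <- c("MK*", NA, "MA*")
peptides <- c("MA*", "Sequence unavailable", "MK*")

found <- matches_any_peptide(translated, peptides)
print(found)
```

With the real objects, the translated sequences would first need converting to plain character strings with `as.character()` before being passed in alongside `pep_seq$peptide`.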

If you want to fetch the sequence by RefSeq ID instead, use:

cds_seq = getSequence(id = "NM_000041", 
                      type = "refseq_mrna", 
                      seqType = "coding", 
                      mart = mart)