Convert Ensembl IDs to gene names using biomaRt
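I have a dataset called kidney_ensembl and I need to convert the Ensembl IDs to gene names.
I am trying the code below, but it does not work. Can anyone help me?
I know there are similar questions, but they did not help me. Thank you very much!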
library(tidyverse)
kidney <- data.frame(gene_id = c("ENSG00000000003.10", "ENSG00000000005.5",
                                 "ENSG00000000419.8", "ENSG00000000457.9",
                                 "ENSG00000000460.12"))
#kidney <- read_delim("Desktop/kidney_ensembl.txt", delim = "\t")
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
library("biomaRt")
mart <- useDataset("hsapiens_gene_ensembl", useMart("ensembl"))
genes <- kidney$gene_id
gene_IDs <- getBM(filters = "ensembl_gene_id",
                  attributes = c("ensembl_gene_id", "hgnc_symbol"),
                  values = genes, mart = mart)
kidney_final <- left_join(kidney, gene_IDs, by = NULL)
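The biomaRt part works. Your left join fails because the two data frames have no column in common: the Ensembl IDs are under "ensembl_gene_id" in gene_IDs, but under "gene_id" in your kidney data frame.
You also need to check whether the IDs are GENCODE or Ensembl. GENCODE IDs usually carry a .[number] version suffix, for example ENSG00000000003.10, which is ENSG00000000003 in the Ensembl database.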
library("biomaRt")
library("dplyr")
kidney <- data.frame(gene_id = c("ENSG00000000003.10", "ENSG00000000005.5",
                                 "ENSG00000000419.8", "ENSG00000000457.9",
                                 "ENSG00000000460.12"),
                     vals = runif(5))
# Make sure the IDs are character, not factor (data.frame() created factors
# by default before R 4.0), so the join key types match in left_join()
kidney$gene_id <- as.character(kidney$gene_id)
# If the IDs are GENCODE-style, strip the trailing ".<version>" suffix;
# this mostly works, and plain Ensembl IDs are left unchanged
kidney$gene_id <- sub("[.][0-9]+$", "", kidney$gene_id)
mart <- useDataset("hsapiens_gene_ensembl", useMart("ensembl"))
genes <- kidney$gene_id
gene_IDs <- getBM(filters = "ensembl_gene_id",
                  attributes = c("ensembl_gene_id", "hgnc_symbol"),
                  values = genes, mart = mart)
# Name the key column on each side explicitly, since the names differ
left_join(kidney, gene_IDs, by = c("gene_id" = "ensembl_gene_id"))
          gene_id      vals hgnc_symbol
1 ENSG00000000003 0.2298255      TSPAN6
2 ENSG00000000005 0.4662570        TNMD
3 ENSG00000000419 0.7279107        DPM1
4 ENSG00000000457 0.3240166       SCYL3
5 ENSG00000000460 0.3038986    C1orf112
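If you want to check the ID handling without querying Ensembl, the following is a minimal offline sketch in base R. It assumes a small hand-built lookup table standing in for the getBM() result (only three of the symbols shown above are included, on purpose), and the column name ensembl_id and the variable unmatched are made up for illustration. It keeps the original versioned ID, joins on an unversioned copy, and lists the IDs that found no symbol.

```r
# Hypothetical stand-in for the data frame returned by getBM();
# one gene is deliberately left out to show what a miss looks like.
gene_IDs <- data.frame(
  ensembl_gene_id = c("ENSG00000000003", "ENSG00000000005", "ENSG00000000419"),
  hgnc_symbol     = c("TSPAN6", "TNMD", "DPM1"),
  stringsAsFactors = FALSE
)

kidney <- data.frame(
  gene_id = c("ENSG00000000003.10", "ENSG00000000005.5",
              "ENSG00000000419.8", "ENSG00000000457.9"),
  stringsAsFactors = FALSE
)

# Keep the versioned ID and add an unversioned key for the lookup
kidney$ensembl_id <- sub("[.][0-9]+$", "", kidney$gene_id)

# match() returns the first matching row per ID (NA if none), so the
# row order and row count of kidney are preserved
idx <- match(kidney$ensembl_id, gene_IDs$ensembl_gene_id)
kidney$hgnc_symbol <- gene_IDs$hgnc_symbol[idx]

# IDs that did not map to any symbol
unmatched <- kidney$gene_id[is.na(kidney$hgnc_symbol)]

print(kidney)
print(unmatched)
```

Note that match() takes only the first hit per ID, whereas left_join() would add a row for every hit, so the two differ if the lookup table contains the same Ensembl ID more than once.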