Does anyone know how to retrieve a list of cell cycle genes from KEGG in R?
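I know how to retrieve the gene list for a specific pathway from the KEGG website using the KEGG API, but I can't find any package that does the same thing in R.
The only annotation package I found is KEGG.db, and it only gives the list of pathways available in KEGG.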
http://www.kegg.jp/kegg/docs/keggapi.html
With the API, I look up the cell cycle genes on KEGG by entering the pathway ID, like this:
http://rest.kegg.jp/get/hsa04110
Does anyone know of an R package or solution that could help me with this?
Thanks in advance,
After re-reading your question, I believe this is the R package that can help you. It is on Bioconductor and lets you interact with KEGG from R via REST.
KEGGREST: Client-side REST access to KEGG
A package that provides a client interface to the KEGG REST server. Based on KEGGSOAP by J. Zhang, R. Gentleman, and Marc Carlson, and KEGG (python package) by Aurelien Mazurie.
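As a minimal sketch of how this could answer the original question: keggGet("hsa04110") returns a list whose first element has a GENE field, a character vector that alternates Entrez gene ids with "SYMBOL; description" strings. The helper name parse_kegg_genes and the two hand-typed demo entries below are illustrative assumptions, not part of KEGGREST; real entries also carry extra bracketed annotations after the description. The live query only runs in an interactive session with KEGGREST installed.

```r
# Turn the alternating id / "SYMBOL; description" layout of a KEGG GENE
# field into a data frame (hypothetical helper, not a KEGGREST function).
parse_kegg_genes <- function(gene_field) {
  stopifnot(length(gene_field) %% 2 == 0)
  ids  <- gene_field[c(TRUE, FALSE)]   # odd positions: Entrez gene ids
  desc <- gene_field[c(FALSE, TRUE)]   # even positions: "SYMBOL; description"
  data.frame(
    entrez = ids,
    # Entries without a ";" are returned unchanged in both columns.
    symbol = sub(";.*$", "", desc),
    description = trimws(sub("^[^;]*;", "", desc)),
    stringsAsFactors = FALSE
  )
}

# Offline demo with two hand-typed entries in the same layout.
example_field <- c(
  "595",  "CCND1; cyclin D1",
  "1019", "CDK4; cyclin dependent kinase 4"
)
cell_cycle_demo <- parse_kegg_genes(example_field)
print(cell_cycle_demo)

# Live query against KEGG (needs network access, so it is guarded).
if (interactive() && requireNamespace("KEGGREST", quietly = TRUE)) {
  entry <- KEGGREST::keggGet("hsa04110")[[1]]
  cell_cycle_genes <- parse_kegg_genes(entry$GENE)
  print(head(cell_cycle_genes))
}
```

The guarded block is the part that actually talks to KEGG; the demo vector only shows what the parser expects.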
I recently found two ways to get KEGG pathways and their genes (one of them uses the previously suggested package, KEGGREST).
First way
library(limma)
library(AnnotationDbi)
library(org.Hs.eg.db)
# Get Entrez gene ids and the KEGG pathways they belong to
gene_pathways <- getGeneKEGGLinks(species.KEGG = "hsa")
# Map the Entrez ids to gene symbols
gene_pathways$Symbol <- mapIds(org.Hs.eg.db, gene_pathways$GeneID,
                               column = "SYMBOL", keytype = "ENTREZID")
# Get the pathway names and attach them to the gene/pathway table
pathway_names <- getKEGGPathwayNames(species.KEGG = "hsa")
KEGG_pathways <- merge(gene_pathways, pathway_names, by = "PathwayID")
Output:
head(KEGG_pathways)
PathwayID GeneID Symbol Description
1 path:hsa00010 10327 AKR1A1 Glycolysis / Gluconeogenesis - Homo sapiens (human)
2 path:hsa00010 124 ADH1A Glycolysis / Gluconeogenesis - Homo sapiens (human)
3 path:hsa00010 125 ADH1B Glycolysis / Gluconeogenesis - Homo sapiens (human)
4 path:hsa00010 126 ADH1C Glycolysis / Gluconeogenesis - Homo sapiens (human)
5 path:hsa00010 127 ADH4 Glycolysis / Gluconeogenesis - Homo sapiens (human)
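To get from this full table to just the cell cycle genes asked about in the question, the rows can be filtered by pathway ID (hsa04110). Below is a rough sketch; the helper names strip_kegg_prefix and genes_in_pathway and the three-row toy table are assumptions made for illustration. The prefix handling is there because, depending on the KEGG release and package version, pathway IDs may or may not carry a "path:" prefix.

```r
# Drop an optional "path:" prefix so that ids compare equal either way.
strip_kegg_prefix <- function(x) sub("^path:", "", x)

# Keep only the rows of a pathway/gene table that belong to one pathway
# (hypothetical helper; id_col names the column holding the pathway ids).
genes_in_pathway <- function(tbl, pathway_id, id_col = "PathwayID") {
  keep <- strip_kegg_prefix(tbl[[id_col]]) == strip_kegg_prefix(pathway_id)
  tbl[keep, , drop = FALSE]
}

# Toy table with the same columns as KEGG_pathways above (hand-typed rows).
toy <- data.frame(
  PathwayID = c("path:hsa00010", "path:hsa04110", "path:hsa04110"),
  GeneID = c("124", "595", "1019"),
  Symbol = c("ADH1A", "CCND1", "CDK4"),
  stringsAsFactors = FALSE
)
cell_cycle <- genes_in_pathway(toy, "hsa04110")
print(cell_cycle$Symbol)
```

On the real table this would be genes_in_pathway(KEGG_pathways, "hsa04110"); for a table whose ID column is called pathway, pass id_col = "pathway".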
Second way
library(KEGGREST)
library(org.Hs.eg.db)
library(tidyverse)
# Get pathways and their Entrez gene ids
hsa_path_entrez <- keggLink("pathway", "hsa") %>%
  tibble(pathway = ., eg = sub("hsa:", "", names(.)))
# Get gene symbols and Ensembl ids using the Entrez gene ids
hsa_kegg_anno <- hsa_path_entrez %>%
  mutate(
    symbol = mapIds(org.Hs.eg.db, eg, "SYMBOL", "ENTREZID"),
    ensembl = mapIds(org.Hs.eg.db, eg, "ENSEMBL", "ENTREZID")
  )
# Get the pathway names
hsa_pathways <- keggList("pathway", "hsa") %>%
  tibble(pathway = names(.), description = .)
# Join on the shared pathway id column
KEGG_pathways <- left_join(hsa_kegg_anno, hsa_pathways, by = "pathway")
Output:
head(KEGG_pathways)
A tibble: 6 x 5
pathway eg symbol ensembl description
<chr> <chr> <chr> <chr> <chr>
1 path:hsa00010 10327 AKR1A1 ENSG00000117448 Glycolysis / Gluconeogenesis - Homo sapiens (human)
2 path:hsa00010 124 ADH1A ENSG00000187758 Glycolysis / Gluconeogenesis - Homo sapiens (human)
3 path:hsa00010 125 ADH1B ENSG00000196616 Glycolysis / Gluconeogenesis - Homo sapiens (human)
4 path:hsa00010 126 ADH1C ENSG00000248144 Glycolysis / Gluconeogenesis - Homo sapiens (human)
5 path:hsa00010 127 ADH4 ENSG00000198099 Glycolysis / Gluconeogenesis - Homo sapiens (human)
If for some reason you need to query another species, just replace "hsa" with its organism code. You can get the list of available organisms with keggList("organism").
org <- keggList("organism")
head(org)
T.number organism species phylogeny
[1,] "T01001" "hsa" "Homo sapiens (human)" "Eukaryotes;Animals;Vertebrates;Mammals"
[2,] "T01005" "ptr" "Pan troglodytes (chimpanzee)" "Eukaryotes;Animals;Vertebrates;Mammals"
[3,] "T02283" "pps" "Pan paniscus (bonobo)" "Eukaryotes;Animals;Vertebrates;Mammals"
[4,] "T02442" "ggo" "Gorilla gorilla gorilla (western lowland gorilla)" "Eukaryotes;Animals;Vertebrates;Mammals"
[5,] "T01416" "pon" "Pongo abelii (Sumatran orangutan)" "Eukaryotes;Animals;Vertebrates;Mammals"
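Since the organism list is long, it can help to search it by name rather than scroll through it. The sketch below assumes the matrix layout shown above (columns T.number, organism, species, phylogeny); the helper find_kegg_organism is a made-up name, and org_demo simply retypes the first three rows of the output so the example runs without a network connection.

```r
# Look up KEGG organism codes by matching a pattern against the species
# column (hypothetical helper, not part of KEGGREST).
find_kegg_organism <- function(org, pattern) {
  hits <- grepl(pattern, org[, "species"], ignore.case = TRUE)
  org[hits, c("organism", "species"), drop = FALSE]
}

# First three rows of the keggList("organism") output shown above.
org_demo <- matrix(
  c("T01001", "hsa", "Homo sapiens (human)", "Eukaryotes;Animals;Vertebrates;Mammals",
    "T01005", "ptr", "Pan troglodytes (chimpanzee)", "Eukaryotes;Animals;Vertebrates;Mammals",
    "T02283", "pps", "Pan paniscus (bonobo)", "Eukaryotes;Animals;Vertebrates;Mammals"),
  ncol = 4, byrow = TRUE,
  dimnames = list(NULL, c("T.number", "organism", "species", "phylogeny"))
)
hit <- find_kegg_organism(org_demo, "chimpanzee")
print(hit)
```

With the real matrix, find_kegg_organism(org, "chimpanzee") would give the code to use in place of "hsa".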
Note:
Although I used org.Hs.eg.db to get the gene symbols, they can also be obtained from biomaRt.
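library(biomaRt)
# Connect to the Ensembl BioMart and select the human gene dataset
mart <- useDataset("hsapiens_gene_ensembl", useMart("ensembl"))
# List the attributes that can be requested from this dataset
attributes <- listAttributes(mart)
# Retrieve HGNC symbols together with their Entrez gene ids
genes <- getBM(attributes = c("hgnc_symbol", "entrezgene_id"),
               mart = mart)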
Other useful information about KEGGREST can be found in the vignette.