Does anyone know how to retrieve a list of cell cycle genes from KEGG in R?

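I know how to retrieve the gene list for a specific pathway from the KEGG website using the KEGG API, but I cannot find any package that does the same thing in R. The only annotation package I found is KEGG.db, which only gives the list of pathways available in KEGG.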

http://www.kegg.jp/kegg/docs/keggapi.html

By entering the pathway ID and searching KEGG for the cell cycle genes like this:

http://rest.kegg.jp/get/hsa04110

Does anyone know of an R package or other solution that can help me with this?

Thanks in advance,

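After re-reading your question, I believe this is the R package that can help you. It is on Bioconductor and lets you interact with KEGG from R through its REST interface.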

KEGGREST

KEGGREST: Client-side REST access to KEGG

A package that provides a client interface to the KEGG REST server. Based on KEGGSOAP by J. Zhang, R. Gentleman, and Marc Carlson, and KEGG (python package) by Aurelien Mazurie.
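
For the cell cycle pathway from the question, a minimal sketch with KEGGREST might look like the following. It assumes that `keggGet("hsa04110")` returns a list whose first element has a `GENE` field that alternates between Entrez IDs and "SYMBOL; description" strings. `parse_kegg_genes` is a hypothetical helper name, and the two-gene vector is illustrative input in that assumed layout, not real query output.

```r
# Hypothetical helper: turn the alternating GENE vector into a data frame.
# Assumed layout: c("<entrez id>", "<SYMBOL>; <description>", ...).
parse_kegg_genes <- function(gene_field) {
  stopifnot(length(gene_field) %% 2 == 0)
  idx <- seq_len(length(gene_field) / 2) * 2 - 1  # positions of the IDs
  info <- gene_field[idx + 1]                     # "SYMBOL; description"
  data.frame(
    entrez = gene_field[idx],
    symbol = sub(";.*$", "", info),
    description = trimws(sub("^[^;]*;", "", info)),
    stringsAsFactors = FALSE
  )
}

# Illustrative input in the assumed layout (not real query output).
sample_gene_field <- c(
  "595", "CCND1; cyclin D1",
  "1019", "CDK4; cyclin dependent kinase 4"
)
cell_cycle_sample <- parse_kegg_genes(sample_gene_field)
print(cell_cycle_sample)

# With network access and KEGGREST installed, the real call would be:
# library(KEGGREST)
# cell_cycle <- parse_kegg_genes(keggGet("hsa04110")[[1]]$GENE)
```

The parsing is kept separate from the network call so it can be checked on a small vector before querying KEGG.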

I recently found two ways to get the KEGG pathways and their genes (one of them uses KEGGREST, the package suggested earlier).

First way

library(limma)
library(AnnotationDbi)
library(org.Hs.eg.db)

# We get entrez ids and their pathways.
gene_pathways <- getGeneKEGGLinks(species="hsa")

# This is to get the gene symbols using entrez ids
gene_pathways$Symbol <- mapIds(org.Hs.eg.db, gene_pathways$GeneID,
                       column="SYMBOL", keytype="ENTREZID")

# pathway names
pathway_names <- getKEGGPathwayNames(species="hsa")


KEGG_pathways <- merge(gene_pathways, pathway_names, by="PathwayID")

Output:

head(KEGG_pathways)

  PathwayID     GeneID Symbol Description
1 path:hsa00010  10327 AKR1A1 Glycolysis / Gluconeogenesis - Homo sapiens (human)
2 path:hsa00010    124  ADH1A Glycolysis / Gluconeogenesis - Homo sapiens (human)
3 path:hsa00010    125  ADH1B Glycolysis / Gluconeogenesis - Homo sapiens (human)
4 path:hsa00010    126  ADH1C Glycolysis / Gluconeogenesis - Homo sapiens (human)
5 path:hsa00010    127   ADH4 Glycolysis / Gluconeogenesis - Homo sapiens (human)
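
To get back to the original question, the merged table can be filtered down to a single pathway. The sketch below uses a small stand-in data frame with the same column names as `KEGG_pathways`. `genes_in_pathway` is a hypothetical helper, and it strips an optional `path:` prefix on both sides because I am not certain every version returns the IDs with that prefix.

```r
# Hypothetical helper: keep only the rows that belong to one pathway.
# An optional "path:" prefix is removed before comparing (an assumption
# about how the IDs may be formatted).
genes_in_pathway <- function(tbl, pathway_id) {
  strip <- function(x) sub("^path:", "", x)
  tbl[strip(tbl$PathwayID) == strip(pathway_id), , drop = FALSE]
}

# Stand-in for the merged table, with illustrative rows.
toy_pathways <- data.frame(
  PathwayID = c("path:hsa00010", "path:hsa04110", "path:hsa04110"),
  GeneID = c("124", "595", "1019"),
  Symbol = c("ADH1A", "CCND1", "CDK4"),
  stringsAsFactors = FALSE
)
cell_cycle_genes <- genes_in_pathway(toy_pathways, "hsa04110")
print(cell_cycle_genes$Symbol)
```

On the real table the call would be `genes_in_pathway(KEGG_pathways, "hsa04110")`.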

Second way

library(KEGGREST)
library(org.Hs.eg.db)
library(tidyverse)

# get pathways and their entrez gene ids

hsa_path_entrez  <- keggLink("pathway", "hsa") %>% 
  tibble(pathway = ., eg = sub("hsa:", "", names(.)))

# get gene symbols and ensembl ids using entrez gene ids

hsa_kegg_anno <- hsa_path_entrez %>%
  mutate(
    symbol = mapIds(org.Hs.eg.db, eg, "SYMBOL", "ENTREZID"),
    ensembl = mapIds(org.Hs.eg.db, eg, "ENSEMBL", "ENTREZID")
  )

# Pathway names
hsa_pathways <- keggList("pathway", "hsa") %>% 
  tibble(pathway = names(.), description = .)

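# join on the shared pathway ID column (named explicitly to avoid the join message)
KEGG_pathways <- left_join(hsa_kegg_anno, hsa_pathways, by = "pathway")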

Output:

head(KEGG_pathways)

# A tibble: 6 x 5
  pathway       eg    symbol ensembl         description
  <chr>         <chr> <chr>  <chr>           <chr>
1 path:hsa00010 10327 AKR1A1 ENSG00000117448 Glycolysis / Gluconeogenesis - Homo sapiens (human)
2 path:hsa00010 124   ADH1A  ENSG00000187758 Glycolysis / Gluconeogenesis - Homo sapiens (human)
3 path:hsa00010 125   ADH1B  ENSG00000196616 Glycolysis / Gluconeogenesis - Homo sapiens (human)
4 path:hsa00010 126   ADH1C  ENSG00000248144 Glycolysis / Gluconeogenesis - Homo sapiens (human)
5 path:hsa00010 127   ADH4   ENSG00000198099 Glycolysis / Gluconeogenesis - Homo sapiens (human)

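If for some reason you need to query another species, you only have to replace "hsa" with its organism code. The call keggList("organism") gives you the list of available species.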

org <- keggList("organism")

head(org)

     T.number organism species                                             phylogeny
[1,] "T01001" "hsa"    "Homo sapiens (human)"                              "Eukaryotes;Animals;Vertebrates;Mammals"
[2,] "T01005" "ptr"    "Pan troglodytes (chimpanzee)"                      "Eukaryotes;Animals;Vertebrates;Mammals"
[3,] "T02283" "pps"    "Pan paniscus (bonobo)"                             "Eukaryotes;Animals;Vertebrates;Mammals"
[4,] "T02442" "ggo"    "Gorilla gorilla gorilla (western lowland gorilla)" "Eukaryotes;Animals;Vertebrates;Mammals"
[5,] "T01416" "pon"    "Pongo abelii (Sumatran orangutan)"                 "Eukaryotes;Animals;Vertebrates;Mammals"

Note: although I used org.Hs.eg.db to get the gene symbols, they can also be obtained from biomaRt.

library(biomaRt)
mart <- useDataset("hsapiens_gene_ensembl", useMart("ensembl"))
attributes <- listAttributes(mart)
genes <- getBM(attributes = c("hgnc_symbol", "entrezgene_id"),
               mart = mart)
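
A sketch of how the `genes` table from getBM could then be used to attach symbols to Entrez IDs. The data frame below is a stand-in with the same two column names and illustrative rows. I am assuming `entrezgene_id` may come back as a numeric column, so both sides are converted to character before matching. `add_symbols` is a hypothetical helper.

```r
# Hypothetical helper: look up HGNC symbols for a vector of Entrez IDs.
# Rows without an Entrez ID or with an empty symbol are dropped first;
# when an ID maps to several rows, the first match is used.
add_symbols <- function(entrez_ids, bm_table) {
  keep <- !is.na(bm_table$entrezgene_id) &
    !is.na(bm_table$hgnc_symbol) &
    nzchar(bm_table$hgnc_symbol)
  bm <- bm_table[keep, , drop = FALSE]
  hits <- match(as.character(entrez_ids), as.character(bm$entrezgene_id))
  bm$hgnc_symbol[hits]
}

# Stand-in for the getBM() result, with illustrative rows.
toy_genes <- data.frame(
  hgnc_symbol = c("ADH1A", "", "CCND1"),
  entrezgene_id = c(124L, 125L, 595L),
  stringsAsFactors = FALSE
)
symbols <- add_symbols(c("595", "124", "125"), toy_genes)
print(symbols)
```

IDs without a usable symbol come back as NA, which keeps the result the same length as the input.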

Other useful information about KEGGREST can be found in the vignette.