SNP coordinates to gene names
I have SNP IDs and coordinates in a BED file from UCSC. I want to map them to their gene names.
chr1 9160974 9160975 rs1013578619 0 +
chr1 164528869 164528870 rs1016074293 0 +
chr1 192216772 192216773 rs1018731047 0 +
chr1 117157669 117157670 rs1022293363 0 +
chr1 33148118 33148119 rs1022386792 0 +
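I have gone through many posts suggesting bedtools intersect, the UCSC Table Browser, and so on, but I could not get a successful result. Please suggest an option that works for this particular data.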
We can use the biomaRt package:
# data: the five BED records from the question
mySNPs <- read.table(text = "chr1 9160974 9160975 rs1013578619 0 +
chr1 164528869 164528870 rs1016074293 0 +
chr1 192216772 192216773 rs1018731047 0 +
chr1 117157669 117157670 rs1022293363 0 +
chr1 33148118 33148119 rs1022386792 0 +")
colnames(mySNPs) <- c("chr", "start", "end", "name", "x", "strand")
library(biomaRt)
snpmart = useMart(biomart = "ENSEMBL_MART_SNP", dataset = "hsapiens_snp")
# Check which filters and attributes we want to use:
# listAttributes(snpmart)
# listFilters(snpmart)
# result: query the SNP mart by rs ID and return position plus associated Ensembl gene ID
getBM(attributes = c("refsnp_id", "chr_name", "chrom_start", "chrom_end", "ensembl_gene_stable_id"),
      filters = "snp_filter",
      values = mySNPs$name,
      mart = snpmart)
# refsnp_id chr_name chrom_start chrom_end ensembl_gene_stable_id
# 1 rs1013578619 1 9160975 9160975 ENSG00000228526
# 2 rs1016074293 1 164528870 164528870
# 3 rs1018731047 1 192216773 192216773 ENSG00000285280
# 4 rs1022293363 1 117157670 117157670 ENSG00000134258
# 5 rs1022386792 1 33148119 33148119 ENSG00000278997
# 6 rs1022386792 1 33148119 33148119 ENSG00000116525
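The query above returns Ensembl gene IDs, while the question asks for gene names. One possible follow-up, sketched here rather than taken from the original answer, is a second lookup against the Ensembl gene mart. `lookupGeneNames` is a hypothetical helper name, the `snpGenes` table is simply the output above re-typed by hand, and the online call is left commented out because it needs the biomaRt package and network access.

```r
# Result of the SNP query above, re-typed so this sketch is self-contained.
snpGenes <- data.frame(
  refsnp_id = c("rs1013578619", "rs1016074293", "rs1018731047",
                "rs1022293363", "rs1022386792", "rs1022386792"),
  ensembl_gene_stable_id = c("ENSG00000228526", "", "ENSG00000285280",
                             "ENSG00000134258", "ENSG00000278997",
                             "ENSG00000116525"),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of biomaRt): fetch gene symbols for a vector
# of Ensembl gene IDs from the gene mart. Requires biomaRt and network access.
lookupGeneNames <- function(geneIds) {
  genemart <- biomaRt::useMart(biomart = "ENSEMBL_MART_ENSEMBL",
                               dataset = "hsapiens_gene_ensembl")
  biomaRt::getBM(attributes = c("ensembl_gene_id", "external_gene_name"),
                 filters = "ensembl_gene_id",
                 values = geneIds,
                 mart = genemart)
}

# Some SNPs come back with an empty gene ID (rs1016074293 above);
# drop those before the lookup and keep each gene ID only once.
geneIds <- unique(snpGenes$ensembl_gene_stable_id[snpGenes$ensembl_gene_stable_id != ""])

# Uncomment to run the online lookup and attach the symbols to each SNP:
# geneNames <- lookupGeneNames(geneIds)
# merge(snpGenes, geneNames,
#       by.x = "ensembl_gene_stable_id", by.y = "ensembl_gene_id",
#       all.x = TRUE)
```

With `all.x = TRUE` the merge keeps SNPs that have no gene ID, so they show up with a missing gene name instead of disappearing from the table. A SNP associated with more than one gene, such as rs1022386792, keeps one row per gene.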