Simple codon-to-amino acid hash in R

I want to create an R script with a hash table that lets me look up a codon and get its associated amino acid. For example,

library(hash)

hashTable <- hash(...) #insert all codon-to-amino acid pairs
hashTable['TTT']

would return

[1] Phe

Does anyone know how I could do this? Or is there perhaps a package (Bioconductor?) I could install that would make this easier?

Why use a hash table at all?

acidLookup <- function(x) {
  # Full names, single-letter codes, and codon groups, in matching order
  acids <- c("Isoleucine","Leucine","Valine","Phenylalanine","Methionine","Cysteine","Alanine","Glycine","Proline","Threonine","Serine",
             "Tyrosine","Tryptophan","Glutamine","Asparagine","Histidine","Glutamic acid","Aspartic acid","Lysine","Arginine","Stop codons")
  slc <- c("I","L","V","F","M","C","A","G","P","T","S","Y","W","Q","N","H","E","D","K","R","Stop")
  codon <- c("ATT, ATC, ATA","CTT, CTC, CTA, CTG, TTA, TTG","GTT, GTC, GTA, GTG","TTT, TTC","ATG","TGT, TGC",
             "GCT, GCC, GCA, GCG","GGT, GGC, GGA, GGG","CCT, CCC, CCA, CCG","ACT, ACC, ACA, ACG","TCT, TCC, TCA, TCG, AGT, AGC",
             "TAT, TAC","TGG","CAA, CAG","AAT, AAC","CAT, CAC","GAA, GAG","GAT, GAC","AAA, AAG","CGT, CGC, CGA, CGG, AGA, AGG","TAA, TAG, TGA")

  # Split each group on ", " so the individual codons carry no leading space
  codon.list <- strsplit(codon, ", ", fixed = TRUE)

  # Index of the group containing x as an exact codon (no partial matches)
  i <- which(vapply(codon.list, function(g) x %in% g, logical(1)))

  data.frame(acid = acids[i], slc = slc[i], codons = codon[i])
}

acidLookup("ATA")

        acid slc        codons
1 Isoleucine   I ATT, ATC, ATA
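
The grouped table above can also be flattened so that each codon becomes its own name, which allows looking up several codons in one call. The following is only a minimal sketch on a three-group subset of that table; the name acidByCodon is made up for illustration and is not part of the answer above.

```r
# Illustrative subset only: three of the 21 groups from the table above
acids <- c("Isoleucine", "Phenylalanine", "Methionine")
codon <- c("ATT, ATC, ATA", "TTT, TTC", "ATG")

# One character vector of codons per group
codon.list <- strsplit(codon, ", ", fixed = TRUE)

# Repeat each acid name once per codon in its group, then name by codon.
# acidByCodon is a hypothetical name chosen for this sketch.
acidByCodon <- setNames(rep(acids, lengths(codon.list)), unlist(codon.list))

# Vectorised lookup by codon name
acidByCodon[c("ATA", "TTT", "ATG")]
```

A codon that is not in the table comes back as NA rather than raising an error, so callers may want to check for that.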

This problem almost certainly has a pre-existing solution. One possibility is Bioconductor's Biostrings, for example:

library(Biostrings)
GENETIC_CODE[["ATG"]]
[1] "M"

There is no need to use a specific hash table implementation. If Biostrings isn't enough, the standard names notation for vectors/lists in base R should work:

# Assign amino acids by codon name (values follow the standard genetic code)
aaCodes <- character(0)
aaCodes["ATG"] <- "Met"
aaCodes["TGA"] <- "Stop"
aaCodes[c("CTC","AGG")] <- c("Leu","Arg")

> names(aaCodes)
[1] "ATG" "TGA" "CTC" "AGG"

> aaCodes[c("ATG","ATG","CTC","TGA")]
   ATG    ATG    CTC    TGA 
 "Met"  "Met"  "Leu" "Stop" 

> substring("ATGATGCTCTGA",0:3*3+1,0:3*3+3)
[1] "ATG" "ATG" "CTC" "TGA"

> aaCodes[substring("ATGATGCTCTGA",0:3*3+1,0:3*3+3)]
   ATG    ATG    CTC    TGA 
 "Met"  "Met"  "Leu" "Stop" 

This doesn't show the internal hash value R uses for each string, but it doesn't look like the question asks for that.
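
The named-vector approach above can be extended to all 64 codons without typing each pair by hand. The sketch below assumes the standard genetic code (NCBI translation table 1) written as a 64-character string in T, C, A, G base order, with "*" marking stop codons. The names geneticCode and translateCodons are hypothetical, chosen for this example only, and the function assumes an uppercase DNA string at least three bases long.

```r
# Standard genetic code (NCBI translation table 1), bases ordered T, C, A, G.
# One amino-acid letter per codon; "*" marks a stop codon.
bases <- c("T", "C", "A", "G")
aa <- "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# expand.grid varies its first column fastest, so the third base is listed first
grid <- expand.grid(third = bases, second = bases, first = bases,
                    stringsAsFactors = FALSE)
codons <- paste0(grid$first, grid$second, grid$third)

# Named character vector: names are codons, values are one-letter codes.
# geneticCode is a hypothetical name used only in this sketch.
geneticCode <- setNames(strsplit(aa, "")[[1]], codons)

# Hypothetical helper: split a DNA string into codons and look each one up.
# Trailing bases that do not fill a whole codon are ignored.
translateCodons <- function(dna) {
  n <- nchar(dna) %/% 3
  starts <- seq(1, by = 3, length.out = n)
  unname(geneticCode[substring(dna, starts, starts + 2)])
}

geneticCode[["TTT"]]
translateCodons("ATGATGCTCTGA")
```

This returns one-letter codes, as Biostrings does. Getting three-letter names such as Phe would need a second named vector that maps one-letter codes to three-letter abbreviations.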