Simple codon-to-amino acid hash in R
I'd like to create an R script containing a hash table that lets me look up a codon and get its associated amino acid. For example,
library(hash)
hashTable <- hash(...) #insert all codon-to-amino acid pairs
hashTable['TTT']
would return
[1] Phe
Does anyone know how I could do this? Or is there perhaps a package (Bioconductor?) I could install that would make this easier?
Why use a hash table at all?
acidLookup <- function(x) {
  acids <- c("Isoleucine","Leucine","Valine","Phenylalanine","Methionine","Cysteine","Alanine","Glycine","Proline","Threonine","Serine",
             "Tyrosine","Tryptophan","Glutamine","Asparagine","Histidine","Glutamic acid","Aspartic acid","Lysine","Arginine","Stop codons")
  slc <- c("I","L","V","F","M","C","A","G","P","T","S","Y","W","Q","N","H","E","D","K","R","Stop")
  codon <- c("ATT, ATC, ATA","CTT, CTC, CTA, CTG, TTA, TTG","GTT, GTC, GTA, GTG","TTT, TTC","ATG","TGT, TGC",
             "GCT, GCC, GCA, GCG","GGT, GGC, GGA, GGG","CCT, CCC, CCA, CCG","ACT, ACC, ACA, ACG","TCT, TCC, TCA, TCG, AGT, AGC",
             "TAT, TAC","TGG","CAA, CAG","AAT, AAC","CAT, CAC","GAA, GAG","GAT, GAC","AAA, AAG","CGT, CGC, CGA, CGG, AGA, AGG","TAA, TAG, TGA")
  # Split each group on ", " so the individual codons carry no leading space
  codon.list <- strsplit(codon, ", ", fixed = TRUE)
  # Index of the group that contains x as an exact codon (not a regex match)
  i <- which(vapply(codon.list, function(g) x %in% g, logical(1)))
  data.frame(acid = acids[i], slc = slc[i], codons = codon[i])
}
acidLookup("ATA")
        acid slc        codons
1 Isoleucine   I ATT, ATC, ATA
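If only the one-letter code is needed, the same grouped table can be flattened into a named character vector, which allows direct lookup by codon. The following is a minimal sketch, not part of the answer above; the names `groups` and `codonToAA` are made up for illustration, and only four groups are filled in to keep it short.

```r
# Hypothetical partial table: one-letter code -> its codons (only a few groups shown)
groups <- list(
  I = c("ATT", "ATC", "ATA"),
  F = c("TTT", "TTC"),
  M = "ATG",
  W = "TGG"
)

# Flatten into a named vector: the names are codons, the values are one-letter codes
codonToAA <- setNames(
  rep(names(groups), lengths(groups)),
  unlist(groups, use.names = FALSE)
)

codonToAA[["ATA"]]                 # single lookup
unname(codonToAA[c("TTT", "ATG")]) # vectorised lookup
```

With `[`, a codon that is missing from the table yields `NA`; with `[[`, it raises an error, so the choice depends on how unknown codons should be handled.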
There is almost certainly a pre-existing solution to this problem. One possibility is Biostrings from Bioconductor, for example:
library(Biostrings)
GENETIC_CODE[["ATG"]]
[1] "M"
There's no need to use a specific hash table implementation. If Biostrings isn't enough, the standard name-based indexing of vectors/lists in base R should work:
aaCodes <- character(0)
aaCodes["ATG"] <- "Met"
aaCodes["TGG"] <- "Trp"
aaCodes[c("CTC","AGG")] <- c("Leu","Arg")
> names(aaCodes)
[1] "ATG" "TGG" "CTC" "AGG"
> aaCodes[c("ATG","ATG","CTC","TGG")]
  ATG   ATG   CTC   TGG
"Met" "Met" "Leu" "Trp"
> substring("ATGATGCTCTGG",0:3*3+1,0:3*3+3)
[1] "ATG" "ATG" "CTC" "TGG"
> aaCodes[substring("ATGATGCTCTGG",0:3*3+1,0:3*3+3)]
  ATG   ATG   CTC   TGG
"Met" "Met" "Leu" "Trp"
This doesn't show the internal hash values R uses for each string, but it doesn't look like the question requires that.
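If an actual hashed lookup structure is wanted without an extra package, a base R environment created with `new.env(hash = TRUE)` can play that role. This is a rough sketch under that assumption rather than something taken from the answers above; `codonEnv` and `pairs` are illustrative names and the table is deliberately incomplete.

```r
# An environment used as a hash table keyed by codon (partial table for illustration)
codonEnv <- new.env(hash = TRUE)
pairs <- c(TTT = "Phe", ATG = "Met", TGG = "Trp")
for (codon in names(pairs)) {
  assign(codon, pairs[[codon]], envir = codonEnv)
}

# Single lookup
get("TTT", envir = codonEnv)

# Membership test; inherits = FALSE keeps the search inside codonEnv only
exists("GGG", envir = codonEnv, inherits = FALSE)

# Several keys at once; mget() returns a named list, so unlist() it
unlist(mget(c("ATG", "TGG"), envir = codonEnv))
```

For a table of only 64 codons, the named vector is likely simpler and fast enough; an environment mainly helps when keys are added or removed frequently.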