Converting nucleotides to amino acids using JavaScript

I'm building a Chrome extension that converts a string of nucleotides of length nlen into the corresponding amino acids.

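I've done something similar in Python before, but since I'm still very new to JavaScript, I'm struggling to translate the same logic from Python to JavaScript. The code I have so far is below: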

function translateInput(n_seq) {
  // code to translate goes here

  // length of input nucleotide sequence
  var nlen = n_seq.length

  // declare initially empty amino acids string
  var aa_seq = ""

  // iterate over each chunk of three characters/nucleotides
  // to match it with the correct codon
  for (var i = 0; i < nlen; i++) {




      aa_seq.concat(codon)
  }

  // return final string of amino acids   
  return aa_seq
}

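I know that I want to step through the characters three at a time, match each group with the correct amino acid, and then keep appending that amino acid to the output string (aa_seq), returning the string once the loop is complete.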

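I also tried creating a dictionary that maps amino acids to their codons, and was wondering whether there is a way to use something like it as a tool for matching each three-character codon to its respective amino acid: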

codon_dictionary = { 
 "A": ["GCA","GCC","GCG","GCT"], 
 "C": ["TGC","TGT"], 
 "D": ["GAC", "GAT"],
 "E": ["GAA","GAG"],
 "F": ["TTC","TTT"],
 "G": ["GGA","GGC","GGG","GGT"],
 "H": ["CAC","CAT"],
 "I": ["ATA","ATC","ATT"],
 "K": ["AAA","AAG"],
 "L": ["CTA","CTC","CTG","CTT","TTA","TTG"],
 "M": ["ATG"],
 "N": ["AAC","AAT"],
 "P": ["CCA","CCC","CCG","CCT"],
 "Q": ["CAA","CAG"],
 "R": ["AGA","AGG","CGA","CGC","CGG","CGT"],
 "S": ["AGC","AGT","TCA","TCC","TCG","TCT"],
 "T": ["ACA","ACC","ACG","ACT"],
 "V": ["GTA","GTC","GTG","GTT"],
 "W": ["TGG"],
 "Y": ["TAC","TAT"],
};

EDIT: An example input string of nucleotides would be "AAGCATAGAAATCGAGGG", with the corresponding output string being "KHRNRG". Hope this helps!

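You can use a for loop with String.prototype.slice() to step through the string three characters at a time from the beginning, a for..of loop with Object.entries() to iterate over the properties and values of the codon_dictionary object, and Array.prototype.includes() to match the current three-character portion of the input string against the arrays set as values of codon_dictionary, concatenating the matching property name to a string variable.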

const codon_dictionary = { 
 "A": ["GCA","GCC","GCG","GCT"], 
 "C": ["TGC","TGT"], 
 "D": ["GAC", "GAT"],
 "E": ["GAA","GAG"],
 "F": ["TTC","TTT"],
 "G": ["GGA","GGC","GGG","GGT"],
 "H": ["CAC","CAT"],
 "I": ["ATA","ATC","ATT"],
 "K": ["AAA","AAG"],
 "L": ["CTA","CTC","CTG","CTT","TTA","TTG"],
 "M": ["ATG"],
 "N": ["AAC","AAT"],
 "P": ["CCA","CCC","CCG","CCT"],
 "Q": ["CAA","CAG"],
 "R": ["AGA","AGG","CGA","CGC","CGG","CGT"],
 "S": ["AGC","AGT","TCA","TCC","TCG","TCT"],
 "T": ["ACA","ACC","ACG","ACT"],
 "V": ["GTA","GTC","GTG","GTT"],
 "W": ["TGG"],
 "Y": ["TAC","TAT"],
};

const [entries, n] = [Object.entries(codon_dictionary), 3];

let [str, res] = ["AAGCATAGAAATCGAGGG", ""];

for (let i = 0; i + n <= str.length; i += n)
  for (const [key, prop, curr = str.slice(i, i + n)] of entries) 
    if (prop.includes(curr)) {res +=  key; break;};

console.log(res);
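
As a side note on the original attempt, it may help to see why aa_seq.concat(codon) appeared to do nothing: JavaScript strings are immutable, so concat returns a new string rather than modifying aa_seq, and the loop also advanced one character at a time instead of three. Below is a minimal sketch of a corrected loop. The shortened table sampleDictionary and the helper lookupAmino are hypothetical names used only for illustration.

```javascript
// Shortened, illustrative table in the same shape as codon_dictionary.
const sampleDictionary = {
  "K": ["AAA", "AAG"],
  "H": ["CAC", "CAT"],
  "R": ["AGA", "AGG", "CGA", "CGC", "CGG", "CGT"],
};

// Hypothetical helper: returns the amino acid letter for a codon,
// or "" when the codon is not in the table.
function lookupAmino(codon) {
  for (const [amino, codonList] of Object.entries(sampleDictionary)) {
    if (codonList.includes(codon)) return amino;
  }
  return "";
}

function translateInput(n_seq) {
  let aa_seq = "";
  // Step by 3 so each iteration sees one whole codon;
  // a trailing fragment shorter than 3 characters is ignored.
  for (let i = 0; i + 3 <= n_seq.length; i += 3) {
    const codon = n_seq.slice(i, i + 3);
    // concat returns a new string, so the result must be assigned back.
    aa_seq = aa_seq.concat(lookupAmino(codon));
  }
  return aa_seq;
}

console.log(translateInput("AAGCATAGA")); // KHR
```

Writing aa_seq += lookupAmino(codon) would be equivalent; the point is only that the result has to be stored somewhere.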

Alternatively, you can write the above answer (@guest271314) in a more compact form:

var res = ''
str.match(/.{1,3}/g).forEach(s => {
    var key = Object.keys(codon_dictionary).filter(x => codon_dictionary[x].filter(y => y === s).length > 0)[0]
    res += key != undefined ? key : ''
})

You can see the complete answer below.

const codon_dictionary = { 
 "A": ["GCA","GCC","GCG","GCT"], 
 "C": ["TGC","TGT"], 
 "D": ["GAC", "GAT"],
 "E": ["GAA","GAG"],
 "F": ["TTC","TTT"],
 "G": ["GGA","GGC","GGG","GGT"],
 "H": ["CAC","CAT"],
 "I": ["ATA","ATC","ATT"],
 "K": ["AAA","AAG"],
 "L": ["CTA","CTC","CTG","CTT","TTA","TTG"],
 "M": ["ATG"],
 "N": ["AAC","AAT"],
 "P": ["CCA","CCC","CCG","CCT"],
 "Q": ["CAA","CAG"],
 "R": ["AGA","AGG","CGA","CGC","CGG","CGT"],
 "S": ["AGC","AGT","TCA","TCC","TCG","TCT"],
 "T": ["ACA","ACC","ACG","ACT"],
 "V": ["GTA","GTC","GTG","GTT"],
 "W": ["TGG"],
 "Y": ["TAC","TAT"],
};

const str = "AAGCATAGAAATCGAGGG";

let res = "";
// the loop from the previous answer, rewritten in compact form
str.match(/.{1,3}/g).forEach(s => {
    var key = Object.keys(codon_dictionary).filter(x => codon_dictionary[x].filter(y => y === s).length > 0)[0]
    res += key != undefined ? key : ''
})

console.log(res);

Opinion

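The first thing I would personally recommend is to build a dictionary that maps each 3-character codon to its amino acid. This lets your program take several codon chains and convert them to amino chains without an expensive deep lookup every time. The dictionary would work like this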

codonDict['GCA'] // 'A'
codonDict['TGC'] // 'C'
// etc

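From there, I implemented two utility functions: slide and slideStr. They aren't particularly important, so I'll just cover them with a couple of input and output examples.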

slide (2,1) ([1,2,3,4])
// [[1,2], [2,3], [3,4]]

slide (2,2) ([1,2,3,4])
// [[1,2], [3,4]]

slideStr (2,1) ('abcd')
// ['ab', 'bc', 'cd']

slideStr (2,2) ('abcd')
// ['ab', 'cd']

With the reverse dictionary and the generic utility functions in place, writing codon2amino is a breeze

// codon2amino :: String -> String
const codon2amino = str =>
  slideStr(3,3)(str)
    .map(c => codonDict[c])
    .join('')

Runnable demo

To clarify, we build codonDict from aminoDict once and reuse it for every codon-to-amino computation.

// your original data renamed to aminoDict
const aminoDict = { 'A': ['GCA','GCC','GCG','GCT'], 'C': ['TGC','TGT'], 'D': ['GAC', 'GAT'], 'E': ['GAA','GAG'], 'F': ['TTC','TTT'], 'G': ['GGA','GGC','GGG','GGT'], 'H': ['CAC','CAT'], 'I': ['ATA','ATC','ATT'], 'K': ['AAA','AAG'], 'L': ['CTA','CTC','CTG','CTT','TTA','TTG'], 'M': ['ATG'], 'N': ['AAC','AAT'], 'P': ['CCA','CCC','CCG','CCT'], 'Q': ['CAA','CAG'], 'R': ['AGA','AGG','CGA','CGC','CGG','CGT'], 'S': ['AGC','AGT','TCA','TCC','TCG','TCT'], 'T': ['ACA','ACC','ACG','ACT'], 'V': ['GTA','GTC','GTG','GTT'], 'W': ['TGG'], 'Y': ['TAC','TAT'] };

// codon dictionary derived from aminoDict
const codonDict =
 Object.keys(aminoDict).reduce((dict, a) =>
   Object.assign(dict, ...aminoDict[a].map(c => ({[c]: a}))), {})

// slide :: (Int, Int) -> [a] -> [[a]]
const slide = (n,m) => xs => {
  if (n > xs.length)
    return []
  else
    return [xs.slice(0,n), ...slide(n,m) (xs.slice(m))]
}

// slideStr :: (Int, Int) -> String -> [String]
const slideStr = (n,m) => str =>
  slide(n,m) (Array.from(str)) .map(s => s.join(''))

// codon2amino :: String -> String
const codon2amino = str =>
  slideStr(3,3)(str)
    .map(c => codonDict[c])
    .join('')

console.log(codon2amino('AAGCATAGAAATCGAGGG'))
// KHRNRG
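
One practical caveat with the recursive slide above: it recurses once per window and copies the remainder of the input at every step, so a very long nucleotide sequence could be slow or could exceed the engine's call stack limit. Here is a minimal sketch of a loop-based variant with the same (n, m) parameters. The name slideLoop is hypothetical, and m is assumed to be at least 1.

```javascript
// Hypothetical iterative variant of slide: same window size n and step m,
// but built with a loop instead of recursion. Assumes m >= 1.
const slideLoop = (n, m) => xs => {
  const windows = [];
  // Stop as soon as a full window of size n no longer fits.
  for (let i = 0; i + n <= xs.length; i += m)
    windows.push(xs.slice(i, i + n));
  return windows;
};

console.log(slideLoop(2, 1)([1, 2, 3, 4])); // [[1,2], [2,3], [3,4]]
console.log(slideLoop(2, 2)([1, 2, 3, 4])); // [[1,2], [3,4]]
console.log(slideLoop(3, 3)('AAGCATAG'));   // ['AAG', 'CAT']
```

Because strings also have length and slice, this version handles a string directly and returns string windows, so no Array.from or join step is needed for the codon case.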


Further explanation

can you clarify what some of these variables are supposed to represent? (n, m, xs, c, etc)

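Our slide function gives us a sliding window over an array. It expects two parameters for the window (n, the window size, and m, the step size) and one parameter that is the array of items to iterate through: xs, which can be read as "x's", the plural of x, as in a collection of x items.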

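slide is purposefully generic. As written it relies only on .length and .slice, so it works on arrays and strings alike; any other iterable (anything that implements Symbol.iterator) can be converted with Array.from first, which is what slideStr does. This is also why we use a generic name like xs: naming it something specific would lead us to think it only works with one particular type.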
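
For inputs that have no .length or .slice at all (a Set or a generator, say), a window can still be slid using only the iteration protocol. The following is a minimal sketch and not part of the original answer. The name slideIter is hypothetical, and m is assumed to be at least 1.

```javascript
// Hypothetical generator: yields windows of size n, advancing m items each time,
// using only the iteration protocol (no .length or .slice on the input).
function* slideIter(n, m, iterable) {
  let buffer = [];
  let skip = 0; // items still to discard before the next window starts
  for (const x of iterable) {
    if (skip > 0) { skip--; continue; }
    buffer.push(x);
    if (buffer.length === n) {
      yield buffer;
      if (m >= n) {
        // Windows do not overlap: drop the gap between them, start fresh.
        skip = m - n;
        buffer = [];
      } else {
        // Windows overlap: keep the last n - m items for the next window.
        buffer = buffer.slice(m);
      }
    }
  }
}

console.log([...slideIter(2, 1, new Set([1, 2, 3, 4]))]);
// [[1,2], [2,3], [3,4]]

console.log([...slideIter(3, 3, 'AAGCAT')].map(w => w.join('')));
// ['AAG', 'CAT']
```

An incomplete trailing window is never yielded, which matches what slide does when n exceeds the remaining length.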

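Other things, like the variable c in .map(c => codonDict[c]), aren't particularly important. I named it c for codon, but we could have named it x or foo; it doesn't matter. The "trick" to understanding c is to understand .map.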

[1,2,3,4,5].map(c => f(c))
// [f(1), f(2), f(3), f(4), f(5)]

So really all we're doing here is taking an array ([1,2,3,4,5]) and making a new array in which f is called on each element of the original array.

Now when we look at .map(c => codonDict[c]), we understand that all we're doing is looking up c in codonDict for each element.

const codon2amino = str =>
  slideStr(3,3)(str)          // [ 'AAG', 'CAT', 'AGA', 'AAT', ...]
    .map(c => codonDict[c])   // [ codonDict['AAG'], codonDict['CAT'], codonDict['AGA'], codonDict['AAT'], ...]
    .join('')                 // 'KHRN...'

Also, are these 'const' items able to essentially replace my original translateInput() function?

If you're not familiar with ES6 (ES2015), some of the syntax used above may look foreign to you.

// foo using traditional function syntax
function foo (x) { return x + 1 }

// foo as an arrow function
const foo = x => x + 1

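So in short, yes, codon2amino is an exact replacement for translateInput, just defined using a const binding and an arrow function. I chose codon2amino as the name because it better describes what the function does: translateInput doesn't say which way it's translating (A to B, or B to A?), and "input" is a somewhat meaningless descriptor here because every function can take input.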

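The reason you see other const declarations is that we split the work of your function into several functions. The reasons for this are mostly beyond the scope of this answer, but the brief explanation is that one specialized function that takes on the responsibility of several tasks is less useful to us than multiple generic functions that can be combined and re-used in sensible ways.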

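Of course codon2amino needs to look at each 3-letter sequence in the input string, but that doesn't mean we have to write the string-splitting code inside the codon2amino function. We can write a generic string-splitting function, as we did with slideStr, that is useful to any function that wants to iterate over sequences within a string, and then have codon2amino use it. If we encapsulated all of the string-splitting code inside codon2amino, we would have to duplicate that portion of code the next time we needed to iterate over string sequences.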
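
To make the re-use argument concrete, here is a small sketch in which a second, unrelated function is built on the same slideStr. slide and slideStr are copied from the demo above; countPairs is a hypothetical example consumer and is not part of the original answer.

```javascript
// slide and slideStr as defined in the runnable demo
const slide = (n,m) => xs => {
  if (n > xs.length)
    return []
  else
    return [xs.slice(0,n), ...slide(n,m) (xs.slice(m))]
}

const slideStr = (n,m) => str =>
  slide(n,m) (Array.from(str)) .map(s => s.join(''))

// Hypothetical second consumer: counts every overlapping 2-letter window.
// No string-splitting logic had to be rewritten for it.
const countPairs = str =>
  slideStr(2,1)(str).reduce((counts, pair) => {
    counts[pair] = (counts[pair] || 0) + 1
    return counts
  }, {})

console.log(countPairs('AAGAA'))
// { AA: 2, AG: 1, GA: 1 }
```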


All that said..

Is there any way I can do this while keeping my original for loop structure?

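I really think you should spend some time stepping through the code above to see how it works. There are a lot of valuable lessons to learn there if you haven't yet seen program concerns separated this way.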

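Of course, that's not the only way to solve the problem. We can use a primitive for loop. To me, though, it's more mental overhead to think about creating the iterator i, manually incrementing it with i++ or i += 3, making sure to check i < str.length, and reassigning the return value with result += something, and so on. Add a couple more variables and your brain quickly turns to soup.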

function makeCodonDict (aminoDict) {
  let result = {}
  for (let k of Object.keys(aminoDict))
    for (let a of aminoDict[k])
      result[a] = k
  return result
}

function translateInput (dict, str) {
  let result = ''
  for (let i = 0; i < str.length; i += 3)
    result += dict[str.slice(i, i + 3)]
  return result
}

const aminoDict = { 'A': ['GCA','GCC','GCG','GCT'], 'C': ['TGC','TGT'], 'D': ['GAC', 'GAT'], 'E': ['GAA','GAG'], 'F': ['TTC','TTT'], 'G': ['GGA','GGC','GGG','GGT'], 'H': ['CAC','CAT'], 'I': ['ATA','ATC','ATT'], 'K': ['AAA','AAG'], 'L': ['CTA','CTC','CTG','CTT','TTA','TTG'], 'M': ['ATG'], 'N': ['AAC','AAT'], 'P': ['CCA','CCC','CCG','CCT'], 'Q': ['CAA','CAG'], 'R': ['AGA','AGG','CGA','CGC','CGG','CGT'], 'S': ['AGC','AGT','TCA','TCC','TCG','TCT'], 'T': ['ACA','ACC','ACG','ACT'], 'V': ['GTA','GTC','GTG','GTT'], 'W': ['TGG'], 'Y': ['TAC','TAT'] };
const codonDict = makeCodonDict(aminoDict)

const codons = 'AAGCATAGAAATCGAGGG'
const aminos = translateInput(codonDict, codons)
console.log(aminos) // KHRNRG
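
One thing to watch for with the loop version: result += dict[...] appends the literal text "undefined" whenever a chunk is missing from the dictionary. That can happen with the stop codons of the standard genetic code (TAA, TAG, TGA), which the table in the question does not list, or with a trailing fragment shorter than three letters. Below is a minimal sketch of one way to handle it. The function name translateWithPlaceholders, the shortened table, and the choice of '*' for a stop codon and '?' for anything unrecognized are all assumptions made for illustration.

```javascript
// Small illustrative codon table; a real one would be built by makeCodonDict.
const sampleCodonDict = { AAG: 'K', CAT: 'H', AGA: 'R' }

// Stop codons of the standard genetic code (absent from the question's table).
const STOP_CODONS = new Set(['TAA', 'TAG', 'TGA'])

// Hypothetical variant of translateInput.
// Assumed placeholders: '*' for a stop codon, '?' for an unknown or partial chunk.
function translateWithPlaceholders (dict, str) {
  let result = ''
  for (let i = 0; i < str.length; i += 3) {
    const codon = str.slice(i, i + 3)
    // Use the dictionary entry when there is one, otherwise a placeholder.
    result += dict[codon] ?? (STOP_CODONS.has(codon) ? '*' : '?')
  }
  return result
}

console.log(translateWithPlaceholders(sampleCodonDict, 'AAGCATTAGAGAXYZA'))
// KH*R??
```

Whether translation should stop at the first stop codon, skip unknown chunks, or throw an error depends on what the extension needs; the sketch only makes the failure visible instead of emitting "undefined".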

Well, I'd suggest changing the shape of your dictionary first. It isn't very useful in that form, so let's do this:

const dict = { 
 "A": ["GCA","GCC","GCG","GCT"], 
 "C": ["TGC","TGT"], 
 "D": ["GAC", "GAT"],
 "E": ["GAA","GAG"],
 "F": ["TTC","TTT"],
 "G": ["GGA","GGC","GGG","GGT"],
 "H": ["CAC","CAT"],
 "I": ["ATA","ATC","ATT"],
 "K": ["AAA","AAG"],
 "L": ["CTA","CTC","CTG","CTT","TTA","TTG"],
 "M": ["ATG"],
 "N": ["AAC","AAT"],
 "P": ["CCA","CCC","CCG","CCT"],
 "Q": ["CAA","CAG"],
 "R": ["AGA","AGG","CGA","CGC","CGG","CGT"],
 "S": ["AGC","AGT","TCA","TCC","TCG","TCT"],
 "T": ["ACA","ACC","ACG","ACT"],
 "V": ["GTA","GTC","GTG","GTT"],
 "W": ["TGG"],
 "Y": ["TAC","TAT"],
}
const codons = Object.keys(dict).reduce((a, b) => {dict[b].forEach(v => a[v] = b); return a}, {})

//In practice, you will get:

const codons = { GCA: 'A',
  GCC: 'A',
  GCG: 'A',
  GCT: 'A',
  TGC: 'C',
  TGT: 'C',
  GAC: 'D',
  GAT: 'D',
  GAA: 'E',
  GAG: 'E',
  TTC: 'F',
  TTT: 'F',
  GGA: 'G',
  GGC: 'G',
  GGG: 'G',
  GGT: 'G',
  CAC: 'H',
  CAT: 'H',
  ATA: 'I',
  ATC: 'I',
  ATT: 'I',
  AAA: 'K',
  AAG: 'K',
  CTA: 'L',
  CTC: 'L',
  CTG: 'L',
  CTT: 'L',
  TTA: 'L',
  TTG: 'L',
  ATG: 'M',
  AAC: 'N',
  AAT: 'N',
  CCA: 'P',
  CCC: 'P',
  CCG: 'P',
  CCT: 'P',
  CAA: 'Q',
  CAG: 'Q',
  AGA: 'R',
  AGG: 'R',
  CGA: 'R',
  CGC: 'R',
  CGG: 'R',
  CGT: 'R',
  AGC: 'S',
  AGT: 'S',
  TCA: 'S',
  TCC: 'S',
  TCG: 'S',
  TCT: 'S',
  ACA: 'T',
  ACC: 'T',
  ACG: 'T',
  ACT: 'T',
  GTA: 'V',
  GTC: 'V',
  GTG: 'V',
  GTT: 'V',
  TGG: 'W',
  TAC: 'Y',
  TAT: 'Y' }

//Now we're talking!

//From here on, it is pretty straightforward:

const rnaParser = s => s.match(/.{3}/g).map(fragment => codons[fragment]).join("")
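
A short usage sketch for the one-liner above, with one caveat: String.prototype.match returns null when the pattern matches nothing, so rnaParser throws on an input shorter than three characters. The guarded variant below falls back to an empty array in that case. safeRnaParser and the shortened sampleCodons table are hypothetical names used only for illustration.

```javascript
// Shortened stand-in for the codons lookup table built above.
const sampleCodons = { AAG: 'K', CAT: 'H', AGA: 'R', AAT: 'N', CGA: 'R', GGG: 'G' };

// Hypothetical guarded variant of rnaParser: uses an empty list
// when match() returns null (input shorter than three characters).
const safeRnaParser = s =>
  (s.match(/.{3}/g) || []).map(fragment => sampleCodons[fragment]).join("");

console.log(safeRnaParser("AAGCATAGAAATCGAGGG")); // KHRNRG
console.log(safeRnaParser("AA") === "");          // true
```

Note that join treats undefined entries as empty strings, so a codon missing from the table is silently dropped from the output here rather than reported.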