Python: replace each letter with all possible values in a dictionary item

Question

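I want to build a list of strings in which each letter of Amino is replaced by every string in the corresponding list from the following dictionary: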

Amino = "mvkhdlsr"

dict = {
'f' : ['UUU', 'UUC'],
'l' : ['UUA', 'UUG', 'CUU', 'CUC', 'CUA', 'CUG'],
'i' : ['AUU', 'AUC', 'AUA'],
'm' : ['AUG'],
'v' : ['GUU', 'GUC', 'GUA', 'GUG'],
's' : ['UCU', 'UCC', 'UCA', 'UCG', 'AGU', 'AGC'],
'p' : ['CCU', 'CCC', 'CCA', 'CCG'],
't' : ['ACU', 'ACC', 'ACA', 'ACG'],
'a' : ['GCU', 'GCC', 'GCA', 'GCG'],
'y' : ['UAU', 'UAC'],
'x' : ['UAA', 'UAG', 'UGA'],
'h' : ['CAU', 'CAC'],
'q' : ['CAA', 'CAG'],
'n' : ['AAU', 'AAC'],
'k' : ['AAA', 'AAG'],
'd' : ['GAU', 'GAC'],
'e' : ['GAA', 'GAG'],
'c' : ['UGU', 'UGC'],
'w' : ['UGG'],
'r' : ['CGU', 'CGC', 'CGA', 'CGG', 'AGA', 'AGG'],
'g' : ['GGU', 'GGC', 'GGA', 'GGG']
}

For example, if Amino is "mfy", the expected output is

AUGUUUUAU
AUGUUUUAC
AUGUUCUAU
AUGUUCUAC

because m has only one option (AUG), f has two (UUU, UUC), and y also has two (UAU, UAC).

I tried something like

for word in Amino.split():
    if word in dict:
        for key, value in dict.items():
            for i in (0,len(value) - 1):
                for idx in value:

(unfinished code) but I can't figure it out.

Answer 1

Try this:

from itertools import product
from operator import itemgetter

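# dct is the dictionary from the question (named `dict` there; renamed here
# so it does not shadow the built-in dict type)
# itemgetter(*Amino) looks up the codon list for each letter of Amino
# product builds the Cartesian product of those lists
# "".join concatenates each resulting tuple into one string
list(map("".join, product(*itemgetter(*Amino)(dct))))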

# ['AUGGUUAAACAUGAUUUAUCUCGU',
#  'AUGGUUAAACAUGAUUUAUCUCGC',
#  'AUGGUUAAACAUGAUUUAUCUCGA',
#  'AUGGUUAAACAUGAUUUAUCUCGG',
#  'AUGGUUAAACAUGAUUUAUCUAGA',
#  ...]

For Amino = "mfy", the steps are:

itemgetter(*Amino)(dct)
# (['AUG'], ['UUU', 'UUC'], ['UAU', 'UAC'])

list(product(*itemgetter(*Amino)(dct)))
# [('AUG', 'UUU', 'UAU'), ('AUG', 'UUU', 'UAC'), ('AUG', 'UUC', 'UAU'), ('AUG', 'UUC', 'UAC')]

list(map("".join, product(*itemgetter(*Amino)(dct))))
# ['AUGUUUUAU', 'AUGUUUUAC', 'AUGUUCUAU', 'AUGUUCUAC']
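
The number of results is the product of the codon-list lengths, so it grows quickly with longer inputs. The following is a minimal sketch of a lazy variant, not part of the original answer. It assumes a reduced three-letter table, Python 3.8+ for `math.prod`, and the hypothetical names `codons`, `count_combinations`, and `iter_combinations`. It uses a generator expression instead of `itemgetter`, which also handles one-letter input: `itemgetter` called with a single key returns the value itself rather than a 1-tuple, so `product(*...)` would unpack the codon list incorrectly in that case.

```python
from itertools import islice, product
from math import prod

# Reduced codon table for illustration (assumption); the question's full
# dictionary can be passed in the same way.
codons = {
    'm': ['AUG'],
    'f': ['UUU', 'UUC'],
    'y': ['UAU', 'UAC'],
}

def count_combinations(amino, table):
    """Number of sequences = product of the codon-list lengths."""
    return prod(len(table[letter]) for letter in amino)

def iter_combinations(amino, table):
    """Yield each joined sequence lazily instead of building the whole list."""
    for combo in product(*(table[letter] for letter in amino)):
        yield "".join(combo)

print(count_combinations("mfy", codons))                  # 4
print(list(islice(iter_combinations("mfy", codons), 2)))  # ['AUGUUUUAU', 'AUGUUUUAC']
```

Wrapping the generator in `list(...)` gives the same list as the one-liner above.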

Answer 2

Using itertools would be a faster solution, but this works too.


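# d is the codon dictionary from the question (named `dict` there)
def output(text):
    # Start with the codons of the first letter
    lst = list(d[text[0]])
    for char in text[1:]:
        val = d[char]
        # Append every codon of this letter to every sequence built so far
        lst = [seq1 + seq2 for seq1 in lst for seq2 in val]
    return lst


for seq in output('mfy'):
    print(seq)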

Output:

AUGUUUUAU
AUGUUUUAC
AUGUUCUAU
AUGUUCUAC
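
To see how the list grows letter by letter, the loop can be instrumented to keep each intermediate list. This is only an illustrative sketch: the reduced table `d` and the helper name `output_with_trace` are assumptions, and the loop is seeded with a single empty prefix instead of the first letter's codons, so an empty input yields `['']` rather than raising an IndexError.

```python
# Reduced codon table for illustration (assumption, not the full dictionary).
d = {'m': ['AUG'], 'f': ['UUU', 'UUC'], 'y': ['UAU', 'UAC']}

def output_with_trace(text):
    """Build all sequences iteratively and record the list after each letter."""
    lst = ['']   # one empty prefix to extend
    trace = []
    for char in text:
        # Extend every prefix built so far with every codon of this letter
        lst = [prefix + codon for prefix in lst for codon in d[char]]
        trace.append(lst)
    return lst, trace

result, trace = output_with_trace('mfy')
for step in trace:
    print(step)
# ['AUG']
# ['AUGUUU', 'AUGUUC']
# ['AUGUUUUAU', 'AUGUUUUAC', 'AUGUUCUAU', 'AUGUUCUAC']
```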

Answer 3

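One intuitive approach is recursion: we walk through each suffix of the target amino acid string and build up the set of possible results.

You could try something like this: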

Amino = "mvkhdlsr"

conversions = {
    'f' : ['UUU', 'UUC'],
    'l' : ['UUA', 'UUG', 'CUU', 'CUC', 'CUA', 'CUG'],
    'i' : ['AUU', 'AUC', 'AUA'],
    'm' : ['AUG'],
    'v' : ['GUU', 'GUC', 'GUA', 'GUG'],
    's' : ['UCU', 'UCC', 'UCA', 'UCG', 'AGU', 'AGC'],
    'p' : ['CCU', 'CCC', 'CCA', 'CCG'],
    't' : ['ACU', 'ACC', 'ACA', 'ACG'],
    'a' : ['GCU', 'GCC', 'GCA', 'GCG'],
    'y' : ['UAU', 'UAC'],
    'x' : ['UAA', 'UAG', 'UGA'],
    'h' : ['CAU', 'CAC'],
    'q' : ['CAA', 'CAG'],
    'n' : ['AAU', 'AAC'],
    'k' : ['AAA', 'AAG'],
    'd' : ['GAU', 'GAC'],
    'e' : ['GAA', 'GAG'],
    'c' : ['UGU', 'UGC'],
    'w' : ['UGG'],
    'r' : ['CGU', 'CGC', 'CGA', 'CGG', 'AGA', 'AGG'],
    'g' : ['GGU', 'GGC', 'GGA', 'GGG']
}

class AminoSolver:
    def __init__(self, amino_conversions):
        self.amino_conversions = amino_conversions
        
    def get_amino_combinations(self, target_amino):
        # Base cases. If we only have 1 letter in the amino, we just return the list.
        # (i.e. if target_amino is 'a', we return ['GCU', 'GCC', 'GCA', 'GCG']). W/ 0,
        # we return an empty list
        if len(target_amino) == 0:
            return []
        if len(target_amino) == 1: 
            return self.amino_conversions[target_amino]
        
        new_potential_combinations = []
        # Iterate through each possible codon of the first letter of the current substring
        for possible_codon in self.amino_conversions[target_amino[0]]:
            # Generate all possible combinations for the target amino minus the first letter
            for previous_combination in self.get_amino_combinations(target_amino[1:]):
                # Add the possible codons from the first letter to all of the combinations generated prior
                new_potential_combinations.append(possible_codon + previous_combination)
        return new_potential_combinations
        
a = AminoSolver(conversions)
print(a.get_amino_combinations("mfy"))

(prints ['AUGUUUUAU', 'AUGUUUUAC', 'AUGUUCUAU', 'AUGUUCUAC'])

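The recursion works roughly like this:

Suppose the target amino acid string has length 1. The possible combinations are just that letter's conversions (e.g. "v" = ['GUU', 'GUC', 'GUA', 'GUG']), so we simply return them.
Now suppose the string has length 2 (e.g. "hq"). We start with one possible codon for the first letter (say "CAU") and append each conversion of the second letter to it (giving "CAU" + "CAA" and "CAU" + "CAG"). We then repeat this for every possible codon of the first letter to get all combinations for the length-2 string (here "CAUCAA", "CAUCAG", "CACCAA", and "CACCAG").
For a string of length n, we proceed the same way as for length 2: for each possible codon of the first letter, we find all combinations for the substring from the second letter to the last (a string of length n-1) and prepend the codon to each of them.

So this recursion gives us all possible combinations.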
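
The same recursion can also be written as a plain function. The sketch below is a variant, not the answer's class: it uses the empty string as the only base case (returning `['']`), so the one-letter case falls out of the general step, and it computes the combinations for the rest of the string once instead of once per codon. The function name `combinations` and the reduced table are assumptions for illustration.

```python
# Reduced codon table for illustration (assumption, not the full dictionary).
conversions = {'h': ['CAU', 'CAC'], 'q': ['CAA', 'CAG']}

def combinations(target, table):
    """Return every codon sequence that encodes `target`."""
    if not target:
        # One way to encode nothing: the empty sequence
        return ['']
    # All sequences for everything after the first letter
    rest = combinations(target[1:], table)
    # Prepend each codon of the first letter to each of those sequences
    return [codon + tail for codon in table[target[0]] for tail in rest]

print(combinations('hq', conversions))
# ['CAUCAA', 'CAUCAG', 'CACCAA', 'CACCAG']
```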
