Nested Loops in Python Storing Results in a Singular Dictionary

Question

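Good evening,

I'm currently working on a project to further my understanding of Python while continuing my undergraduate degree. I'm trying to create a bioinformatics program that uses Markov models to provide and predict certain P(x) statements throughout. I'm working on cleaning up my code, because I've found a significant amount of repetition. I'm not looking for a finished answer; advice, or a push in the right direction, would help me move forward with a positive, Python-centered mindset.

Is there a way in Python to turn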

aa_count = markov_data_set.count('AA')
at_count = markov_data_set.count('AT')
ag_count = markov_data_set.count('AG')
ac_count = markov_data_set.count('AC')
tt_count = markov_data_set.count('TT')
ta_count = markov_data_set.count('TA')
tg_count = markov_data_set.count('TG')
tc_count = markov_data_set.count('TC')
cc_count = markov_data_set.count('CC')
ca_count = markov_data_set.count('CA')
cg_count = markov_data_set.count('CG')
ct_count = markov_data_set.count('CT')
gg_count = markov_data_set.count('GG')
ga_count = markov_data_set.count('GA')
gt_count = markov_data_set.count('GT')
gc_count = markov_data_set.count('GC')

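into something simpler? I've read through a couple of books on Python (Python Crash Course and A Primer on Scientific Programming with Python), and I believe I could use a loop or nested loop to make something shorter and more organized. An example of what I've tried is as follows: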

di_nucleotide = ('AA', 'AT', 'AG', 'AC', 'TT', 'TA', 'TG', 'TC', 'CC', 'CA', 'CG', 'CT', 'GG', 'GA', 'GT', 'GC')
nucleotide_count = ()
nucleotide_frequency = []

for binomials in di_nucleotide:
     di_nucleotide.count()

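Sadly, that is where I get stuck, which is a bit discouraging. I would like the end product to store Var1 and Var2 in a single dictionary that I can save or look up later, while still being able to access the two variables separately as needed.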

di_nucleotide = ('AA', 'AT', 'AG', 'AC', 'TT', 'TA', 'TG', 'TC', 'CC', 'CA', 'CG', 'CT', 'GG', 'GA', 'GT', 'GC')
nucleotide_count = (int1, int2, int3, int4, ...)
nucleotide_frequency = ['AA':Count, 'AT'Count, 'AG'Count, ...]

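This is my first post on SO. I understand that this may not be the best avenue for seeking advice, but if there is anything I can do to make my posts better in the future, please let me know so that I can improve.

As always, thank you all and have a great day! I look forward to continuing my coding journey.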

Answer 1

You can store everything in a dictionary that is built dynamically:

# initialise dictionary and total counts
nucleotide_counts = {}
total_counts = 0

# loop through dinucleotide counts
for dn in ['AA', 'AT', 'AG', 'AC', 'TT', 'TA', 'TG', 'TC', 'CC', 'CA', 'CG', 'CT', 'GG', 'GA', 'GT', 'GC']:
    # store in dictionary
    counts = markov_data_set.count(dn)
    nucleotide_counts[dn] = counts
    total_counts += counts

From there, you can generate the frequencies:

frequencies = {}
for dn, counts in nucleotide_counts.items():
    frequencies[dn] = counts / total_counts
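
To try the two steps above end to end, here is a minimal self-contained sketch. It assumes markov_data_set is a plain str (the question does not say what type it is), and the sample sequence is made up for illustration. It also guards against dividing by zero when no pair is found, and shows one way to pull the pairs and the counts back out as separate sequences, which is what the question asks for.

```python
# Hypothetical sample data; the real markov_data_set is not shown in the question.
markov_data_set = "AATTGGCCATGC"

dinucleotides = ['AA', 'AT', 'AG', 'AC', 'TT', 'TA', 'TG', 'TC',
                 'CC', 'CA', 'CG', 'CT', 'GG', 'GA', 'GT', 'GC']

# Count each pair and keep a running total
nucleotide_counts = {}
total_counts = 0
for dn in dinucleotides:
    counts = markov_data_set.count(dn)
    nucleotide_counts[dn] = counts
    total_counts += counts

# Convert counts to frequencies; fall back to 0.0 if nothing was counted
frequencies = {
    dn: (counts / total_counts if total_counts else 0.0)
    for dn, counts in nucleotide_counts.items()
}

# The dictionary keeps insertion order, so keys and values stay aligned
pairs = tuple(nucleotide_counts)             # ('AA', 'AT', ...)
count_values = tuple(nucleotide_counts.values())
```

A single pair can then be looked up with nucleotide_counts['AT'] or frequencies['AT'].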

Answer 2

Use itertools.product to generate the pairs:

import itertools

bases = 'ACGT'
nucs = [''.join(pair) for pair in itertools.product(bases, repeat=2)]
# ['AA', 'AC', 'AG' ....

You can then call the function in a loop inside a dictionary comprehension, which replaces your individual calls:

counts = {nuc: markov_data_set.count(nuc) for nuc in nucs}

counts is a dictionary of your results. The keys are 'AA', 'AC', and so on.
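
One caveat worth checking: str.count counts non-overlapping occurrences, so 'AAA'.count('AA') is 1, whereas a first-order Markov model would usually treat that stretch as two AA transitions. If overlapping pairs are what you need (an assumption about the model, not something the question states), a sketch using collections.Counter over a sliding window of width 2 could look like the following. The sample sequence is hypothetical.

```python
import itertools
from collections import Counter

# Hypothetical sample data; the real markov_data_set is not shown in the question.
markov_data_set = "AAATGC"

bases = 'ACGT'
nucs = [''.join(pair) for pair in itertools.product(bases, repeat=2)]

# Pair every character with its successor, so overlapping pairs are all seen
observed = Counter(a + b for a, b in zip(markov_data_set, markov_data_set[1:]))

# Report all 16 pairs, using 0 for pairs that never occur
counts = {nuc: observed[nuc] for nuc in nucs}

# Relative frequency of each pair; 0.0 if the sequence is too short to form any
total = sum(counts.values())
frequencies = {nuc: (n / total if total else 0.0) for nuc, n in counts.items()}
```

With this sample, counts['AA'] is 2 while markov_data_set.count('AA') is 1, which is the difference described above.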


Tags: python, loops, bioinformatics, nested-loops