How to pass a list of lists through a for loop in Python?
I have a list of lists:
sample = [['TTTT', 'CCCZ'], ['ATTA', 'CZZC']]
count = [[4,3],[4,2]]
correctionfactor = [[1.33, 1.5],[1.33,2]]
I calculate the frequency of each character (pi), square it, and sum the squares (then I compute het = 1 - sum).
The desired output: [[1,2],[1,2]] #NOTE: These are NOT the real expected values. I just need the real values to be in this format.
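Problem: I don't know how to pass the lists of lists (sample, count) through this loop to extract the values I need. I previously used this code with a single flat list (e.g. ['TACT','TTTT'..]).
- I suspect I need to add an outer for loop that indexes each element of sample (i.e. sample[0] = ['TTTT', 'CCCZ'] and sample[1] = ['ATTA', 'CZZC']). I'm not sure how to incorporate that into the code.
Code: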
list_of_hets = []
for idx, element in enumerate(sample):
    count_dict = {}
    square_dict = {}
    for base in list(element):
        if base in count_dict:
            count_dict[base] += 1
        else:
            count_dict[base] = 1
    for allele in count_dict: #Calculate frequency of every character
        square_freq = (count_dict[allele] / count[idx])**2 #Square the frequencies
        square_dict[allele] = square_freq
    pf = 0.0
    for i in square_dict:
        pf += square_dict[i] # pf --> pi^2 + pj^2...pn^2 #Sum the frequencies
    het = 1-pf
    list_of_hets.append(het)
print list_of_hets
"Failed" OUTPUT:
line 70, in <module>
    square_freq = (count_dict[allele] / count[idx])**2
TypeError: unsupported operand type(s) for /: 'int' and 'list'
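It's not entirely clear to me how you want to handle the 'Z' items in your data, but this code reproduces the output for the sample data at https://eval.in/658468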
from __future__ import division
bases = set('ACGT')
#sample = [['TTTT', 'CCCZ'], ['ATTA', 'CZZC']]
sample = [['ATTA', 'TTGA'], ['TTCA', 'TTTA']]
list_of_hets = []
for element in sample:
    hets = []
    for seq in element:
        count_dict = {}
        for base in seq:
            if base in count_dict:
                count_dict[base] += 1
            else:
                count_dict[base] = 1
        print count_dict
        #Calculate frequency of every character
        count = sum(1 for u in seq if u in bases)
        pf = sum((base / count) ** 2 for base in count_dict.values())
        hets.append(1 - pf)
    list_of_hets.append(hets)
print list_of_hets
Output
{'A': 2, 'T': 2}
{'A': 1, 'T': 2, 'G': 1}
{'A': 1, 'C': 1, 'T': 2}
{'A': 1, 'T': 3}
[[0.5, 0.625], [0.625, 0.375]]
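For reference, the TypeError in the question comes from count[idx] being an inner list (for example [4, 3]), so an int ends up divided by a list. If the precomputed count lists are to be kept rather than recomputed, one possible sketch is to walk both lists of lists in parallel with zip. The count values below are an assumption made up to match this sample data; they are not from the original post.

```python
from __future__ import division, print_function  # harmless on Python 3

sample = [['ATTA', 'TTGA'], ['TTCA', 'TTTA']]
count = [[4, 4], [4, 4]]  # assumed counts for this sample, for illustration only

list_of_hets = []
# Outer loop: pair each inner list of sequences with its inner list of counts.
for element, element_counts in zip(sample, count):
    hets = []
    # Inner loop: pair each sequence with its own integer count.
    for seq, n in zip(element, element_counts):
        count_dict = {}
        for base in seq:
            count_dict[base] = count_dict.get(base, 0) + 1
        # n is an int here, so the division is well defined.
        pf = sum((c / n) ** 2 for c in count_dict.values())
        hets.append(1 - pf)
    list_of_hets.append(hets)
print(list_of_hets)
```

Because hets is rebuilt for every inner list, the result keeps the nested shape asked for in the question.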
This code could be simplified further by using collections.Counter instead of count_dict.
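A minimal sketch of that simplification might look like the following. The helper name heterozygosity is hypothetical (it does not appear in the original code); the arithmetic is meant to mirror the loop above, including the fact that every symbol is tallied while only A, C, G and T contribute to count.

```python
from __future__ import division, print_function  # harmless on Python 3

from collections import Counter

BASES = set('ACGT')


def heterozygosity(seq):
    """Return 1 - sum(p_i ** 2) for one sequence (hypothetical helper)."""
    count_dict = Counter(seq)                   # tallies every symbol in seq
    count = sum(1 for u in seq if u in BASES)   # only A, C, G, T enter the total
    pf = sum((n / count) ** 2 for n in count_dict.values())
    return 1 - pf


sample = [['ATTA', 'TTGA'], ['TTCA', 'TTTA']]
# A nested comprehension preserves the list-of-lists shape.
list_of_hets = [[heterozygosity(seq) for seq in element] for element in sample]
print(list_of_hets)
```

Note that a sequence containing none of A, C, G or T would make count zero and raise ZeroDivisionError, so real data may need a guard for that case.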
By the way, if the symbols not in 'ACGT' are always 'Z', we can speed up the calculation of count. Get rid of bases = set('ACGT') and change
count = sum(1 for u in seq if u in bases)
to
count = sum(1 for u in seq if u != 'Z')
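As a quick check of that expression, the sketch below applies it to the sample from the question. Under the assumption that 'Z' is the only non-base symbol, it appears to reproduce the count list given there, which suggests the counts can be derived from the sequences rather than passed in separately.

```python
from __future__ import print_function  # harmless on Python 3

sample = [['TTTT', 'CCCZ'], ['ATTA', 'CZZC']]

# Count the non-'Z' characters of each sequence, keeping the nested shape.
counts = [[sum(1 for u in seq if u != 'Z') for seq in element]
          for element in sample]
print(counts)  # [[4, 3], [4, 2]]
```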