Correlation of vectors contained in nested dictionaries
I have a nested dictionary with the following structure:
{Cell_name_1 : {KPI_name_1: [value1, value2, ..., valueN],
                KPI_name_2: [value1, value2, ..., valueN],
                ...,
                KPI_name_N: [value1, value2, ..., valueN]},
 Cell_name_2 : {KPI_name_1: [value1, value2, ..., valueN], ...},
 Cell_name_N : {....}}
I want to check the correlation between vectors contained in different cells (I have already defined this method, so it is a helper function). Let's say:
vector_1 = [64.0, 66.0, 53.5, 52.1, 54.0] #[values from KPI_name_1 from Cell_name_1]
vector_2 = [84.0, 86.0, 63.5, 72.1, 24.0] #[values from KPI_name_2 from Cell_name_2]
correlation(vector_1, vector_2)
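The question never shows the body of `correlation`. For experimenting with the answers below, here is a minimal stand-in that assumes the helper computes Pearson's correlation coefficient; the name matches the question, but the implementation is hypothetical:

```python
import math


def correlation(x, y):
    """Pearson correlation coefficient of two equal-length sequences.
    Hypothetical stand-in for the asker's own helper (assumes Pearson's r).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of co-deviations and of squared deviations; the 1/n factors cancel in the ratio
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


vector_1 = [64.0, 66.0, 53.5, 52.1, 54.0]
vector_2 = [84.0, 86.0, 63.5, 72.1, 24.0]
r = correlation(vector_1, vector_2)
print(round(r, 3))
```

Any helper with the same two-argument signature can be dropped in instead.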
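I have tried different ways of looping over the dictionary (plain for loops, classic loops with while and conditions, etc.), but I haven't found the approach I need.
For example, the data looks like this: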
dic_sem = {'16895555': {'KPI_name_1': [64.0, 66.0, 53.5, 52.1, 54.0],
                        'KPI_name_2': [54.0, 56.0, 23.5, 32.1, 84.0]},
           '16894444': {'KPI_name_1': [84.0, 86.0, 63.5, 72.1, 24.0],
                        'KPI_name_2': [24.0, 26.0, 63.5, 92.1, 84.0]}}
'16895555' and '16894444' are different Cell_names.
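A short sketch of the two-level access pattern this structure implies, using the sample dic_sem (copied in so the block runs on its own): the outer key selects the cell, the inner key selects the KPI. The variable names rows, vector_a and vector_b are illustrative only.

```python
# Sample data from the question: cell name -> KPI name -> list of values
dic_sem = {'16895555': {'KPI_name_1': [64.0, 66.0, 53.5, 52.1, 54.0],
                        'KPI_name_2': [54.0, 56.0, 23.5, 32.1, 84.0]},
           '16894444': {'KPI_name_1': [84.0, 86.0, 63.5, 72.1, 24.0],
                        'KPI_name_2': [24.0, 26.0, 63.5, 92.1, 84.0]}}
# Walk both levels: outer keys are cell names, inner keys are KPI names
rows = []
for cell_name, kpis in dic_sem.items():
    for kpi_name, values in kpis.items():
        rows.append((cell_name, kpi_name, values))
for row in rows:
    print(*row)
# A single vector needs two lookups: dic_sem[cell][kpi]
vector_a = dic_sem['16895555']['KPI_name_1']
vector_b = dic_sem['16894444']['KPI_name_1']
```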
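You can iterate over the dictionary and build a dictionary that maps each KPI name, e.g. KPI_name_1, to a list of lists containing your vectors: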
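from collections import defaultdict
# Map each KPI name to the list of its vectors, one vector per cell
vectors = defaultdict(list)
# Iterate over the inner dictionaries (one per cell)
for value in dic_sem.values():
    # Append each KPI's vector to that KPI's list
    for k, v in value.items():
        vectors[k].append(v)
print(dict(vectors))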
The output will be:
{'KPI_name_1': [[64.0, 66.0, 53.5, 52.1, 54.0], [84.0, 86.0, 63.5, 72.1, 24.0]],
'KPI_name_2': [[54.0, 56.0, 23.5, 32.1, 84.0], [24.0, 26.0, 63.5, 92.1, 84.0]]}
Then you can iterate over the values of this dictionary and call correlation on each pair of vectors:
for value in vectors.values():
    print(value[0], value[1])
    # correlation(*value) is the same as correlation(value[0], value[1]) when there are two cells
The output here will be:
[64.0, 66.0, 53.5, 52.1, 54.0] [84.0, 86.0, 63.5, 72.1, 24.0]
[54.0, 56.0, 23.5, 32.1, 84.0] [24.0, 26.0, 63.5, 92.1, 84.0]
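The correlation(*value) call unpacks each list into exactly two arguments, so it only works while there are exactly two cells. A minimal end-to-end sketch that collects one result per KPI and makes that assumption explicit; correlation here is a hypothetical Pearson stand-in, not the asker's helper. The vectors keep dic_sem's insertion order, which dicts guarantee since Python 3.7.

```python
import math
from collections import defaultdict


def correlation(x, y):
    # Hypothetical stand-in for the asker's helper: Pearson's r
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


dic_sem = {'16895555': {'KPI_name_1': [64.0, 66.0, 53.5, 52.1, 54.0],
                        'KPI_name_2': [54.0, 56.0, 23.5, 32.1, 84.0]},
           '16894444': {'KPI_name_1': [84.0, 86.0, 63.5, 72.1, 24.0],
                        'KPI_name_2': [24.0, 26.0, 63.5, 92.1, 84.0]}}
# Group the vectors by KPI name, as in the answer above
vectors = defaultdict(list)
for kpis in dic_sem.values():
    for kpi_name, values in kpis.items():
        vectors[kpi_name].append(values)
# correlation(*vecs) needs exactly two vectors, so check that assumption
results = {}
for kpi_name, vecs in vectors.items():
    if len(vecs) != 2:
        raise ValueError(f"{kpi_name}: expected 2 cells, got {len(vecs)}")
    results[kpi_name] = correlation(*vecs)
print(results)
```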
Maybe itertools.product can help you:
import itertools
import numpy as np
# Get vector names (assuming the same keys are present in all cells)
field_names = list(dic_sem.values())[0].keys()
# Precompute all ordered pairs of cells (n * n pairs, including (c, c))
all_cell_pairs = list(itertools.product(dic_sem.keys(), dic_sem.keys()))
corr = {}
for field in field_names:
    # One correlation per pair, reshaped into an n x n matrix
    # (rows and columns follow the key order of dic_sem)
    corr[field] = np.reshape(
        [correlation(dic_sem[c1][field], dic_sem[c2][field]) for c1, c2 in all_cell_pairs],
        (len(dic_sem), -1),
    )
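If the helper happens to compute Pearson's r (an assumption; the question does not say), NumPy can build the whole cell-by-cell matrix per KPI in one call with np.corrcoef, which treats each row of a 2-D array as one variable. This swaps the custom helper for NumPy's own implementation:

```python
import numpy as np
# Sample data from the question
dic_sem = {'16895555': {'KPI_name_1': [64.0, 66.0, 53.5, 52.1, 54.0],
                        'KPI_name_2': [54.0, 56.0, 23.5, 32.1, 84.0]},
           '16894444': {'KPI_name_1': [84.0, 86.0, 63.5, 72.1, 24.0],
                        'KPI_name_2': [24.0, 26.0, 63.5, 92.1, 84.0]}}
cell_names = list(dic_sem)                  # fixes the row/column order
field_names = list(dic_sem[cell_names[0]])  # assumes every cell has the same KPIs
corr = {}
for field in field_names:
    # One row per cell; all vectors must have the same length
    matrix = np.array([dic_sem[cell][field] for cell in cell_names])
    # Each row is treated as one variable -> (n_cells, n_cells) Pearson matrix
    corr[field] = np.corrcoef(matrix)
print(cell_names)
print(corr['KPI_name_1'])
```

Row i and column i of each matrix correspond to cell_names[i].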
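Note that we do more than twice the necessary computation here: the correlation matrix is symmetric, so it is enough to compute only the upper or lower triangle (e.g. using itertools.combinations), excluding the diagonal (which equals 1). Still, the above should point you in the right direction.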
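Following the note about symmetry, a sketch that computes only the upper triangle with itertools.combinations and mirrors it into the lower one, so n cells cost n(n-1)/2 calls instead of n * n. correlation is again a hypothetical Pearson stand-in, and filling the diagonal with 1 assumes no vector is constant (a constant vector has zero variance, so its correlation is undefined):

```python
import itertools
import math


def correlation(x, y):
    # Hypothetical stand-in for the asker's helper: Pearson's r
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


dic_sem = {'16895555': {'KPI_name_1': [64.0, 66.0, 53.5, 52.1, 54.0],
                        'KPI_name_2': [54.0, 56.0, 23.5, 32.1, 84.0]},
           '16894444': {'KPI_name_1': [84.0, 86.0, 63.5, 72.1, 24.0],
                        'KPI_name_2': [24.0, 26.0, 63.5, 92.1, 84.0]}}
cell_names = list(dic_sem)
n = len(cell_names)
field_names = list(dic_sem[cell_names[0]])  # assumes the same KPIs in every cell
corr = {}
for field in field_names:
    # Diagonal is 1: a (non-constant) vector correlates perfectly with itself
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    # combinations yields each unordered pair (i, j) with i < j exactly once
    for i, j in itertools.combinations(range(n), 2):
        r = correlation(dic_sem[cell_names[i]][field], dic_sem[cell_names[j]][field])
        m[i][j] = m[j][i] = r  # one computation fills both triangles
    corr[field] = m
print(corr)
```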