Create a nested dictionary using column and row names as keys with a dictionary comprehension
Context: I have the following data frame:
gene_id Control_3Aligned.sortedByCoord.out.gtf Control_4Aligned.sortedByCoord.out.gtf ... NET_101Aligned.sortedByCoord.out.gtf NET_103Aligned.sortedByCoord.out.gtf NET_105Aligned.sortedByCoord.out.gtf
0 ENSG00000213279|Z97192.2 0 0 ... 3 2 7
1 ENSG00000132680|KHDC4 625 382 ... 406 465 262
2 ENSG00000145041|DCAF1 423 104 ... 231 475 254
3 ENSG00000102547|CAB39L 370 112 ... 265 393 389
4 ENSG00000173826|KCNH6 0 0 ... 0 0 0
I want a nested dictionary like this example:
{Control_3Aligned.sortedByCoord.out.gtf:
{ENSG00000213279|Z97192.2:0,
ENSG00000132680|KHDC4:625,...},
Control_4Aligned.sortedByCoord.out.gtf:
{ENSG00000213279|Z97192.2:0,
ENSG00000132680|KHDC4:382,...}}
So the general format is:
{column_name : {row_name:value,...},...}
I am trying something like this:
sample_dict ={}
for column in df.columns[1:]:
    for index in range(0,len(df.index)+1):
        sample_dict.setdefault(column, {row_name:value for row_name,value in zip(df.iloc[index,0], df.loc[index,column])})
        sample_dict[column] += {row_name:value for row_name,value in zip(df.iloc[index,0], df.loc[index,column])}
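But I keep getting TypeError: 'numpy.int64' object is not iterable
(The problem seems to be in zip(): zip only accepts iterables, and in this example I am not really passing it any. I am also fairly sure that the way I fill the dictionary is wrong.)
Any help is very welcome! Thank you in advance.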
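The error described above can be reproduced in isolation. This is a minimal sketch, not the original data: `row_name` and `value` are hypothetical stand-ins for what `df.iloc[index,0]` and `df.loc[index,column]` return, namely one string and one numpy scalar rather than two columns. The exact wording of the message may differ between Python versions.

```python
import numpy as np

# Hypothetical stand-ins for a single cell lookup in the data frame.
row_name = "ENSG00000132680|KHDC4"  # one string, like df.iloc[index, 0]
value = np.int64(625)               # one scalar, like df.loc[index, column]

raised = False
try:
    # zip() needs every argument to be iterable; a numpy scalar is not.
    dict(zip(row_name, value))
except TypeError as exc:
    raised = True
    print(exc)
```

Two further problems would likely surface once this is fixed: `range(0,len(df.index)+1)` runs one position past the last row, and `+=` is not defined between two dictionaries.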
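Managed to do it like this:
# Assumes df is the data frame shown above, with the default 0..n-1 index.
sample_dict = {}
gene_list = []
# Collect the gene identifiers once, in row order.
for index in range(0, len(df.index)):
    temp_data = df.loc[index, 'gene_id']
    gene_list.append(temp_data)
# Build one inner dictionary {gene_id: value} per sample column.
for column in df.columns[1:]:
    gene_dict = {}
    for index in range(0, len(df.index)):
        # Keep only the first value seen for each gene_id.
        if gene_list[index] not in gene_dict:
            gene_dict[gene_list[index]] = df.loc[index, column]
    sample_dict[column] = gene_dict
# Inspect the first (column_name, inner_dict) pair.
dict_pairs = sample_dict.items()
pairs_iterator = iter(dict_pairs)
first_pair = next(pairs_iterator)
first_pair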
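For comparison, the same {column_name : {row_name:value,...},...} structure can probably be built in one expression. The sketch below is one possible approach, not the only one. The small `df` is a stand-in built from the first three rows and two columns shown in the question, and `sample_dict_alt` is a name introduced here for illustration. It zips whole columns instead of single cells, and also shows the pandas built-in `DataFrame.to_dict()`.

```python
import pandas as pd

# Stand-in for the data frame in the question (first three rows, two sample columns).
df = pd.DataFrame({
    "gene_id": [
        "ENSG00000213279|Z97192.2",
        "ENSG00000132680|KHDC4",
        "ENSG00000145041|DCAF1",
    ],
    "Control_3Aligned.sortedByCoord.out.gtf": [0, 625, 423],
    "Control_4Aligned.sortedByCoord.out.gtf": [0, 382, 104],
})

# Dictionary comprehension: zip the gene_id column with each sample column.
sample_dict = {
    column: dict(zip(df["gene_id"], df[column]))
    for column in df.columns[1:]
}

# pandas built-in: with gene_id as the index, to_dict() returns
# {column_name: {index_label: value}}.
sample_dict_alt = df.set_index("gene_id").to_dict()

print(sample_dict["Control_4Aligned.sortedByCoord.out.gtf"])
```

Both variants assume the gene_id values are unique. If a gene_id repeats, `dict(zip(...))` keeps the last value, whereas the loop above keeps the first.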