How to create a nested dictionary from a dataframe
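I have a dataframe like this:
df = pd.DataFrame({'id': [1, 2, 1, 4, 4],
                   'course': ['math', 'math', 'sci', 'art', 'math'],
                   'result': ['pass', 'pass', 'fail', 'fail', 'fail']})
I want to create a nested dictionary from it.
For each id, I want a nested dictionary holding the courses that were passed and the courses that were failed:
{id: {pass: {courses}, fail: {courses}}}
{1: {'pass': {'math'}, 'fail': {'sci'}}, 2: {'pass': {'math'}}, 4: {'fail': {'art', 'math'}}}
Thanks a lot.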
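Try this:
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 1, 4, 4],
                   'course': ['math', 'math', 'sci', 'art', 'math'],
                   'result': ['pass', 'pass', 'fail', 'fail', 'fail']})
# Collect the courses per (result, id) pair, then move id into the columns
df.groupby(['result', 'id'])['course'].agg(list).unstack().to_dict()
Output: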
{1: {'fail': ['sci'], 'pass': ['math']},
2: {'fail': nan, 'pass': ['math']},
4: {'fail': ['art', 'math'], 'pass': nan}}
And yes, after peeking at @mozway's solution, use set instead of list:
df.groupby(['result', 'id'])['course'].agg(set).unstack().to_dict()
Output:
{1: {'fail': {'sci'}, 'pass': {'math'}},
2: {'fail': nan, 'pass': {'math'}},
4: {'fail': {'art', 'math'}, 'pass': nan}}
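Both variants above leave a nan wherever an id has no courses for a given result, whereas the output asked for in the question simply omits those keys. A minimal sketch of one way to drop them, assuming the same df as above (the names wide and out are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 1, 4, 4],
                   'course': ['math', 'math', 'sci', 'art', 'math'],
                   'result': ['pass', 'pass', 'fail', 'fail', 'fail']})

# Same reshaping as above: rows are results, columns are ids.
wide = df.groupby(['result', 'id'])['course'].agg(set).unstack()

# For each id column, drop the missing results before building the inner dict.
out = {id_: col.dropna().to_dict() for id_, col in wide.items()}
print(out)
```

The dict comprehension walks the columns one at a time, so each id only keeps the results it actually has.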
Assuming the input:
   id course result
0   1   math   pass
1   2   math   pass
2   1    sci   fail
3   4    art   fail
4   4   math   fail
You can use a nested groupby:
out = (df.groupby('id')
         .apply(lambda g: g.groupby('result')['course']
                           .agg(set).to_dict())
         .to_dict()
      )
Output:
{1: {'fail': {'sci'}, 'pass': {'math'}},
2: {'pass': {'math'}},
4: {'fail': {'art', 'math'}}}
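For comparison, the same structure can also be built with a plain Python loop over the rows. This is not part of the answer above, just a rough sketch assuming the same three columns; it likewise creates only the keys that actually occur:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 2, 1, 4, 4],
                   'course': ['math', 'math', 'sci', 'art', 'math'],
                   'result': ['pass', 'pass', 'fail', 'fail', 'fail']})

out = {}
for id_, course, result in zip(df['id'], df['course'], df['result']):
    # Create the per-id dict and the per-result set on first use, then add the course.
    out.setdefault(id_, {}).setdefault(result, set()).add(course)

print(out)
```

Whether this or the nested groupby is preferable probably depends on the size of the data and on how much of the rest of the pipeline is already in pandas.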
Or a pivot_table:
(df.pivot_table(columns='id', index='result', values='course', aggfunc=set)
   .to_dict()
)
Output:
{1: {'fail': {'sci'}, 'pass': {'math'}},
2: {'fail': nan, 'pass': {'math'}},
4: {'fail': {'art', 'math'}, 'pass': nan}}