Expanding a Pandas DataFrame Based on Nested Dictionary Values
In short, I want to expand a group-level view into the individual components of those groups, based on a mapping schema I created.
I have two sets of data: the transaction data is in df, and the mapping is set up as a nested dictionary in nested.
import pandas as pd
nested = {"Group A":{"Component 1 Share": 0.25, "Component 2 Share": 0.25, "Component 3 Share": 0.25, "Component 4 Share": 0.25},
"Group B":{"Component 1 Share": 0.5, "Component 5 Share": 0.5}}
data = {'date': ['2018-12-01', '2018-12-01', '2018-12-02', '2018-12-02', '2018-12-02'],
'groups':['Group A', 'Group B', 'Group A', 'Group B', 'Group A'],
'sold': [100, 200, 200, 300, 60]}
df = pd.DataFrame(data, columns = ['date', 'groups','sold'])
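In this example every group's shares add up to 1, which is what lets the expansion keep each transaction's sold total. Here is a minimal sketch, assuming the real mapping is meant to follow the same rule. It lays nested out as a component-by-group table and flags any group whose shares do not sum to 1. The names share_table and bad_groups are only illustrative.

```python
import pandas as pd

nested = {"Group A": {"Component 1 Share": 0.25, "Component 2 Share": 0.25,
                      "Component 3 Share": 0.25, "Component 4 Share": 0.25},
          "Group B": {"Component 1 Share": 0.5, "Component 5 Share": 0.5}}

# Rows are components, columns are groups; a component a group lacks gets share 0
share_table = pd.DataFrame(nested).fillna(0.0)

# Column sums should be 1 so that expanding a group preserves its sold total
col_sums = share_table.sum()
bad_groups = list(col_sums[(col_sums - 1.0).abs() > 1e-9].index)

print(share_table)
print("groups whose shares do not sum to 1:", bad_groups)
```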
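My goal is to use the nested dictionary to convert this into the following component-level format. I have simplified both data structures: the real df is much larger, and the real nested dictionary has more entries, with groups containing different numbers of components.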
goal_data = {'date': ['2018-12-01', '2018-12-01', '2018-12-01', '2018-12-01',
'2018-12-01', '2018-12-01',
'2018-12-02', '2018-12-02', '2018-12-02', '2018-12-02',
'2018-12-02', '2018-12-02',
'2018-12-02', '2018-12-02', '2018-12-02', '2018-12-02'],
'components':["Component 1 Share", "Component 2 Share", "Component 3 Share", "Component 4 Share",
"Component 1 Share", "Component 5 Share",
"Component 1 Share", "Component 2 Share", "Component 3 Share", "Component 4 Share",
"Component 1 Share", "Component 5 Share",
"Component 1 Share", "Component 2 Share", "Component 3 Share", "Component 4 Share"],
'sold': [25, 25, 25, 25,
100, 100,
50, 50, 50, 50,
150, 150,
15,15,15,15]}
component_df = pd.DataFrame(goal_data, columns=["date", "components", "sold"])
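Every row of goal_data follows one rule. Each transaction produces one row per component of its group, and sold is multiplied by that component's share (for example, 60 × 0.25 = 15). The sketch below spells out that rule with plain loops. It is useful as a reference to check faster versions against, but it is probably too slow for a large df. The name reference_df is illustrative.

```python
import pandas as pd

nested = {"Group A": {"Component 1 Share": 0.25, "Component 2 Share": 0.25,
                      "Component 3 Share": 0.25, "Component 4 Share": 0.25},
          "Group B": {"Component 1 Share": 0.5, "Component 5 Share": 0.5}}
data = {'date': ['2018-12-01', '2018-12-01', '2018-12-02', '2018-12-02', '2018-12-02'],
        'groups': ['Group A', 'Group B', 'Group A', 'Group B', 'Group A'],
        'sold': [100, 200, 200, 300, 60]}
df = pd.DataFrame(data, columns=['date', 'groups', 'sold'])

# One output row per (transaction, component), with sold scaled by the share
rows = []
for date, group, sold in df.itertuples(index=False, name=None):
    for component, share in nested[group].items():
        rows.append((date, component, sold * share))

reference_df = pd.DataFrame(rows, columns=["date", "components", "sold"])
print(reference_df)
```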
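I have tried various approaches, such as map, apply, lookup, and merge, without success, but I have a feeling there must be a way to expand the group-level data into its components.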
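map by itself cannot add rows, but it can be combined with DataFrame.explode, which can. The sketch below shows that alternative, which is not the answer's method. It maps each group name to its list of (component, share) pairs, explodes so each pair gets its own row, and then scales sold. It assumes every group in df exists in nested; a missing group would raise a KeyError inside the lambda.

```python
import pandas as pd

nested = {"Group A": {"Component 1 Share": 0.25, "Component 2 Share": 0.25,
                      "Component 3 Share": 0.25, "Component 4 Share": 0.25},
          "Group B": {"Component 1 Share": 0.5, "Component 5 Share": 0.5}}
data = {'date': ['2018-12-01', '2018-12-01', '2018-12-02', '2018-12-02', '2018-12-02'],
        'groups': ['Group A', 'Group B', 'Group A', 'Group B', 'Group A'],
        'sold': [100, 200, 200, 300, 60]}
df = pd.DataFrame(data, columns=['date', 'groups', 'sold'])

out = df.copy()
# map(): group name -> list of (component, share) pairs
out["pair"] = out["groups"].map(lambda g: list(nested[g].items()))
# explode(): one row per pair, repeating date/groups/sold
out = out.explode("pair", ignore_index=True)
# Unpack the pairs and scale each transaction's sold by the component share
out["components"] = [c for c, _ in out["pair"]]
out["share"] = [s for _, s in out["pair"]]
out["sold"] = out["sold"] * out["share"]
out = out[["date", "components", "sold"]]
print(out)
```

Because explode keeps the original row order, this reproduces goal_data row for row.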
You can start by turning the nested dict into a DataFrame, then merge it with each date's group of rows:
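# Long table: level_0 = component, level_1 = group, 0 = share; dropna() removes
# components a group lacks (older stack() drops them already, newer may keep NaN)
nestdict_f=pd.DataFrame(nested).stack().dropna().reset_index()
# Merge each date's transactions with the share table on the group name
newdf=pd.concat([y.merge(nestdict_f,left_on='groups',right_on='level_1') for _,y in df.groupby('date')])
# Scale each group's sales by the component's share
newdf['sold']=newdf['sold']*newdf[0]
newdf=newdf[['date','level_0','sold']].rename(columns={'level_0':'components'})
newdf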
date components sold
0 2018-12-01 Component 1 Share 25.0
1 2018-12-01 Component 2 Share 25.0
2 2018-12-01 Component 3 Share 25.0
3 2018-12-01 Component 4 Share 25.0
4 2018-12-01 Component 1 Share 100.0
5 2018-12-01 Component 5 Share 100.0
0 2018-12-02 Component 1 Share 50.0
1 2018-12-02 Component 2 Share 50.0
2 2018-12-02 Component 3 Share 50.0
3 2018-12-02 Component 4 Share 50.0
4 2018-12-02 Component 1 Share 15.0
5 2018-12-02 Component 2 Share 15.0
6 2018-12-02 Component 3 Share 15.0
7 2018-12-02 Component 4 Share 15.0
8 2018-12-02 Component 1 Share 150.0
9 2018-12-02 Component 5 Share 150.0
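The groupby('date') plus concat is not strictly needed, because date is carried along with each transaction row. A single merge against a long share table covers every date at once. The sketch below builds that table directly from the dict, so no NaN handling is required, and it sorts on the original row position so the output follows the transaction order. The names shares and out are illustrative. With how="inner", any transaction whose group is missing from nested is silently dropped; how="left" would keep it with NaN instead.

```python
import pandas as pd

nested = {"Group A": {"Component 1 Share": 0.25, "Component 2 Share": 0.25,
                      "Component 3 Share": 0.25, "Component 4 Share": 0.25},
          "Group B": {"Component 1 Share": 0.5, "Component 5 Share": 0.5}}
data = {'date': ['2018-12-01', '2018-12-01', '2018-12-02', '2018-12-02', '2018-12-02'],
        'groups': ['Group A', 'Group B', 'Group A', 'Group B', 'Group A'],
        'sold': [100, 200, 200, 300, 60]}
df = pd.DataFrame(data, columns=['date', 'groups', 'sold'])

# Long share table: one row per (group, component, share)
shares = pd.DataFrame(
    [(g, c, s) for g, comps in nested.items() for c, s in comps.items()],
    columns=["groups", "components", "share"],
)

out = (
    df.reset_index()                            # keep original row position as 'index'
      .merge(shares, on="groups", how="inner")  # one row per (transaction, component)
      .sort_values("index", kind="stable")      # restore the transaction order
)
out["sold"] = out["sold"] * out["share"]
out = out[["date", "components", "sold"]].reset_index(drop=True)
print(out)
```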