Expanding a Pandas DataFrame Based on Nested Dictionary Values

In short, I want to expand a group-level view into the individual components of those groups, based on a mapping schema I've created.

I have two sets of data: transaction data in df, and a nested dictionary in nested that defines the mapping.

import pandas as pd
nested = {"Group A":{"Component 1 Share": 0.25, "Component 2 Share": 0.25, "Component 3 Share": 0.25, "Component 4 Share": 0.25}, 
      "Group B":{"Component 1 Share": 0.5, "Component 5 Share": 0.5}}
data = {'date': ['2018-12-01', '2018-12-01', '2018-12-02', '2018-12-02', '2018-12-02'],
    'groups':['Group A', 'Group B', 'Group A', 'Group B', 'Group A'],
    'sold': [100, 200, 200, 300, 60]}
df = pd.DataFrame(data, columns = ['date', 'groups','sold'])

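My goal is to use the nested dictionary to convert this into the following component-level format. I have simplified both data structures: the real df is much larger, and the real nested dictionary has more entries of varying lengths.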

goal_data = {'date': ['2018-12-01', '2018-12-01', '2018-12-01', '2018-12-01', 
                  '2018-12-01', '2018-12-01', 
                  '2018-12-02', '2018-12-02', '2018-12-02', '2018-12-02',
                  '2018-12-02', '2018-12-02', 
                  '2018-12-02', '2018-12-02', '2018-12-02', '2018-12-02'],
    'components':["Component 1 Share", "Component 2 Share", "Component 3 Share", "Component 4 Share",
                  "Component 1 Share", "Component 5 Share",
                  "Component 1 Share", "Component 2 Share", "Component 3 Share", "Component 4 Share",
                  "Component 1 Share", "Component 5 Share", 
                  "Component 1 Share", "Component 2 Share", "Component 3 Share", "Component 4 Share"],
    'sold': [25, 25, 25, 25,
             100, 100,
             50, 50, 50, 50, 
             150, 150, 
             15,15,15,15]}
component_df = pd.DataFrame(goal_data, columns=["date", "components", "sold"])
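One property of the target above is that each date's total sold stays the same after the expansion, which only holds if every group's shares add up to 1. A minimal sanity check for the mapping, assuming the nested dict from the question (share_totals and bad_groups are illustrative names, not part of the original post):

```python
import math

nested = {"Group A": {"Component 1 Share": 0.25, "Component 2 Share": 0.25,
                      "Component 3 Share": 0.25, "Component 4 Share": 0.25},
          "Group B": {"Component 1 Share": 0.5, "Component 5 Share": 0.5}}

# Sum the shares of every group in the mapping.
share_totals = {group: sum(shares.values()) for group, shares in nested.items()}

# Groups whose shares do not add up to 1 would change the totals when expanded.
bad_groups = [group for group, total in share_totals.items()
              if not math.isclose(total, 1.0)]

print(share_totals)
print(bad_groups)
```

With a larger real-world mapping, a non-empty bad_groups list would point at the entries to fix before expanding.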

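I've tried various approaches such as map, apply, lookup, and merge, without success, but intuitively I know there must be a way to expand the group-level data into components.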

You can start by turning the nested dict into a DataFrame, then merge it with each date group:

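# Flatten the nested dict: one row per (component, group) pair.
# After reset_index the columns are level_0 (component), level_1 (group), 0 (share).
# dropna() removes pairs missing from the dict, in case stack() keeps them as NaN.
nestdict_f = pd.DataFrame(nested).stack().dropna().reset_index()

# Merge each date's rows with the mapping on the group name.
newdf = pd.concat([y.merge(nestdict_f, left_on='groups', right_on='level_1')
                   for _, y in df.groupby('date')])

# Scale the group-level amount by each component's share.
newdf['sold'] = newdf['sold'] * newdf[0]

newdf = newdf[['date', 'level_0', 'sold']].rename(columns={'level_0': 'components'})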

newdf
         date         components   sold
0  2018-12-01  Component 1 Share   25.0
1  2018-12-01  Component 2 Share   25.0
2  2018-12-01  Component 3 Share   25.0
3  2018-12-01  Component 4 Share   25.0
4  2018-12-01  Component 1 Share  100.0
5  2018-12-01  Component 5 Share  100.0
0  2018-12-02  Component 1 Share   50.0
1  2018-12-02  Component 2 Share   50.0
2  2018-12-02  Component 3 Share   50.0
3  2018-12-02  Component 4 Share   50.0
4  2018-12-02  Component 1 Share   15.0
5  2018-12-02  Component 2 Share   15.0
6  2018-12-02  Component 3 Share   15.0
7  2018-12-02  Component 4 Share   15.0
8  2018-12-02  Component 1 Share  150.0
9  2018-12-02  Component 5 Share  150.0
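
As a variation on the same idea, the mapping can be flattened straight from the dict and merged once, without the groupby loop. This is only a sketch under the question's sample data; mapping, share, and expanded are illustrative names, not from the original post:

```python
import pandas as pd

nested = {"Group A": {"Component 1 Share": 0.25, "Component 2 Share": 0.25,
                      "Component 3 Share": 0.25, "Component 4 Share": 0.25},
          "Group B": {"Component 1 Share": 0.5, "Component 5 Share": 0.5}}
data = {'date': ['2018-12-01', '2018-12-01', '2018-12-02', '2018-12-02', '2018-12-02'],
        'groups': ['Group A', 'Group B', 'Group A', 'Group B', 'Group A'],
        'sold': [100, 200, 200, 300, 60]}
df = pd.DataFrame(data, columns=['date', 'groups', 'sold'])

# Flatten the nested dict into one row per (group, component) pair.
mapping = pd.DataFrame(
    [(group, component, share)
     for group, shares in nested.items()
     for component, share in shares.items()],
    columns=['groups', 'components', 'share'],
)

# One merge repeats every transaction once per component of its group.
expanded = df.merge(mapping, on='groups', how='left')

# Scale the group-level amount by each component's share.
expanded['sold'] = expanded['sold'] * expanded['share']
expanded = expanded[['date', 'components', 'sold']]

print(expanded)
```

Because the mapping is built only from pairs that exist in the dict, no NaN rows need to be dropped. A group in df that is missing from nested would show up as a row with NaN in components, which makes gaps in the mapping easy to spot.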