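Reindex a multi-level index with missing categories
I have a DataFrame with two index columns, group and class. I have a dictionary containing additional levels that need to be added to both indexes. Specifically, I want to add E to the group index, and I want to make sure that g1, g2 and g3 are all present in the class index for every group (so add g3 to group A; g1 to group B; g2 and g3 to group C; g1 and g3 to group D; and g1, g2 and g3 to group E). I want to fill the total column with zeros where appropriate.
The original DataFrame is here: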
import pandas as pd
df = pd.DataFrame(data={'group' : ['A','A','B','B','C','D'],
'class': ['g1','g2','g2','g3','g1','g2'],
'total' : [3,14,12,11,21,9]})
The dictionary containing all the required categories (and their mapping to the df columns) is here:
dic = {'group':['A','B','C','D','E'],
'class' : ['g1','g2','g3']}
The expected output is here:
expectedOutput = pd.DataFrame(data={'group' : ['A','A','A','B','B','B','C','C','C','D','D','D','E','E','E'],
'class': ['g1','g2', 'g3','g1','g2', 'g3','g1','g2', 'g3','g1','g2', 'g3','g1','g2', 'g3'],
'total' : [3,14,0, 0,12,11,21,0,0,0,9,0, 0,0,0]})
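I can't keep the duplicated values when reindexing, but I need to retain all of them. Any suggestions are welcome. Thanks very much.
A solution using MultiIndex.from_product with DataFrame.reindex, with the MultiIndex created from the dict: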
dic = {'group':['A','B','C','D','E'],
'class' : ['g1','g2','g3']}
mux = pd.MultiIndex.from_product(dic.values(), names=dic)
df = df.set_index(list(dic)).reindex(mux, fill_value=0).reset_index()
print (df)
group class total
0 A g1 3
1 A g2 14
2 A g3 0
3 B g1 0
4 B g2 12
5 B g3 11
6 C g1 21
7 C g2 0
8 C g3 0
9 D g1 0
10 D g2 9
11 D g3 0
12 E g1 0
13 E g2 0
14 E g3 0
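As a minimal, self-contained sketch of the same idea, the steps can be wrapped in a small helper and checked against the expected totals from the question. The name complete_levels and its fill_value parameter are made up here for illustration; they are not part of pandas. Note that reindex needs the (group, class) pairs in the original frame to be unique.

```python
import pandas as pd

df = pd.DataFrame({'group': ['A', 'A', 'B', 'B', 'C', 'D'],
                   'class': ['g1', 'g2', 'g2', 'g3', 'g1', 'g2'],
                   'total': [3, 14, 12, 11, 21, 9]})
dic = {'group': ['A', 'B', 'C', 'D', 'E'],
       'class': ['g1', 'g2', 'g3']}


def complete_levels(frame, levels, fill_value=0):
    """Hypothetical helper: expand `frame` to every combination in `levels`.

    `levels` maps column names to the full list of values each should take.
    Rows that did not exist before get `fill_value` in the remaining columns.
    """
    keys = list(levels)
    # Cartesian product of all requested values, one index level per key
    mux = pd.MultiIndex.from_product(list(levels.values()), names=keys)
    return (frame.set_index(keys)
                 .reindex(mux, fill_value=fill_value)
                 .reset_index())


result = complete_levels(df, dic)
expected_total = [3, 14, 0, 0, 12, 11, 21, 0, 0, 0, 9, 0, 0, 0, 0]
print(result['total'].tolist() == expected_total)
```

Because fill_value is passed to reindex directly, no NaN is ever introduced, which is why the total column keeps its integer dtype here.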
Or left-join to a DataFrame created with itertools.product:
from itertools import product
dicDf = pd.DataFrame(product(*dic.values()), columns=dic)
df = dicDf.merge(df, how='left').fillna({'total':0})
print (df)
group class total
0 A g1 3.0
1 A g2 14.0
2 A g3 0.0
3 B g1 0.0
4 B g2 12.0
5 B g3 11.0
6 C g1 21.0
7 C g2 0.0
8 C g3 0.0
9 D g1 0.0
10 D g2 9.0
11 D g3 0.0
12 E g1 0.0
13 E g2 0.0
14 E g3 0.0
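The left join turns total into floats because the unmatched rows hold NaN before fillna runs. If integers are wanted, one option is to cast back after filling. This is only a sketch: it assumes the original column has no missing values of its own, and the names grid and merged are chosen here for illustration.

```python
from itertools import product

import pandas as pd

df = pd.DataFrame({'group': ['A', 'A', 'B', 'B', 'C', 'D'],
                   'class': ['g1', 'g2', 'g2', 'g3', 'g1', 'g2'],
                   'total': [3, 14, 12, 11, 21, 9]})
dic = {'group': ['A', 'B', 'C', 'D', 'E'],
       'class': ['g1', 'g2', 'g3']}

# Every (group, class) combination as a plain two-column DataFrame
grid = pd.DataFrame(list(product(*dic.values())), columns=list(dic))

# Left join keeps every row of the grid; missing totals come back as NaN
merged = grid.merge(df, on=list(dic), how='left')

# Fill the gaps, then restore the dtype the column had before the join
merged['total'] = merged['total'].fillna(0).astype(df['total'].dtype)
print(merged)
```

Unlike reindex, the merge does not require the (group, class) pairs in df to be unique, so repeated pairs in the original data would all be kept.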
You can use the pyjanitor module and its complete method:
# pip install pyjanitor
import janitor as jn
# D is already present in df['group'], so only E needs to be appended
(df.complete({'group': list(df['group'].unique())+['E']}, 'class')
.fillna(0, downcast='infer')
)
Output:
group class total
0 A g1 3
1 A g2 14
2 A g3 0
3 B g1 0
4 B g2 12
5 B g3 11
6 C g1 21
7 C g2 0
8 C g3 0
9 D g1 0
10 D g2 9
11 D g3 0
12 E g1 0
13 E g2 0
14 E g3 0