How to make calculations on a pandas DataFrame with a MultiIndex?
Suppose I have a DataFrame with a MultiIndex:
                      Frequency
occupation    gender
administrator F              36
              M              43
artist        F              13
              M              15
doctor        M               7
educator      F              26
              M              69
engineer      F               2
              M              65
where the first two columns are the index levels.
How can I add another column that gives the ratio between F and M?
Use Series.unstack to reshape the data so that F and M become columns, then divide one column by the other to get the ratio:
df1 = df['Frequency'].unstack()
df1['ratio'] = df1['F'].div(df1['M'])
print(df1)
                  F     M     ratio
administrator  36.0  43.0  0.837209
artist         13.0  15.0  0.866667
doctor          NaN   7.0       NaN
educator       26.0  69.0  0.376812
engineer        2.0  65.0  0.030769
entertainment   2.0  16.0  0.125000
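For anyone who wants to reproduce the reshaping step, here is a minimal self-contained sketch. It rebuilds only the rows shown in the question, and it assumes the index levels are named occupation and gender as in the question's header; the variable name wide is just illustrative.

```python
import pandas as pd

# Sample rows copied from the question; level names are assumed from its header.
index = pd.MultiIndex.from_tuples(
    [
        ("administrator", "F"), ("administrator", "M"),
        ("artist", "F"), ("artist", "M"),
        ("doctor", "M"),
        ("educator", "F"), ("educator", "M"),
        ("engineer", "F"), ("engineer", "M"),
    ],
    names=["occupation", "gender"],
)
df = pd.DataFrame({"Frequency": [36, 43, 13, 15, 7, 26, 69, 2, 65]}, index=index)

# Pivot the inner level (gender) into columns: one row per occupation.
wide = df["Frequency"].unstack("gender")

# Element-wise F / M; an occupation missing either gender yields NaN.
wide["ratio"] = wide["F"].div(wide["M"])
print(wide)
```

Because doctor has no F row, unstacking fills that cell with NaN, which is why the F and M columns come out as floats rather than integers.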
If you need the ratio as a new column in the original DataFrame:
s = df['Frequency'].xs('F', level=1).div(df['Frequency'].xs('M', level=1))
df['ratio'] = df.index.droplevel(1).map(s)
print(df)
                 Frequency     ratio
administrator F         36  0.837209
              M         43  0.837209
artist        F         13  0.866667
              M         15  0.866667
doctor        M          7       NaN
educator      F         26  0.376812
              M         69  0.376812
engineer      F          2  0.030769
              M         65  0.030769
entertainment F          2  0.125000
              M         16  0.125000
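The mapping step can also be written with level names instead of positions, which may be easier to read. The sketch below is a variation on the code above, not a different method: it assumes the levels are named occupation and gender, uses a three-occupation subset of the question's data to stay short, and swaps droplevel(1) for get_level_values("occupation"), which should give the same per-row labels here.

```python
import pandas as pd

# A three-occupation subset of the question's data, to keep the sketch short.
index = pd.MultiIndex.from_tuples(
    [
        ("administrator", "F"), ("administrator", "M"),
        ("doctor", "M"),
        ("engineer", "F"), ("engineer", "M"),
    ],
    names=["occupation", "gender"],
)
df = pd.DataFrame({"Frequency": [36, 43, 7, 2, 65]}, index=index)

freq = df["Frequency"]

# One value per occupation; "doctor" has no F row, so its ratio is NaN.
ratio = freq.xs("F", level="gender").div(freq.xs("M", level="gender"))

# Look up each row's occupation in `ratio`, so F and M rows share one value.
df["ratio"] = df.index.get_level_values("occupation").map(ratio)
print(df)
```

The division aligns the two cross-sections on occupation, so no manual sorting or filtering is needed before mapping the result back onto the original rows.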