Pandas - modify descriptives output from groupby.agg
I want to get the mean, std, and skewness of each score column while grouping the dataset by two other columns (group, block).
I used this code for it:
scores_list = ['A','B','C']
descriptive_agg = df.groupby(['group','block'])[scores_list].agg(['mean', 'std','skew'])
and got this dataframe:
                        A             A             A             B             B             B             C             C             C
                     mean           std          skew          mean           std          skew          mean           std          skew
group block
0     neg     26.76470588   54.79291496   6.069163775   3.098039216   1.170553749   0.114238196   1.738755233   0.611860454   1.063953504
0     neu           29.92    70.9644464   6.275474539           3.6   1.245399698  -0.039619494   1.906404475   0.568964543   0.561075178
1     neg     16.42391304    18.0702133   2.968326848   2.891304348   1.253185144   0.209586627   1.684455875   0.598785419   0.872917578
1     neu     16.92391304   18.49159815   2.951129818           3.5   1.172018077  -0.313988331   1.893045967   0.646930842    1.11778034
But I would like to have a "Score" column on the left. My expected output is:
Score  group  block         mean          std          skew
A      0      neg    26.76470588  54.79291496   6.069163775
       0      neu          29.92   70.9644464   6.275474539
       1      neg    16.42391304   18.0702133   2.968326848
       1      neu    16.92391304  18.49159815   2.951129818
B      0      neg    3.098039216  1.170553749   0.114238196
       0      neu            3.6  1.245399698  -0.039619494
       1      neg    2.891304348  1.253185144   0.209586627
       1      neu            3.5  1.172018077  -0.313988331
Thanks in advance!
Use DataFrame.stack with DataFrame.reorder_levels and DataFrame.sort_index (here df is the aggregated DataFrame, i.e. descriptive_agg in the question):
df = df.stack(0).reorder_levels([2,0,1]).sort_index()
print (df)
              mean      skew        std
A 0 neg  26.764706  6.069164  54.792915
    neu  29.920000  6.275475  70.964446
  1 neg  16.423913  2.968327  18.070213
    neu  16.923913  2.951130  18.491598
B 0 neg   3.098039  0.114238   1.170554
    neu   3.600000 -0.039619   1.245400
  1 neg   2.891304  0.209587   1.253185
    neu   3.500000 -0.313988   1.172018
C 0 neg   1.738755  1.063954   0.611860
    neu   1.906404  0.561075   0.568965
  1 neg   1.684456  0.872918   0.598785
    neu   1.893046  1.117780   0.646931
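For anyone who wants to try the reshaping end to end, here is a minimal, self-contained sketch. The sample data is made up (random numbers, not the asker's dataset), and the names raw, agg, long and flat are hypothetical. It assumes a pandas version where groupby.agg accepts 'skew'. Depending on the pandas version, stack may order the remaining columns differently or emit a FutureWarning about its new implementation.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names follow the question.
rng = np.random.default_rng(0)
raw = pd.DataFrame({
    "group": np.repeat([0, 1], 20),
    "block": np.tile(["neg", "neu"], 20),
    "A": rng.normal(20, 5, 40),
    "B": rng.normal(3, 1, 40),
    "C": rng.normal(2, 0.5, 40),
})
scores_list = ["A", "B", "C"]

# Same aggregation as in the question: columns become (score, statistic).
agg = raw.groupby(["group", "block"])[scores_list].agg(["mean", "std", "skew"])

# Move the score level of the columns into the row index,
# put it in front of group/block, and sort the rows.
long = agg.stack(0).reorder_levels([2, 0, 1]).sort_index()

# Optional: name the new level so it shows up as a "Score" header.
long = long.rename_axis(["Score", "group", "block"])
print(long)

# Optional: turn the index levels into ordinary columns.
flat = long.reset_index()
```

The rename_axis and reset_index steps are additions of this sketch, not part of the answer above. They are one way to get a literal "Score" column as asked for in the question.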
EDIT: If you need to replace the repeated index values with empty strings:
#original index
print (df.index)
MultiIndex([('A', 0, 'neg'),
            ('A', 0, 'neu'),
            ('A', 1, 'neg'),
            ('A', 1, 'neu'),
            ('B', 0, 'neg'),
            ('B', 0, 'neu'),
            ('B', 1, 'neg'),
            ('B', 1, 'neu'),
            ('C', 0, 'neg'),
            ('C', 0, 'neu'),
            ('C', 1, 'neg'),
            ('C', 1, 'neu')],
           )
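#index levels as ordinary columns named 0, 1, 2
df1 = df.index.to_frame(index=False)
df1.columns = [0,1,2]
#flag repeats of the first level and of the (first, second) level pair
m1 = df1[0].duplicated()
m2 = df1.duplicated(subset=[0,1])
#replace the flagged repeats with empty strings
df1[0] = df1[0].mask(m1, '')
df1[1] = df1[1].mask(m2, '')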
print (df1)
    0  1    2
0   A  0  neg
1         neu
2      1  neg
3         neu
4   B  0  neg
5         neu
6      1  neg
7         neu
8   C  0  neg
9         neu
10     1  neg
11        neu
df.index = pd.MultiIndex.from_frame(df1)
df = df.rename_axis([None, None, None])
print (df)
              mean      skew        std
A 0 neg  26.764706  6.069164  54.792915
    neu  29.920000  6.275475  70.964446
  1 neg  16.423913  2.968327  18.070213
    neu  16.923913  2.951130  18.491598
B 0 neg   3.098039  0.114238   1.170554
    neu   3.600000 -0.039619   1.245400
  1 neg   2.891304  0.209587   1.253185
    neu   3.500000 -0.313988   1.172018
C 0 neg   1.738755  1.063954   0.611860
    neu   1.906404  0.561075   0.568965
  1 neg   1.684456  0.872918   0.598785
    neu   1.893046  1.117780   0.646931
print (df.index)
MultiIndex([('A', 0, 'neg'),
            ( '', '', 'neu'),
            ( '', 1, 'neg'),
            ( '', '', 'neu'),
            ('B', 0, 'neg'),
            ( '', '', 'neu'),
            ( '', 1, 'neg'),
            ( '', '', 'neu'),
            ('C', 0, 'neg'),
            ( '', '', 'neu'),
            ( '', 1, 'neg'),
            ( '', '', 'neu')],
           )
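The blanking step can also be tried on its own with a small made-up index. This is only a sketch: the frame out, its single "mean" column and the names labels and display are hypothetical, and the values are placeholders. It writes the blanked labels to a copy so the original index stays usable for lookups.

```python
import pandas as pd

# Hypothetical three-level index shaped like the one in the answer.
idx = pd.MultiIndex.from_product([["A", "B"], [0, 1], ["neg", "neu"]])
out = pd.DataFrame({"mean": range(8)}, index=idx)

# Index levels as ordinary columns named 0, 1, 2.
labels = out.index.to_frame(index=False)
labels.columns = [0, 1, 2]

# Flag repeats of level 0, and repeats of the (level 0, level 1) pair.
first_repeat = labels[0].duplicated()
pair_repeat = labels.duplicated(subset=[0, 1])

# Replace the flagged repeats with empty strings.
labels[0] = labels[0].mask(first_repeat, "")
labels[1] = labels[1].mask(pair_repeat, "")

# Attach the blanked labels to a copy meant for display or export only.
display = out.copy()
display.index = pd.MultiIndex.from_frame(labels)
display = display.rename_axis([None, None, None])
print(display.index.tolist())
```

As the printed index shows, the blanked rows really do carry '' as their labels, so a key such as ('A', 0, 'neu') no longer exists in display. Keeping the untouched frame (out here) for further computation and using the blanked copy only for presentation is one way around that.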