Python: get max and min values across multiple columns while grouping a DataFrame
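This question is related to "How to get the max value of a multiple column group-by pandas?" as well as [...]
I am trying to create a min and a max value from two columns in grouped data.
I have a dataset of this shape: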
measure measure_group route year actual budget
AC electrification A 20182019 103 99
AC electrification A 20192020 110 122
AC electrification B 20182019 9 10
AC electrification B 20192020 55 50
HV electrification A 20182019 2 10
HV electrification A 20192020 7 15
HV electrification B 20182019 67 10
HV electrification B 20192020 100 115
cat 1 track A 20182019 10 15
cat 1 track A 20192020 111 25
cat 1 track B 20182019 55 16
cat 1 track B 20192020 75 175
cat 2 track A 20182019 84 5
cat 2 track A 20192020 125 1005
cat 2 track B 20182019 7 4
cat 2 track B 20192020 15 25
What I want, as new columns, is the min and max of [actual, budget] for each combination of measure, measure_group and route, something like this:
measure measure_group route year actual budget min max
AC electrification A 20182019 103 99 99 122
AC electrification A 20192020 110 122 99 122
AC electrification B 20182019 9 10 9 55
AC electrification B 20192020 55 50 9 55
HV electrification A 20182019 2 10 2 15
HV electrification A 20192020 7 15 2 15
HV electrification B 20182019 67 10 10 115
HV electrification B 20192020 100 115 10 115
cat 1 track A 20182019 10 15 10 111
cat 1 track A 20192020 111 25 10 111
cat 1 track B 20182019 55 16 16 175
cat 1 track B 20192020 75 175 16 175
cat 2 track A 20182019 84 5 5 1005
cat 2 track A 20192020 125 1005 5 1005
cat 2 track B 20182019 7 4 4 25
cat 2 track B 20192020 15 25 4 25
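I have tried various combinations of df.groupby, such as df_remapped['min'] = df_remapped.groupby(['Measure','measure_group','route'])[['Actual','Budget']].transform('min')
but this returns ValueError: Wrong number of items passed 2, placement implies 1
I have a feeling that I am trying to fit two returned columns into one.
I did consider generating a separate DataFrame and then joining it back to the original DataFrame on a common index, but that feels like a long-winded workaround.
Any pointers to a possible approach would be much appreciated. Oddly, most aggregation examples only deal with a single column.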
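The suspicion about two columns being squeezed into one can be checked directly. The sketch below uses a small hypothetical subset of the data (a single group, lowercase column names) and only inspects what transform('min') returns when two columns are selected; it is an illustration, not part of the original post.

```python
import pandas as pd

# Hypothetical two-row subset of the dataset shown above (one group only).
df = pd.DataFrame({
    "measure": ["AC", "AC"],
    "measure_group": ["electrification", "electrification"],
    "route": ["A", "A"],
    "year": [20182019, 20192020],
    "actual": [103, 110],
    "budget": [99, 122],
})

keys = ["measure", "measure_group", "route"]

# Selecting two columns before transform gives a per-column minimum,
# i.e. a DataFrame with one result column for each input column.
per_column_min = df.groupby(keys)[["actual", "budget"]].transform("min")

print(per_column_min.shape)          # (2, 2)
print(list(per_column_min.columns))  # ['actual', 'budget']
```

Because the result has two columns, it cannot be assigned to the single column df['min'], which is consistent with the error message quoted above.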
You can melt the DataFrame so that both 'actual' and 'budget' are taken into account when computing the min or max. Then group the melted DataFrame and merge the result back.
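# Columns that define each group
id_vars = ['measure', 'measure_group', 'route']
# Stack 'actual' and 'budget' into one 'value' column, then take min/max per group
df1 = (df.melt(id_vars=id_vars, value_vars=['actual', 'budget'])
         .groupby(id_vars)['value']
         .agg(['min', 'max']))
# Attach the per-group min and max to every original row
df = df.merge(df1, how='left', on=id_vars)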
measure measure_group route year actual budget min max
0 AC electrification A 20182019 103 99 99 122
1 AC electrification A 20192020 110 122 99 122
2 AC electrification B 20182019 9 10 9 55
3 AC electrification B 20192020 55 50 9 55
4 HV electrification A 20182019 2 10 2 15
5 HV electrification A 20192020 7 15 2 15
6 HV electrification B 20182019 67 10 10 115
7 HV electrification B 20192020 100 115 10 115
8 cat1 track A 20182019 10 15 10 111
9 cat1 track A 20192020 111 25 10 111
10 cat1 track B 20182019 55 16 16 175
11 cat1 track B 20192020 75 175 16 175
12 cat2 track A 20182019 84 5 5 1005
13 cat2 track A 20192020 125 1005 5 1005
14 cat2 track B 20182019 7 4 4 25
15 cat2 track B 20192020 15 25 4 25
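As a possible alternative that avoids the merge step, one could first reduce 'actual' and 'budget' to a row-wise min and max, and then broadcast the group-wise min and max of those with transform. This is a sketch of a different technique from the melt approach above, using a hypothetical four-row subset of the data; the names keys, value_cols, row_min and row_max are introduced here purely for illustration.

```python
import pandas as pd

# Hypothetical four-row subset of the dataset (two groups: route A and route B).
df = pd.DataFrame({
    "measure": ["AC"] * 4,
    "measure_group": ["electrification"] * 4,
    "route": ["A", "A", "B", "B"],
    "year": [20182019, 20192020, 20182019, 20192020],
    "actual": [103, 110, 9, 55],
    "budget": [99, 122, 10, 50],
})

keys = ["measure", "measure_group", "route"]
value_cols = ["actual", "budget"]

# Step 1: smallest and largest value within each row, across both columns.
row_min = df[value_cols].min(axis=1)
row_max = df[value_cols].max(axis=1)

# Step 2: group those row-wise results by the key columns and broadcast
# the group-wise min/max back onto every row of the group.
group_keys = [df[k] for k in keys]
df["min"] = row_min.groupby(group_keys).transform("min")
df["max"] = row_max.groupby(group_keys).transform("max")

print(df)
```

For this subset the route A rows get min 99 and max 122, and the route B rows get min 9 and max 55, matching the corresponding rows of the expected output in the question.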