Performing calculations based on a condition without specifying column values in Python

I have a CSV file in which I need to perform some operations on columns without specifying the column values.

The input CSV (df) is as follows:

weather  speed  type  cal_A  cal_B
good     0-3    cold  12     10
good     0-3    cold  21     7
good     0-3    cold  31     5
good     0-3    cold  17     1
good     3-5    cold  19     17
bad      0-3    hot   15     4
bad      6-9    hot   21     13
bad      6-9    hot   15     7
bad      6-9    cold  21     4
rainy    0-3    cold  14     7
rainy    5-8    cold  21     10
rainy    5-8    cold  2      3
rainy    5-8    cold  18     16
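To reproduce the answers below, the table above can be loaded into a DataFrame. This is a minimal sketch assuming the columns are whitespace-separated exactly as displayed; for the real comma-separated file you would pass the file's path to pd.read_csv instead of the in-memory text.

```python
import io

import pandas as pd

# The question's table pasted as text. Assumption: columns are separated by
# whitespace, as displayed; for the actual comma-separated file, call
# pd.read_csv with the file's path instead.
raw = """weather speed type cal_A cal_B
good 0-3 cold 12 10
good 0-3 cold 21 7
good 0-3 cold 31 5
good 0-3 cold 17 1
good 3-5 cold 19 17
bad 0-3 hot 15 4
bad 6-9 hot 21 13
bad 6-9 hot 15 7
bad 6-9 cold 21 4
rainy 0-3 cold 14 7
rainy 5-8 cold 21 10
rainy 5-8 cold 2 3
rainy 5-8 cold 18 16
"""

# sep=r"\s+" splits on any run of whitespace; speed values like "0-3" stay strings.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")
print(df.dtypes)
```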

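In this CSV, I need to divide column cal_A by column cal_B, group the rows by the weather, speed and type columns, and then find the minimum, maximum and mean for each group, each in a separate column.

The minimum, maximum and mean are computed on the result of dividing cal_A by cal_B.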

The expected output file (values rounded to two decimals) is:

weather  speed  type  cal_A/cal_B(min)  cal_A/cal_B(max)  cal_A/cal_B(mean)
good     0-3    cold  1.20              17.00             6.85
good     3-5    cold  1.12              1.12              1.12
bad      0-3    hot   3.75              3.75              3.75
bad      6-9    hot   1.62              2.14              1.88
bad      6-9    cold  5.25              5.25              5.25
rainy    0-3    cold  2.00              2.00              2.00
rainy    5-8    cold  0.67              2.10              1.30

The code I tried is:

df=df.groupby(['weather','speed','type'],as_index=False).min().eval('cal_A/cal_B(min)=cal_A/cal_B')
df=df.groupby(['weather','speed','type'],as_index=False).max().eval('cal_A/cal_B(max)=cal_A/cal_B')

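The code above groups by the weather, speed and type columns and returns the minimum together with the computed values, but it does not produce the expected output.

You should do the division first, and then group and aggregate the resulting Series. Aggregating cal_A and cal_B separately and dividing afterwards computes, for example, min(cal_A)/min(cal_B), which is generally not the minimum of the row-wise ratios.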
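A small sketch of why the order matters, using only the four rows of the good / 0-3 / cold group from the question. The first expression is what aggregating each column and then dividing amounts to; the second is the row-wise ratio followed by the aggregation that the question asks for.

```python
import pandas as pd

# The four rows of the good / 0-3 / cold group from the question.
g = pd.DataFrame({'cal_A': [12, 21, 31, 17], 'cal_B': [10, 7, 5, 1]})

# Aggregate each column first, then divide: min(cal_A) / min(cal_B) = 12 / 1.
agg_then_divide = g['cal_A'].min() / g['cal_B'].min()

# Divide row by row first, then aggregate: min(1.2, 3.0, 6.2, 17.0).
divide_then_agg = (g['cal_A'] / g['cal_B']).min()

print(agg_then_divide, divide_then_agg)  # 12.0 1.2
```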

(df.cal_A/df.cal_B).groupby([df.weather, df.speed, df.type], sort=False).agg(['min', 'max', 'mean'])
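The one-liner above assumes df is already loaded. Below is a self-contained version that rebuilds df from the question's data and gives the intermediate ratio a name; the name ratio is only for illustration. Passing sort=False keeps the groups in order of first appearance, which is why the result follows the row order of the expected output rather than alphabetical order.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'weather': ['good'] * 5 + ['bad'] * 4 + ['rainy'] * 4,
    'speed': ['0-3', '0-3', '0-3', '0-3', '3-5', '0-3', '6-9', '6-9', '6-9',
              '0-3', '5-8', '5-8', '5-8'],
    'type': ['cold'] * 5 + ['hot', 'hot', 'hot', 'cold'] + ['cold'] * 4,
    'cal_A': [12, 21, 31, 17, 19, 15, 21, 15, 21, 14, 21, 2, 18],
    'cal_B': [10, 7, 5, 1, 17, 4, 13, 7, 4, 7, 10, 3, 16],
})

# Row-wise ratio first: one value per input row.
ratio = df['cal_A'] / df['cal_B']

# Group that Series by the three key columns; sort=False keeps the groups in
# order of first appearance instead of sorting the keys.
result = (ratio.groupby([df['weather'], df['speed'], df['type']], sort=False)
               .agg(['min', 'max', 'mean']))
print(result)
```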

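If you want to reproduce your output exactly, you can use the add_prefix/add_suffix methods (although assigning new names to the columns directly may be more efficient).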

((df.cal_A/df.cal_B).groupby([df.weather, df.speed, df.type], sort=False)
   .agg(['min', 'max', 'mean'])
   .add_prefix('cal_A/cal_B(')
   .add_suffix(')')
   .reset_index())
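The parenthetical above mentions renaming the columns directly. A minimal sketch of that alternative, assuming a single list assignment to .columns; whether it is measurably faster than add_prefix/add_suffix has not been checked here.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'weather': ['good'] * 5 + ['bad'] * 4 + ['rainy'] * 4,
    'speed': ['0-3', '0-3', '0-3', '0-3', '3-5', '0-3', '6-9', '6-9', '6-9',
              '0-3', '5-8', '5-8', '5-8'],
    'type': ['cold'] * 5 + ['hot', 'hot', 'hot', 'cold'] + ['cold'] * 4,
    'cal_A': [12, 21, 31, 17, 19, 15, 21, 15, 21, 14, 21, 2, 18],
    'cal_B': [10, 7, 5, 1, 17, 4, 13, 7, 4, 7, 10, 3, 16],
})

stats = ((df['cal_A'] / df['cal_B'])
         .groupby([df['weather'], df['speed'], df['type']], sort=False)
         .agg(['min', 'max', 'mean']))

# One list assignment replaces the add_prefix/add_suffix pair.
stats.columns = [f'cal_A/cal_B({name})' for name in stats.columns]

# Move weather/speed/type from the index back into ordinary columns.
stats = stats.reset_index()
print(stats)
```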

  weather speed  type  cal_A/cal_B(min)  cal_A/cal_B(max)  cal_A/cal_B(mean)
0    good   0-3  cold          1.200000         17.000000           6.850000
1    good   3-5  cold          1.117647          1.117647           1.117647
2     bad   0-3   hot          3.750000          3.750000           3.750000
3     bad   6-9   hot          1.615385          2.142857           1.879121
4     bad   6-9  cold          5.250000          5.250000           5.250000
5   rainy   0-3  cold          2.000000          2.000000           2.000000
6   rainy   5-8  cold          0.666667          2.100000           1.297222

If you are using pandas 0.25+, you can solve this with NamedAgg:
import pandas as pd

# Input data from the question
data = {'weather':['good','good','good','good','good','bad','bad','bad','bad','rainy','rainy','rainy','rainy'],
        'speed':['0-3','0-3','0-3','0-3','3-5','0-3','6-9','6-9','6-9','0-3','5-8','5-8','5-8'],
        'type':['cold','cold','cold','cold','cold','hot','hot','hot','cold','cold','cold','cold','cold'],
        'cal_A':[12,21,31,17,19,15,21,15,21,14,21,2,18],
        'cal_B':[10,7,5,1,17,4,13,7,4,7,10,3,16]}
df = pd.DataFrame(data)
# Divide first, so that the aggregations run on the per-row ratio
df['divided'] = df['cal_A']/df['cal_B']
# Named aggregation: output column name = pd.NamedAgg(source column, function)
output = df.groupby(['weather','speed','type']).agg(
    minimum=pd.NamedAgg(column='divided',aggfunc='min'),
    maximum=pd.NamedAgg(column='divided',aggfunc='max'),
    mean=pd.NamedAgg(column='divided',aggfunc='mean'))
print(output)

Output:

                     minimum    maximum      mean
weather speed type
bad     0-3   hot   3.750000   3.750000  3.750000
        6-9   cold  5.250000   5.250000  5.250000
              hot   1.615385   2.142857  1.879121
good    0-3   cold  1.200000  17.000000  6.850000
        3-5   cold  1.117647   1.117647  1.117647
rainy   0-3   cold  2.000000   2.000000  2.000000
        5-8   cold  0.666667   2.100000  1.297222
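Named aggregation in pandas 0.25+ also accepts plain (column, aggfunc) tuples in place of pd.NamedAgg. Output names such as cal_A/cal_B(min) are not valid Python keywords, but they can still be passed by unpacking a dict. The sketch below does that to reproduce the question's column names exactly; sort=False and reset_index() are added to match the row order and flat layout of the expected output.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    'weather': ['good'] * 5 + ['bad'] * 4 + ['rainy'] * 4,
    'speed': ['0-3', '0-3', '0-3', '0-3', '3-5', '0-3', '6-9', '6-9', '6-9',
              '0-3', '5-8', '5-8', '5-8'],
    'type': ['cold'] * 5 + ['hot', 'hot', 'hot', 'cold'] + ['cold'] * 4,
    'cal_A': [12, 21, 31, 17, 19, 15, 21, 15, 21, 14, 21, 2, 18],
    'cal_B': [10, 7, 5, 1, 17, 4, 13, 7, 4, 7, 10, 3, 16],
})
df['divided'] = df['cal_A'] / df['cal_B']

# (column, aggfunc) tuples are shorthand for pd.NamedAgg; the dict is unpacked
# because these output names are not valid keyword identifiers.
output = (df.groupby(['weather', 'speed', 'type'], sort=False)
            .agg(**{'cal_A/cal_B(min)': ('divided', 'min'),
                    'cal_A/cal_B(max)': ('divided', 'max'),
                    'cal_A/cal_B(mean)': ('divided', 'mean')})
            .reset_index())
print(output)
```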