Performing calculations based on a condition without specifying column values in Python
I have a csv file in which I need to perform some operations on columns without specifying the column values.
The input csv (df) looks like this:
weather speed type cal_A cal_B
good 0-3 cold 12 10
good 0-3 cold 21 7
good 0-3 cold 31 5
good 0-3 cold 17 1
good 3-5 cold 19 17
bad 0-3 hot 15 4
bad 6-9 hot 21 13
bad 6-9 hot 15 7
bad 6-9 cold 21 4
rainy 0-3 cold 14 7
rainy 5-8 cold 21 10
rainy 5-8 cold 2 3
rainy 5-8 cold 18 16
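In this csv, I need to divide the column cal_A by cal_B, grouping by the weather, type and speed columns, and then find the minimum, maximum and mean of the result as separate columns.
The minimum, maximum and mean are computed after cal_A has been divided by cal_B.
The output file should look like this: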
weather speed type cal_A/cal_B(min) cal_A/cal_B(max) cal_A/cal_B(mean)
good 0-3 cold 1.2 17
good 3-5 cold 1.11 1.11
bad 0-3 hot 3.75 3.75
bad 6-9 hot 1.61 2.14
bad 6-9 cold 5.25 5.25
rainy 0-3 cold 2 2
rainy 5-8 cold 0.6 2.1
The code I tried is as follows:
df=df.groupby(['weather','speed','type'],as_index=False).min().eval('cal_A/cal_B(min)=cal_A/cal_B')
df=df.groupby(['weather','speed','type'],as_index=False).max().eval('cal_A/cal_B(max)=cal_A/cal_B')
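The code above groups by the weather, speed and type columns and then takes the minimum (or maximum) before computing the ratio, but it does not produce the expected output.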
You would perform the division first, then group and aggregate the resulting Series.
(df.cal_A/df.cal_B).groupby([df.weather, df.speed, df.type], sort=False).agg(['min', 'max', 'mean'])
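To see why the order matters, here is a minimal sketch using only the four rows of the (good, 0-3, cold) group from the question. Aggregating first gives min(cal_A) / min(cal_B), which is generally not the minimum of the row-wise ratios. The names `keys`, `agg_first` and `divide_first` are illustrative and do not come from the original post.

```python
import pandas as pd

# The four rows of the (good, 0-3, cold) group from the question.
df = pd.DataFrame({
    'weather': ['good'] * 4,
    'speed': ['0-3'] * 4,
    'type': ['cold'] * 4,
    'cal_A': [12, 21, 31, 17],
    'cal_B': [10, 7, 5, 1],
})
keys = ['weather', 'speed', 'type']

# Aggregate first, divide second: min(cal_A) / min(cal_B) = 12 / 1.
mins = df.groupby(keys)[['cal_A', 'cal_B']].min()
agg_first = (mins['cal_A'] / mins['cal_B']).iloc[0]

# Divide first, aggregate second: minimum of the ratios 1.2, 3.0, 6.2, 17.0.
ratio = df['cal_A'] / df['cal_B']
divide_first = ratio.groupby([df[k] for k in keys]).min().iloc[0]

print(agg_first, divide_first)
```

The two values differ (12.0 versus 1.2), and only the second matches the expected output in the question.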
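If you want to reproduce your output exactly, you can use the add_prefix/add_suffix methods (although renaming the columns object directly may be more efficient).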
((df.cal_A/df.cal_B).groupby([df.weather, df.speed, df.type], sort=False)
.agg(['min', 'max', 'mean'])
.add_prefix('cal_A/cal_B(')
.add_suffix(')')
.reset_index())
weather speed type cal_A/cal_B(min) cal_A/cal_B(max) cal_A/cal_B(mean)
0 good 0-3 cold 1.200000 17.000000 6.850000
1 good 3-5 cold 1.117647 1.117647 1.117647
2 bad 0-3 hot 3.750000 3.750000 3.750000
3 bad 6-9 hot 1.615385 2.142857 1.879121
4 bad 6-9 cold 5.250000 5.250000 5.250000
5 rainy 0-3 cold 2.000000 2.000000 2.000000
6 rainy 5-8 cold 0.666667 2.100000 1.297222
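The alternative mentioned above, renaming the columns object directly, could look roughly like the following. This is a sketch rather than the answerer's own code; the variable names `ratio` and `out` are assumptions, and the data is the question's input typed in as a dict.

```python
import pandas as pd

df = pd.DataFrame({
    'weather': ['good'] * 5 + ['bad'] * 4 + ['rainy'] * 4,
    'speed': ['0-3', '0-3', '0-3', '0-3', '3-5', '0-3', '6-9',
              '6-9', '6-9', '0-3', '5-8', '5-8', '5-8'],
    'type': ['cold'] * 5 + ['hot'] * 3 + ['cold'] * 5,
    'cal_A': [12, 21, 31, 17, 19, 15, 21, 15, 21, 14, 21, 2, 18],
    'cal_B': [10, 7, 5, 1, 17, 4, 13, 7, 4, 7, 10, 3, 16],
})

# Divide first, then group the resulting Series by the three key columns.
ratio = df['cal_A'] / df['cal_B']
out = (ratio.groupby([df['weather'], df['speed'], df['type']], sort=False)
            .agg(['min', 'max', 'mean']))

# Rename all aggregate columns in one assignment instead of add_prefix/add_suffix.
out.columns = [f'cal_A/cal_B({name})' for name in out.columns]
out = out.reset_index()
print(out)
```

Assigning to `out.columns` replaces the labels in a single step, whereas each of `add_prefix` and `add_suffix` returns a new object.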
If you are using pandas 0.25+, you can use NamedAgg to solve the problem:
import pandas as pd

# Input data from the question
data = {
    'weather': ['good','good','good','good','good','bad','bad','bad','bad','rainy','rainy','rainy','rainy'],
    'speed': ['0-3','0-3','0-3','0-3','3-5','0-3','6-9','6-9','6-9','0-3','5-8','5-8','5-8'],
    'type': ['cold','cold','cold','cold','cold','hot','hot','hot','cold','cold','cold','cold','cold'],
    'cal_A': [12,21,31,17,19,15,21,15,21,14,21,2,18],
    'cal_B': [10,7,5,1,17,4,13,7,4,7,10,3,16],
}
df = pd.DataFrame(data)
# Compute the row-wise ratio first, then aggregate it per group
df['divided'] = df['cal_A'] / df['cal_B']
output = df.groupby(['weather', 'speed', 'type']).agg(
    minimum=pd.NamedAgg(column='divided', aggfunc='min'),
    maximum=pd.NamedAgg(column='divided', aggfunc='max'),
    mean=pd.NamedAgg(column='divided', aggfunc='mean'))
print(output)
Output:
minimum maximum mean
weather speed type
bad 0-3 hot 3.750000 3.750000 3.750000
6-9 cold 5.250000 5.250000 5.250000
hot 1.615385 2.142857 1.879121
good 0-3 cold 1.200000 17.000000 6.850000
3-5 cold 1.117647 1.117647 1.117647
rainy 0-3 cold 2.000000 2.000000 2.000000
5-8 cold 0.666667 2.100000 1.297222
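Keyword arguments cannot contain characters such as `/` or `(`, so to get the exact column names requested in the question with named aggregation, one option is to pass the names through a dict and unpack it. The sketch below assumes pandas 0.25+ and uses the `(column, aggfunc)` tuple shorthand for `pd.NamedAgg`; the dict name `named` is illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'weather': ['good'] * 5 + ['bad'] * 4 + ['rainy'] * 4,
    'speed': ['0-3', '0-3', '0-3', '0-3', '3-5', '0-3', '6-9',
              '6-9', '6-9', '0-3', '5-8', '5-8', '5-8'],
    'type': ['cold'] * 5 + ['hot'] * 3 + ['cold'] * 5,
    'cal_A': [12, 21, 31, 17, 19, 15, 21, 15, 21, 14, 21, 2, 18],
    'cal_B': [10, 7, 5, 1, 17, 4, 13, 7, 4, 7, 10, 3, 16],
})
df['divided'] = df['cal_A'] / df['cal_B']

# Output column name -> (input column, aggregation function).
named = {
    'cal_A/cal_B(min)': ('divided', 'min'),
    'cal_A/cal_B(max)': ('divided', 'max'),
    'cal_A/cal_B(mean)': ('divided', 'mean'),
}

# sort=False keeps the groups in order of first appearance, as in the question.
output = (df.groupby(['weather', 'speed', 'type'], sort=False)
            .agg(**named)
            .reset_index())
print(output)
```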