Find average of a column in a dataframe given conditions on another column

Question

Suppose I have the dataframe above, and I want to write a function

    def ave(pd,minx,maxx):

that computes the average of the y values for the rows whose x value lies between minx and maxx, i.e. in the following example:

    ave(file, 2, 3) #where file is wherever I import these x and y values from

it would return 3.3857...

I tried the following:

    def ave(pd,minx,maxx):
        x = list(data.iloc[:, 0].values)
        y = list(data.iloc[:, 1].values)
        lst=[]
        for i in x:
            if x[i]>xmin and x[i]<xmax:
                lst+=y[i]
        return (sum(lst)/len(list))

But this gives the error: list indices must be integers or slices, not numpy.float64

Answer 1

Why not just select the rows that satisfy these conditions? When working with dataframes, you should really avoid loops wherever possible.

    def y_average(df, min_x, max_x):
        return df[(df["x"] > min_x) & (df["x"] < max_x)]["y"].mean()

Usage:

    In [3]: y_average(df, 2, 3)
    Out[3]: 3.3857142857142857
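
As for the error itself: `for i in x` yields the x values themselves (numpy.float64), not positions, so `x[i]` ends up indexing a list with a float. Here is a minimal, self-contained sketch of both the boolean-mask approach and a corrected loop. The sample data is made up for illustration (the dataframe from the question is not reproduced here), and the column names "x" and "y" are assumed, as is `y_average_loop`, which is just a hypothetical name for the loop variant.

```python
import pandas as pd

# Hypothetical sample data, NOT the dataframe from the question.
df = pd.DataFrame({
    "x": [1.5, 2.1, 2.5, 2.9, 3.0, 3.4],
    "y": [1.0, 2.0, 4.0, 6.0, 8.0, 10.0],
})


def y_average(df, min_x, max_x):
    """Mean of y over rows where min_x < x < max_x (both bounds exclusive)."""
    # Build a boolean mask, then select the y column for matching rows only.
    mask = (df["x"] > min_x) & (df["x"] < max_x)
    return df.loc[mask, "y"].mean()


def y_average_loop(df, min_x, max_x):
    """Loop variant for comparison: iterate over (x, y) pairs, not over
    x values used as list indices."""
    selected = [y for x, y in zip(df["x"], df["y"]) if min_x < x < max_x]
    # Return NaN when nothing matches, mirroring what .mean() does on an
    # empty selection.
    return sum(selected) / len(selected) if selected else float("nan")


print(y_average(df, 2, 3))       # 4.0 (rows with x = 2.1, 2.5, 2.9)
print(y_average_loop(df, 2, 3))  # 4.0
```

Both bounds are exclusive here, matching the `>` and `<` in the question; swap in `>=` and `<=` if the endpoints should count. `.loc[mask, "y"]` selects rows and column in one step instead of chaining two `[]` lookups, but the result is the same as in the one-liner above.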
