How to use apply() to add a new column with a standard deviation condition?

Question

I have a function and a rule that I want to apply to my df.

def apply_rule(df, rule):
    df['legal'] = df.apply(rule)

def greater_than_mean_plus_1_std():
    return df['col1']>df['col1'].mean()+df['col1'].std()

apply_rule(df, greater_than_mean_plus_1_std)

I want to apply the rule to my df so that it creates a new column telling me whether each row's value is greater than mean + std.

But with df.apply() I can't use df.mean() and df.std() here:

AttributeError: 'float' object has no attribute 'mean'

Is there a way to do this with df.apply(), or do I have to use some other method?

Edit:

print(df.head())

   col1
0   7.2
1   7.2
2   7.2
3   7.2
4   7.2

Expected output:

   col1  legal
0   7.2  False
1   7.2  False
2   7.2  False
3   7.2  False
4   7.2  False

Answer 1

You don't need apply here; a vectorized comparison is enough:

df['legal'] = df['col1'] > (df['col1'].mean()+df['col1'].std())

If you do want to use apply, you can use either DataFrame.apply over the rows or Series.apply:

df['legal'] = df.apply(lambda row: row['col1'] > (df['col1'].mean()+df['col1'].std()), axis=1)
# or
df['legal'] = df['col1'].apply(lambda x: x > (df['col1'].mean()+df['col1'].std()))
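As a quick sanity check that the three forms agree, here is a minimal sketch. The sample data and the names `vectorized`, `row_wise` and `element_wise` are made up for illustration. It also hoists the threshold out of the lambdas, since the versions above recompute the mean and std on every call.

```python
import pandas as pd

# Hypothetical sample data, not taken from the question.
df = pd.DataFrame({"col1": [1.0, 2.0, 3.0, 4.0, 10.0]})

# Compute the cutoff once instead of inside every lambda call.
threshold = df["col1"].mean() + df["col1"].std()

vectorized = df["col1"] > threshold                               # no apply
row_wise = df.apply(lambda row: row["col1"] > threshold, axis=1)  # DataFrame.apply
element_wise = df["col1"].apply(lambda x: x > threshold)          # Series.apply

# All three produce the same boolean values, row for row.
same_result = (vectorized == row_wise).all() and (vectorized == element_wise).all()

df["legal"] = vectorized
```

On large frames the vectorized form is usually the fastest of the three, because the apply variants call a Python function once per row or element.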

Answer 2

First compute the mean and the standard deviation:

col1_mean = df["col1"].mean()
col1_std = df["col1"].std()

Then use these pre-calculated values inside apply, like this:

df["legal"] = df["col1"].apply(lambda x: x > col1_mean + col1_std)

If you want the rule as a function, you can use a lambda:

col1_mean = df["col1"].mean()
col1_std = df["col1"].std()
greater_than_mean_plus_1_std = lambda x: x > col1_mean + col1_std

def apply_rule(df, rule, column):
    df['legal'] = df[column].apply(rule)

Now call apply_rule:

apply_rule(df, greater_than_mean_plus_1_std, "col1")
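One thing to be aware of is that the lambda above captures `col1_mean` and `col1_std` from the surrounding scope, so it is tied to that one column. A possible variation, sketched below, builds the rule from whatever column you pass in. The factory name `make_greater_than_mean_plus_std` and its `n_std` parameter are hypothetical and not part of the answer above.

```python
import pandas as pd

def make_greater_than_mean_plus_std(series, n_std=1.0):
    """Return a rule testing x > mean + n_std * std for the given series."""
    # The statistics are computed once here, not once per element.
    threshold = series.mean() + n_std * series.std()
    return lambda x: x > threshold

def apply_rule(df, rule, column):
    # Same shape as apply_rule above: the rule receives one value at a time.
    df["legal"] = df[column].apply(rule)

df = pd.DataFrame({"col1": range(10)})
rule = make_greater_than_mean_plus_std(df["col1"])
apply_rule(df, rule, "col1")
```

With this data only the rows with values 8 and 9 end up as True in `legal`.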

Answer 3

You can pass the whole DataFrame to the rule instead of calling apply:

def apply_rule(df, rule):
    df['legal'] = rule(df)  # <- change here

def greater_than_mean_plus_1_std(df):  # <- change here
    return df['col1'] > df['col1'].mean() + df['col1'].std()

apply_rule(df, greater_than_mean_plus_1_std)

Output:

# df = pd.DataFrame({'col1': range(10)})
>>> df
   col1  legal
0     0  False
1     1  False
2     2  False
3     3  False
4     4  False
5     5  False
6     6  False
7     7  False
8     8   True
9     9   True
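To see why only 8 and 9 come out True, it can help to print the cutoff itself. The sketch below reuses the same `range(10)` data. Note that pandas `Series.std()` defaults to the sample standard deviation (`ddof=1`), while NumPy's `np.std` defaults to `ddof=0`, so the two give slightly different thresholds.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col1": range(10)})

mean = df["col1"].mean()                         # 4.5
sample_std = df["col1"].std()                    # ddof=1, roughly 3.03
population_std = np.std(df["col1"].to_numpy())   # ddof=0, roughly 2.87

# Cutoff used by the rule above: roughly 7.53, so only 8 and 9 exceed it.
threshold = mean + sample_std
print(threshold)
```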

python

apply

pandas