Concatenate Pandas pipe and apply to rows

Question

I have the following code:

df = (
    df
    .pipe(function_1)
    .pipe(function_2)
)

# Apply the policy
df["prediction"] = df.apply(
    lambda row: function_3(row, input_dict), axis=1,
)

# Keep only rows of interest
df = df.query("prediction>0")

I would like to chain all of this in a single call:

the two pipe calls
the apply that defines the new variable
the query command

For simplicity, function_1 and function_2 are generic functions that just return a DataFrame, and function_3 takes a row of the DataFrame and a predefined dictionary (input_dict) as input.

I tried:

df1 = (
    df
    .pipe(function_1)
    .pipe(function_2)
    .assign(
        prediction = lambda row: function_3(row, input_dict), axis=1
        )
    .query("prediction>0")
)

But because of the assign method, it raises:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
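
The error is consistent with how assign handles its keyword arguments: a callable is invoked once with the whole DataFrame rather than once per row, and axis=1 is treated as just another column to create. A minimal sketch of that behaviour follows; the column name "a" and the helper inspect are made up for illustration and are not part of the original code.

```python
import pandas as pd

# Hypothetical toy data, only used to observe what assign does.
df = pd.DataFrame({"a": [1, -2, 3]})

seen = []

def inspect(arg):
    # Record the type of the object that assign hands to the callable.
    seen.append(type(arg).__name__)
    return arg["a"]

# axis=1 is not an option of assign; it is read as "create a column named axis".
out = df.assign(prediction=inspect, axis=1)

print(seen)               # the callable was called once, with a DataFrame
print(list(out.columns))  # the columns now include "prediction" and "axis"

# Any truth test on a whole column then fails the same way as in the traceback.
try:
    if df["a"] > 0:
        pass
except ValueError as exc:
    message = str(exc)
    print(message)
```

So if function_3 contains a comparison such as `if row["a"] > 0`, it ends up comparing an entire column, which is what triggers the ValueError.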

Answer 1

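Solved by calling apply directly on the object passed to the lambda and setting axis=1 there. The callable given to assign receives the whole DataFrame, so apply with axis=1 is what iterates over the rows; input_dict is forwarded to function_3 as a keyword argument, so the parameter of function_3 must be named input_dict:

df1 = (
    df
    .pipe(function_1)
    .pipe(function_2)
    .assign(
        # d is the DataFrame returned by function_2, not a single row
        prediction = lambda d: d.apply(function_3, input_dict=input_dict, axis=1)
        )
    .query("prediction>0")
)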
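
To check the chain end to end, here is a small runnable sketch. The bodies of function_1, function_2 and function_3, the contents of input_dict, and the columns "group" and "value" are all hypothetical stand-ins, since the question does not show them.

```python
import pandas as pd

# Hypothetical stand-ins for the functions in the question.
def function_1(df):
    # Drop rows with missing values.
    return df.dropna()

def function_2(df):
    # Renumber the index after rows were dropped.
    return df.reset_index(drop=True)

def function_3(row, input_dict):
    # Row-wise policy: scale the value by a weight looked up by group.
    return row["value"] * input_dict.get(row["group"], 0)

input_dict = {"a": 1, "b": -1}

df = pd.DataFrame(
    {"group": ["a", "b", "a", None], "value": [2.0, 3.0, 5.0, 1.0]}
)

df1 = (
    df
    .pipe(function_1)
    .pipe(function_2)
    # The lambda receives the piped DataFrame; apply(axis=1) walks its rows.
    .assign(prediction=lambda d: d.apply(function_3, axis=1, input_dict=input_dict))
    .query("prediction > 0")
)

print(df1)
```

If the second parameter of function_3 has a different name, passing it positionally with `d.apply(function_3, axis=1, args=(input_dict,))` should work as well.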
