How to apply an if condition in a GPU DataFrame (cuDF) to filter the DataFrame?

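I want to filter a cuDF DataFrame by column value and then create a new column based on the specified condition. Basically, how do I apply the following in cuDF?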

df.loc[df.column_name condition, 'new column name'] = 'value if condition is met'

Given the Pandas statement above, the equivalent in cuDF is:

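# scalar value to assign where the mask is True
value = 'value if condition is met'
# boolean mask: rows that qualify for replacement ("condition" is a placeholder)
mask = df.column_name condition

# https://docs.rapids.ai/api/cudf/stable/
# masked_assign is a Series method in older cuDF releases, so call it on a column
df['new column name'] = df['column_name'].masked_assign(value, mask)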

A practical example:

"""explanation: 
  >> if there is no pool, pool_sqft should be 0
"""

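# scalar value to assign where the mask is True
value = 0
# boolean mask: rows with no pool
mask = df_train['pool_count'] == 0

# https://docs.rapids.ai/api/cudf/stable/
# masked_assign is a Series method in older cuDF releases; use the same frame as the mask
df_train['pool_sqft'] = df_train['pool_sqft'].masked_assign(value, mask)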
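
For comparison, here is a minimal sketch of the same rule on toy data, written against pandas so it runs without a GPU. The column values are made up for illustration. Whether the identical .loc assignment works in your cuDF release is an assumption to check against its documentation.

```python
import pandas as pd

# Toy data; the values are invented for illustration only.
df_train = pd.DataFrame({
    "pool_count": [0, 1, 0, 2],
    "pool_sqft": [120.0, 450.0, float("nan"), 300.0],
})

# Boolean mask: rows with no pool.
mask = df_train["pool_count"] == 0

# Pandas-style conditional assignment: set pool_sqft to 0 where the mask is True.
df_train.loc[mask, "pool_sqft"] = 0

result = df_train["pool_sqft"].tolist()
print(result)
```

Rows where the mask is False keep their original pool_sqft values; only the masked rows are overwritten.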

While masked_assign works for specific conditions, applymap is syntactically better and functionally similar to the Pandas API.

Also, @ashwin-srinath mentioned that __setitem__() is coming in version 0.9, so you will be able to do df[condition] = value. masked_assign will probably be replaced by __setitem__(), since masked_assign is not a Pandas API function.

You can also use .query().

Example:

expr = "(a == 2) or (b == 3)"
filtered_df = df.query(expr)

where a and b are the names of columns in the DataFrame.
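
As a rough illustration of what the query expression selects, the sketch below runs it on a small made-up frame and compares it with the equivalent boolean-mask filter. It uses pandas so it runs on a CPU; the assumption is that cuDF's .query() accepts the same expression, which is what the example above suggests.

```python
import pandas as pd

# Toy frame; column values are invented for illustration.
df = pd.DataFrame({"a": [1, 2, 3, 2], "b": [3, 4, 5, 6]})

# Keep rows where a equals 2 or b equals 3.
expr = "(a == 2) or (b == 3)"
filtered_df = df.query(expr)

# The same filter written as an explicit boolean mask.
same_rows = df[(df["a"] == 2) | (df["b"] == 3)]

# Index labels of the rows that survive the filter.
kept = filtered_df.index.tolist()
print(kept)
```

The query form keeps the condition in one string, which is convenient when the filter is built at runtime; the mask form is easier to combine with later assignments.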