How to apply an if condition in a GPU DataFrame (cuDF) to filter the DataFrame?
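I want to filter a cuDF DataFrame by column value and then create a new column based on a specified condition. Basically, how do I do the following in cuDF?
df.loc[df.column_name condition, 'new column name'] = 'value if condition is met'
The line above is the pandas form. In cuDF: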
# value to assign where the condition holds
value = 'value if condition is met'
# boolean mask selecting the rows to replace ("condition" is a placeholder for a real comparison)
mask = df.column_name condition
# https://docs.rapids.ai/api/cudf/stable/
df['new column name'] = df.masked_assign(value, mask)
Applied example:
"""explanation:
>> if there is no pool, pool_sqft should be 0
"""
# value to assign where the condition holds
value = 0
# boolean mask selecting the rows with no pool
mask = df['pool_count'] == 0
# https://docs.rapids.ai/api/cudf/stable/
df['pool_sqft'] = df.masked_assign(value, mask)
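Since masked_assign comes from early cuDF releases and may not be available in the version you have installed, here is a minimal sketch of the same pool example using Series.where, which exists in both pandas and cuDF. The sample data is made up for illustration, and the fallback to pandas when cuDF cannot be imported is my assumption so the snippet also runs on a machine without a GPU.

```python
# Use cuDF when it is installed; otherwise fall back to pandas, whose
# Series.where has the same semantics (assumption: no GPU available).
try:
    import cudf as xdf
except ImportError:
    import pandas as xdf

# Hypothetical sample data: pool_sqft is missing where there is no pool.
df = xdf.DataFrame(
    {
        "pool_count": [0, 1, 0, 2],
        "pool_sqft": [None, 300.0, None, 450.0],
    }
)

# where() keeps values where the condition is True and writes the
# replacement everywhere else, so the condition is "has a pool".
has_pool = df["pool_count"] != 0
df["pool_sqft"] = df["pool_sqft"].where(has_pool, 0.0)

print(df)
```

Note that where() takes the inverse of the mask used with masked_assign: it names the rows to keep, not the rows to replace.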
While masked_assign works for a specific condition, applymap is syntactically cleaner and functionally closer to the pandas API.
Also, @ashwin-srinath mentioned that __setitem__() is coming in the 0.9 release, so you will be able to do df[condition] = value. masked_assign will probably be replaced by __setitem__(), since masked_assign is not a pandas API function.
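For reference, a rough sketch of what mask-based assignment looks like once it is supported. It uses the .loc form from the question on a column that already exists; the exact spellings accepted may differ between cuDF versions, so treat it as an illustration rather than the definitive syntax. The data is invented, and the pandas fallback is an assumption so the snippet runs without a GPU.

```python
# Use cuDF when it is installed; otherwise fall back to pandas
# (assumption: the two behave the same for this assignment).
try:
    import cudf as xdf
except ImportError:
    import pandas as xdf

# Hypothetical sample data with stale pool_sqft values on pool-less rows.
df = xdf.DataFrame(
    {
        "pool_count": [0, 1, 0, 2],
        "pool_sqft": [120.0, 300.0, 80.0, 450.0],
    }
)

# Boolean mask selecting the rows to overwrite.
mask = df["pool_count"] == 0

# Assign in place: only the masked rows of pool_sqft are changed.
df.loc[mask, "pool_sqft"] = 0.0

print(df)
```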
You can also use .query(). Example:
expr = "(a == 2) or (b == 3)"
filtered_df = df.query(expr)
where a and b are the names of columns in the DataFrame.
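To make the .query() example concrete, here is a small self-contained sketch with an invented frame that has columns a and b. As before, the pandas fallback is an assumption so the snippet runs without a GPU; the expression string is the one from the answer.

```python
# Use cuDF when it is installed; otherwise fall back to pandas, whose
# DataFrame.query accepts the same expression (assumption: no GPU).
try:
    import cudf as xdf
except ImportError:
    import pandas as xdf

# Hypothetical sample data.
df = xdf.DataFrame({"a": [1, 2, 3, 4], "b": [3, 9, 9, 3]})

# Keep rows where a equals 2 or b equals 3.
expr = "(a == 2) or (b == 3)"
filtered_df = df.query(expr)

print(filtered_df)
```

query() only filters rows and returns a new frame; it does not create or modify a column, so it covers the filtering half of the question.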