Apply function to all columns of a Polars-DataFrame

Question

I know how to apply a function to all columns of a Pandas-DataFrame. However, I have not figured out how to achieve the same thing with a Polars-DataFrame.

I checked the section from the Polars User Guide devoted to this topic, but I did not find the answer there. Below is a code snippet with my unsuccessful attempts.

import numpy as np
import polars as pl
import seaborn as sns

# Loading toy dataset as Pandas DataFrame using Seaborn
df_pd = sns.load_dataset('iris')

# Converting Pandas DataFrame to Polars DataFrame
df_pl = pl.DataFrame(df_pd)

# Dropping the non-numeric column...
df_pd = df_pd.drop(columns='species')                     # ... using Pandas
df_pl = df_pl.drop('species')                             # ... using Polars

# Applying function to the whole DataFrame...
df_pd_new = df_pd.apply(np.log2)                          # ... using Pandas
# df_pl_new = df_pl.apply(np.log2)                        # ... using Polars?

# Applying lambda function to the whole DataFrame...
df_pd_new = df_pd.apply(lambda c: np.log2(c))             # ... using Pandas
# df_pl_new = df_pl.apply(lambda c: np.log2(c))           # ... using Polars?

Thanks in advance for your help and your time.

Answer 1

You can use the expression syntax to select all columns with pl.col("*")/pl.all() and then map the numpy np.log2(..) function over those columns.

df.select([
    pl.all().map(np.log2)
])

Polars expressions also support numpy universal functions: https://numpy.org/doc/stable/reference/ufuncs.html

That means you can pass a polars expression to a numpy ufunc:

df.select([
    np.log2(pl.all())
])

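Note that the difference between apply and map is that apply is called on every individual value, whereas map is called once on the whole Series. We choose map here because it is faster.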
