How to write unstack and reindex in dask?

Question

I wrote a script in pandas, but for performance reasons I need to switch to dask, and I am not sure how to implement unstack and reindex in dask.

This is what my pandas script looks like:

df_new = df.groupby(['Cars', 'Date'])['Durations'].mean().unstack(fill_value=0).reindex(columns=list_days,index=list_cars,fill_value=0).\
    round().reset_index().fillna(0).round()

Answer 1

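Typically, the result of a .groupby() aggregation is small and fits in memory. As described in https://docs.dask.org/en/latest/dataframe-best-practices.html#reduce-and-then-use-pandas, you can use Dask for the large aggregation and then pandas for the small in-memory post-processing.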

df_new = (
    df.groupby(['Cars', 'Date'])['Durations'].mean()
      .compute()  # run the Dask aggregation; the result is a small pandas Series
      .unstack(fill_value=0)  # from here on, plain pandas: pivot 'Date' into columns
      .reindex(columns=list_days, index=list_cars, fill_value=0)
      .round().reset_index().fillna(0).round()
)


dataframe

pandas

reindex

dask