How to use groupby with describe and unstack in Python dask?

I am trying to get summary statistics for my data using the describe() and unstack() functions in dask.

However, I get an error:

import dask.dataframe as dd

df = dd.read_csv('Measurement_table.csv', assume_missing=True)
df.describe().compute()  # this works, but when I try to use `unstack`, I get an error

What I am actually trying to do is make the following pandas code run faster with the help of dask:
import pandas as pd

df = pd.read_csv('Measurement_table.csv')
out = (df.groupby(['person_id','measurement_concept_id','visit_occurrence_id'])['value_as_number']
         .describe()
         .unstack()
         .swaplevel(0,1,axis=1)
         .reindex(df['readings'].unique(), axis=1, level=0))
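To make clear what the pandas chain produces before porting it to dask, here is a minimal sketch on made-up toy data. The column names follow the question, but the values are hypothetical, and the `reindex` step is left out because it depends on a `readings` column that is not part of the grouping keys.

```python
import pandas as pd

# Hypothetical toy data using the question's column names; the values are made up.
df = pd.DataFrame({
    "person_id": [1, 1, 1, 1],
    "measurement_concept_id": [10, 10, 10, 10],
    "visit_occurrence_id": [100, 100, 200, 200],
    "value_as_number": [5.0, 7.0, 1.0, 3.0],
})

keys = ["person_id", "measurement_concept_id", "visit_occurrence_id"]

# describe() returns one row per group and one column per statistic
# (count, mean, std, min, 25%, 50%, 75%, max).
stats = df.groupby(keys)["value_as_number"].describe()

# unstack() moves the innermost index level (visit_occurrence_id) into the
# columns, giving a column MultiIndex of (statistic, visit_occurrence_id).
wide = stats.unstack()

# swaplevel puts visit_occurrence_id on the outer column level.
wide = wide.swaplevel(0, 1, axis=1)

print(wide.shape)                    # (1, 16): 8 statistics x 2 visits
print(wide[(100, "mean")].iloc[0])   # 6.0
```

The reshape (unstack plus swaplevel) is what has no direct dask equivalent; the grouped statistics themselves are the expensive part.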

I tried adding compute() at the end of the chain, as shown below:

df1 = (df.groupby(['person_id','measurement_concept_id','visit_occurrence_id'])['value_as_number']
         .describe()
         .unstack()
         .swaplevel(0,1,axis=1)
         .reindex(df['readings'].unique(), axis=1, level=0)
         .compute())

I get an error, even though the same code works fine in pandas.

Can anyone help me solve this?

In dask, unstack is not implemented, but describe can be used together with apply:

# sd is a dask DataFrame with columns subject_id, readings and val
df = (sd.groupby(['subject_id','readings'])['val']
        .apply(lambda x: x.describe())
        .reset_index()
        .rename(columns={'level_2':'func'})
        .compute()
        )
print(df)
    subject_id readings   func        val
0            1   READ_1  count   2.000000
1            1   READ_1   mean   6.000000
2            1   READ_1    std   1.414214
3            1   READ_1    min   5.000000
4            1   READ_1    25%   5.500000
..         ...      ...    ...        ...
51           4  READ_09    min  45.000000
52           4  READ_09    25%  45.000000
53           4  READ_09    50%  45.000000
54           4  READ_09    75%  45.000000
55           4  READ_09    max  45.000000

[112 rows x 4 columns]