How to use group by describe with unstack operation in python dask?
I am trying to use the describe() and unstack() functions in dask to get summary statistics of my data. However, I run into an error, as described below.
import dask.dataframe as dd
df = dd.read_csv('Measurement_table.csv',assume_missing=True)
df.describe().compute()  # this works, but when I try to use `unstack` I get an error
What I am actually trying to do is make the following python pandas code run faster with the help of dask:
(df.groupby(['person_id','measurement_concept_id','visit_occurrence_id'])['value_as_number']
   .describe()
   .unstack()
   .swaplevel(0,1,axis=1)
   .reindex(df['readings'].unique(), axis=1, level=0))
I tried adding compute() to the end of the chain, as shown below:
df1 = df.groupby(['person_id','measurement_concept_id','visit_occurrence_id'])['value_as_number'].describe().unstack().swaplevel(0,1,axis=1).reindex(df['readings'].unique(), axis=1, level=0).compute()
I get an error again, while the same code works fine in pandas. Can anyone help me solve this?
In dask, unstack is not implemented, but describe can be used with apply:
# sd is a dask DataFrame with the sample columns subject_id, readings and val
df = (sd.groupby(['subject_id','readings'])['val']
        .apply(lambda x: x.describe())
        .reset_index()
        .rename(columns={'level_2':'func'})
        .compute()
)
print(df)
subject_id readings func val
0 1 READ_1 count 2.000000
1 1 READ_1 mean 6.000000
2 1 READ_1 std 1.414214
3 1 READ_1 min 5.000000
4 1 READ_1 25% 5.500000
.. ... ... ... ...
51 4 READ_09 min 45.000000
52 4 READ_09 25% 45.000000
53 4 READ_09 50% 45.000000
54 4 READ_09 75% 45.000000
55 4 READ_09 max 45.000000
[112 rows x 4 columns]
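After .compute() the result is an ordinary pandas DataFrame, so the unstack step that dask lacks can be done afterwards in plain pandas. A minimal sketch, assuming the df produced by the snippet above (columns subject_id, readings, func, val):

# df is the computed pandas DataFrame from the snippet above;
# pivot the per-group statistics from long to wide on the 'func' level
wide = (df.set_index(['subject_id', 'readings', 'func'])['val']
          .unstack('func'))
print(wide)

The same pattern should apply to the original columns (person_id, measurement_concept_id, visit_occurrence_id, value_as_number): compute the long-format describe() in dask first, then reshape the small computed result with pandas.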