Dask Dataframe sum of column always returning scalar
I created a Dask Dataframe (called "df") whose column with index "11" holds integer values:
In [62]: df[11]
Out[62]:
Dask Series Structure:
npartitions=42
int64
...
...
...
...
Name: 11, dtype: int64
Dask Name: getitem, 168 tasks
I tried to sum them with:
df[11].sum()
but I get dd.Scalar<series-..., dtype=int64>
back. Despite researching what this might mean, I still don't understand why I'm not getting a numeric value. How do I turn this into a number?
I think you need compute
to tell Dask
to actually run everything that came before:
compute(**kwargs)
Compute this dask collection
This turns a lazy Dask collection into its in-memory equivalent. For example a Dask.array turns into a numpy.array() and a Dask.dataframe turns into a Pandas dataframe. The entire dataset must fit into memory before calling this operation.
df[11].sum().compute()