Binning data in a pandas dataframe into intervals

Question

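I have a dataframe of speed records (1 million entries). I need to split them into specific intervals, given here by cw_bins. I have managed to do that with the code below, but to access each bin this way I have to change the index used for cw_y. I could write a for loop to save the bins in separate lists. I am just wondering whether pandas has a better way to do this.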

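import pandas as pd

cw_bins = [14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0]
# group rows by the speed interval each record falls into
groups = filter_power.groupby(pd.cut(filter_power.Speed, cw_bins))
cw_x = list(groups)  # list of (interval, sub-frame) pairs
cw_y = cw_x[4]       # fifth (interval, sub-frame) pair
cw_z = cw_y[1]       # the sub-frame for that interval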

Answer 1

How about using numpy.digitize?

import numpy as np

cw_bins = [14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0]

# create a column with the index of the bin each Speed falls into
# (i means cw_bins[i-1] <= Speed < cw_bins[i]; 0 and 7 are out of range)
df['binned_values'] = np.digitize(df.Speed.values, bins=cw_bins)

# for example, get all records with 16.0 <= Speed < 17.0
df[df['binned_values'] == 3]
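
For comparison, here is a minimal sketch of the same idea on a tiny made-up frame, alongside a pandas-only variant that avoids positional indexing. The sample data and the names speed_bin and frames_by_bin are assumptions for illustration, not part of the original post. Note that pd.cut closes intervals on the right by default, so right=False is passed here to match the left-closed bins that np.digitize produces.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data standing in for the real Speed column.
df = pd.DataFrame({"Speed": [13.5, 14.2, 16.0, 16.7, 17.0, 19.9, 20.0]})
cw_bins = [14.0, 15.0, 16.0, 17.0, 18.0, 19.0, 20.0]

# np.digitize: index i means cw_bins[i-1] <= Speed < cw_bins[i];
# 0 is below the first edge, len(cw_bins) is at or above the last edge.
df["binned_values"] = np.digitize(df["Speed"].to_numpy(), bins=cw_bins)

# pandas-only alternative: label each row with its interval.
# right=False makes the intervals left-closed, like np.digitize.
# Values outside the edges become NaN and are dropped by groupby.
df["speed_bin"] = pd.cut(df["Speed"], bins=cw_bins, right=False)

# One sub-frame per non-empty interval, keyed by the bin's left edge.
frames_by_bin = {
    interval.left: group
    for interval, group in df.groupby("speed_bin", observed=True)
}

# Look up a bin by its edge instead of by position in a list.
between_16_and_17 = frames_by_bin[16.0]
```

Keying the groups by the left edge means a bin can be looked up by value rather than by list position, which may be easier to read than cw_x[4]. Whether it beats the digitize column on 1 million rows would need to be measured.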

Tags: python, binning, dataframe, pandas