Pandas DataFrame: how to group by bins of a numerical column and then count another binary column

Question

I have the following data:

c1 c2  SED f
1  2   0.2 1
3  3   0.7 1
3  1   0.1 0
8  1   0.6 0
9  2   1   1
4  9   8.3 1
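
For reproducibility, the table above can be built as a DataFrame. This is a minimal sketch; the variable name `df` is an assumption, chosen to match the name used in the answer below.

```python
import pandas as pd

# Sample data copied from the table above.
df = pd.DataFrame({
    "c1": [1, 3, 3, 8, 9, 4],
    "c2": [2, 3, 1, 1, 2, 9],
    "SED": [0.2, 0.7, 0.1, 0.6, 1.0, 8.3],
    "f": [1, 1, 0, 0, 1, 1],
})
```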

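I want to group SED into bins of width 0.5 and, for each bin, count the number of rows where column f is 1 and the number of rows where it is 0.

So for this example I would get: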

SED_bin   cou_0   cou_1     
  0-0.5     1       1
  0.5-1     1       2
  8-8.5     0       1

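What is the best way to do this? Note that these SED values are only an example; there could be more values below this range, so the binning needs to be generic.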

Answer 1

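One option is to use cut + crosstab. pd.cut assigns each SED value to a right-closed interval of width 0.5, and pd.crosstab counts the values of f within each interval: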

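import numpy as np
import pandas as pd

# Edges every 0.5, from the floor of the minimum up to at least the maximum
bins = np.arange(np.floor(df['SED'].min()), df['SED'].max() + 0.5, 0.5)
out = (pd.crosstab(pd.cut(df['SED'], bins), df['f'])
       .add_prefix('count_').rename_axis(index='SED_bins').reset_index())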

Output:

f    SED_bins  count_0  count_1
0  (0.0, 0.5]        1        1
1  (0.5, 1.0]        1        2
2  (8.0, 8.5]        0        1
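
Since the question asks for generic binning, the same cut + crosstab idea can be wrapped in a small helper that derives the edges from the data and a chosen width. The sketch below is one possible way to do it, not the answer's exact code: the function name `count_by_bins`, its parameters, and the `sample` frame are hypothetical, and it assumes that `f` only takes the values 0 and 1. It also labels the bins in the `0-0.5` style used in the question.

```python
import numpy as np
import pandas as pd


def count_by_bins(df, value_col="SED", flag_col="f", width=0.5):
    """Count rows per flag value (0/1) within fixed-width bins of value_col."""
    values = df[value_col]
    # First edge: the multiple of `width` at or below the minimum
    # (this also works for negative values).
    start = np.floor(values.min() / width) * width
    # Enough bins for the last edge to reach at least the maximum.
    n_bins = max(int(np.ceil((values.max() - start) / width)), 1)
    edges = start + width * np.arange(n_bins + 1)
    # Labels in the question's style, e.g. "0-0.5", "0.5-1".
    labels = [f"{lo:g}-{hi:g}" for lo, hi in zip(edges[:-1], edges[1:])]
    # Bins are right-closed; include_lowest=True keeps a value that sits
    # exactly on the first edge.
    binned = pd.cut(values, edges, labels=labels, include_lowest=True)
    # crosstab omits empty bins by default (dropna=True).
    out = pd.crosstab(binned, df[flag_col])
    # Assumption: the flag is binary, so always emit both count columns.
    out = out.reindex(columns=[0, 1], fill_value=0).add_prefix("count_")
    out.columns.name = None
    return out.rename_axis(index="SED_bin").reset_index()


# Only the two columns used by the helper, copied from the question.
sample = pd.DataFrame({
    "SED": [0.2, 0.7, 0.1, 0.6, 1.0, 8.3],
    "f": [1, 1, 0, 0, 1, 1],
})
result = count_by_bins(sample)
print(result)
```

Because the bins are right-closed, a value such as 1.0 is counted in the 0.5-1 bin, which matches the expected result in the question. Passing dropna=False to pd.crosstab would keep the empty bins as rows of zeros instead of omitting them.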

group-by

dataframe

pandas