Pandas how to bin and groupby without categorical range of values

Question

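I have a large number of latitude and longitude values that I would like to bin together so I can display them on a heatmap (ipyleaflet only seems to allow 2000 or so points in a heatmap, and binning would also be more efficient when working with big data).

I am actually using vaex, but a pandas answer is fine too.

The pandas pd.cut function seems helpful for binning, but it produces a categorical column (category dtype) which looks like a list of all the values in the bin. Is there some way of changing this to just be an incremental number identifying each bin (thanks to jezreal for that part of the answer)? I just need a bin number, and then to groupby the bin number and take the mean of the latitude and longitude columns. I also need a count for the intensity of each heatmap entry.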

For example:

import pandas as pd

dft = pd.DataFrame({
    'latitude': [1.5, 0.5, 1.2, 0.9, 3],
    'longitude': [3, 0.2, 2, 0.2, 1.1]
    })

dft['bin'] = pd.cut(dft['latitude'], bins=3, labels=False).astype(str) + "_" + pd.cut(dft['longitude'], bins=3, labels=False).astype(str)

dft.groupby('bin').agg(['mean', 'count']).unstack()

Almost gives me the answer, but I think I want this output instead:

bin latitude_mean longitude_mean count
0_0 0.7           0.2            2
0_1 1.2           2.0            1
1_2 1.5           3.0            1
2_0 3.0           1.1            1

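It would be helpful if the count could be normalized to a value between 1 and 1000.

How can I use pandas pd.cut or something else to groupby the bin in rows, with the mean of latitude and longitude and the (heatmap intensity) count in columns?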

Answer 1

The pandas pd.cut function seems to be helpful in terms of binning, however it produces a categorical column (category dtype) which looks like a list of all the values in the bin. Is there some way of changing this to just be an incremental number identifying each bin

Yes, use the labels=False parameter in cut:

labels : array or False, default None
Specifies the labels for the returned bins. Must be the same length as the resulting bins. If False, returns only integer indicators of the bins.
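
As a small illustration of the difference (a sketch that reuses only the latitude values from the question; the variable names are made up for this example), the default call returns interval categories, while labels=False returns plain integer bin indices:

```python
import pandas as pd

lat = pd.Series([1.5, 0.5, 1.2, 0.9, 3])

# Default: a categorical column whose values are interval labels
as_intervals = pd.cut(lat, bins=3)

# labels=False: integer bin indices (0, 1, 2) instead of intervals
as_codes = pd.cut(lat, bins=3, labels=False)

print(as_intervals.dtype)   # category
print(as_codes.tolist())    # [1, 0, 0, 0, 2]
```

If the categorical column already exists, its .cat.codes accessor should give the same integer indices.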

Then aggregate with GroupBy.agg and finally normalize the count column:

df = dft.groupby('bin').agg(latitude_mean=('latitude','mean'),
                            longitude_mean=('longitude','mean'),
                            count=('latitude','count'))

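# min-max scale the counts into the range [a, b]
# (note: if every bin has the same count, y - x is 0 and the result is NaN)
a, b = 1, 1000
x, y = df['count'].min(), df['count'].max()
df['count'] = (df['count'] - x) / (y - x) * (b - a) + a

print(df)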

     latitude_mean  longitude_mean   count
bin                                       
0_0            0.7             0.2  1000.0
0_1            1.2             2.0     1.0
1_2            1.5             3.0     1.0
2_0            3.0             1.1     1.0
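
The steps above could also be wrapped into one helper. The following is only a sketch under a few assumptions: the function name bin_for_heatmap, its parameters, and the lat_bin, lon_bin and intensity column names are hypothetical and not part of pandas or of the original answer. It groups by the two integer bin columns directly rather than by a concatenated string key, keeps the raw count next to the scaled intensity, and falls back to the lower bound when all bins have the same count.

```python
import pandas as pd


def bin_for_heatmap(df, n_bins=3, lo=1.0, hi=1000.0):
    """Bin latitude/longitude into a grid and summarise each occupied cell."""
    # Integer bin index per axis (hypothetical column names lat_bin / lon_bin)
    lat_bin = pd.cut(df["latitude"], bins=n_bins, labels=False).rename("lat_bin")
    lon_bin = pd.cut(df["longitude"], bins=n_bins, labels=False).rename("lon_bin")

    # Group by both bin indices; only cells that contain points appear
    out = df.groupby([lat_bin, lon_bin]).agg(
        latitude_mean=("latitude", "mean"),
        longitude_mean=("longitude", "mean"),
        count=("latitude", "count"),
    )

    # Min-max scale the counts into [lo, hi]; when all counts are equal the
    # range is zero, so use lo instead of dividing by zero
    c_min, c_max = out["count"].min(), out["count"].max()
    if c_max == c_min:
        out["intensity"] = lo
    else:
        out["intensity"] = (out["count"] - c_min) / (c_max - c_min) * (hi - lo) + lo
    return out.reset_index()


dft = pd.DataFrame({
    "latitude": [1.5, 0.5, 1.2, 0.9, 3],
    "longitude": [3, 0.2, 2, 0.2, 1.1],
})
result = bin_for_heatmap(dft)
print(result)
```

For the sample data this yields four rows, one per occupied cell, with the cell holding two points scaled to 1000 and the others to 1.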


python-3.x

pandas

pandas-groupby

vaex