BigQuery: Using ST_CLUSTERDBSCAN in an array of ST_GEOGPOINT
I would like to use ST_CLUSTERDBSCAN to cluster geographic points.
The example on the BigQuery documentation page is this:
WITH Geos as
(SELECT 1 as row_id, st_geogfromtext('point empty') as geo UNION ALL
SELECT 2, st_geogfromtext('multipoint(1 1, 2 2, 4 4, 5 2)') UNION ALL
SELECT 3, st_geogfromtext('point(14 15)') UNION ALL
SELECT 4, st_geogfromtext('linestring(40 1, 42 34, 44 39)') UNION ALL
SELECT 5, st_geogfromtext('polygon((40 2, 40 1, 41 2, 40 2))'))
SELECT row_id, geo, ST_CLUSTERDBSCAN(geo, 1e5, 1) OVER () AS cluster_num FROM
Geos ORDER BY row_id
+--------+-----------------------------------+-------------+
| row_id | geo | cluster_num |
+--------+-----------------------------------+-------------+
| 1 | GEOMETRYCOLLECTION EMPTY | NULL |
| 2 | MULTIPOINT(1 1, 2 2, 5 2, 4 4) | 0 |
| 3 | POINT(14 15) | 1 |
| 4 | LINESTRING(40 1, 42 34, 44 39) | 2 |
| 5 | POLYGON((40 2, 40 1, 41 2, 40 2)) | 2 |
+--------+-----------------------------------+-------------+
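In my code, I have an array of points that are aggregated together.
However, clustering the resulting MULTIPOINT values does not seem to have any effect.
My code: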
ST_CLUSTERDBSCAN(ST_UNION_AGG(buyer_geo_point), 1e4, 2) OVER () AS cluster_num ,
ST_UNION_AGG(buyer_geo_point)
The results are either null or have completely wrong values:
null
POINT(-41.5320687976469 -20.3600487114797)
null
MULTIPOINT(-39.0833794 -5.9597183, -39.00682744 -5.73228798)
null
POINT(-40.224447061747 -17.3677128083793)
null
POINT(-40.10711168 -18.08920528)
32
null
POINT(-41.10854564 -21.47675214)
null
POINT(-51.11207578 -20.64520046)
117
MULTIPOINT(-38.08106136 -11.94490164, -38.06814822 -11.94196154)
117
MULTIPOINT(-38.07860266 -11.94484066, -38.0786308 -11.9448231, -38.0787098 -11.9447567, -38.0786912 -11.9447861, -38.0676091 -11.9453678)
null
MULTIPOINT(-39.98731268 -14.8426174, -39.98782804 -14.84623434)
Update:
I came up with a solution that labels each point with its cluster.
WITH merchant_cluster as (SELECT
geo.merchant_id,
ST_CLUSTERDBSCAN(buyer_geo_point, 1e3, 1) OVER (Partition by merchant_id) as clusters ,
buyer_geo_point
FROM `geo-info-table` as geo
LEFT JOIN `merchants-table` as m on geo.merchant_id = m.user_id
LEFT JOIN `adresses-table` as add on m.user_id = add.user_id
)
SELECT merchant_id, STRUCT(ARRAY_AGG(IFNULL(clusters,-1)) as cluster_id, ARRAY_AGG(buyer_geo_point) as point) FROM merchant_cluster
GROUP BY merchant_id
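A possible variation on the query above, offered only as a sketch. Without an ORDER BY, ARRAY_AGG does not promise a particular element order, so two separate ARRAY_AGG calls are not formally guaranteed to line up element by element. Aggregating each label together with its point in one array of structs makes the pairing explicit. The inline `points` data, the merchant IDs 1 and 2, and the `labeled_points` name are hypothetical stand-ins for the real tables; the coordinates are copied from the output shown earlier.

```sql
-- Hypothetical inline data standing in for `geo-info-table`.
WITH points AS (
  SELECT 1 AS merchant_id, ST_GEOGPOINT(-38.08106136, -11.94490164) AS buyer_geo_point UNION ALL
  SELECT 1, ST_GEOGPOINT(-38.06814822, -11.94196154) UNION ALL
  SELECT 1, ST_GEOGPOINT(-51.11207578, -20.64520046) UNION ALL
  SELECT 2, ST_GEOGPOINT(-39.98731268, -14.8426174) UNION ALL
  SELECT 2, ST_GEOGPOINT(-39.98782804, -14.84623434)
),
merchant_cluster AS (
  SELECT
    merchant_id,
    buyer_geo_point,
    -- Same parameters as the query above: 1e3 meters, minimum of 1 geography.
    ST_CLUSTERDBSCAN(buyer_geo_point, 1e3, 1)
      OVER (PARTITION BY merchant_id) AS clusters
  FROM points
)
SELECT
  merchant_id,
  -- Each array element carries its own label, so label and point cannot drift apart.
  ARRAY_AGG(
    STRUCT(IFNULL(clusters, -1) AS cluster_id, buyer_geo_point AS point)
  ) AS labeled_points
FROM merchant_cluster
GROUP BY merchant_id
```

Cluster numbers restart within each merchant because of the PARTITION BY, so a `cluster_id` is only meaningful together with its `merchant_id`.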
Try the following:
select cluster_num, ST_UNION_AGG(buyer_geo_point) geo_cluster
from (
select buyer_geo_point,
ST_CLUSTERDBSCAN(buyer_geo_point, 1e4, 2) OVER () AS cluster_num
from `project.dataset.table`
)
group by cluster_num
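I simulated your data using the points shown in your question and got the following result with the query above (note: I used ST_CLUSTERDBSCAN(buyer_geo_point, 200000, 1) because the set of points is very small).
Below is a visualization of this result; each cluster is assigned its own color.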
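For readers who want to reproduce the idea without the original table, here is a minimal self-contained sketch. As far as the documented behavior goes, ST_CLUSTERDBSCAN assigns one cluster number per input row and treats the geography in that row as a single item. That is why, in the documentation example, the whole MULTIPOINT row receives a single label. Calling ST_UNION_AGG first therefore makes the function label groups of points rather than individual points. The inline `points` data is hypothetical (six coordinates copied from the question), and the column names `cluster_10km_min2` and `cluster_200km_min1` are made up for illustration.

```sql
-- Hypothetical inline data: six points copied from the output in the question.
WITH points AS (
  SELECT ST_GEOGPOINT(-38.08106136, -11.94490164) AS buyer_geo_point UNION ALL
  SELECT ST_GEOGPOINT(-38.06814822, -11.94196154) UNION ALL
  SELECT ST_GEOGPOINT(-38.07860266, -11.94484066) UNION ALL
  SELECT ST_GEOGPOINT(-39.98731268, -14.8426174) UNION ALL
  SELECT ST_GEOGPOINT(-39.98782804, -14.84623434) UNION ALL
  SELECT ST_GEOGPOINT(-51.11207578, -20.64520046)
),
labeled AS (
  SELECT
    buyer_geo_point,
    -- One row per point, so every point gets its own label.
    -- Parameters from the question: 10 km radius, at least 2 geographies.
    ST_CLUSTERDBSCAN(buyer_geo_point, 1e4, 2) OVER () AS cluster_10km_min2,
    -- Parameters from the answer's simulation: 200 km radius, at least 1 geography.
    ST_CLUSTERDBSCAN(buyer_geo_point, 200000, 1) OVER () AS cluster_200km_min1
  FROM points
)
SELECT
  ST_ASTEXT(buyer_geo_point) AS point_wkt,
  cluster_10km_min2,   -- NULL marks a point treated as noise
  cluster_200km_min1
FROM labeled
ORDER BY cluster_200km_min1, point_wkt
```

Grouping `labeled` by either label column and applying ST_UNION_AGG(buyer_geo_point), as in the query above, would then give one geography per cluster. With a minimum of 1, an isolated point forms a cluster of its own instead of being returned as NULL.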