Reverse geocoding: How to determine the city closest to a (lat,lon) with BigQuery SQL?
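I have a huge collection of points, and I want to determine the closest city to each one. How can I do this with BigQuery?
This is the best-performing query we have worked out so far: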
WITH a AS (
  # a table with points around the world
  SELECT * FROM UNNEST([ST_GEOGPOINT(-70, -33), ST_GEOGPOINT(-122,37), ST_GEOGPOINT(151,-33)]) my_point
), b AS (
  # any table with cities world locations
  SELECT *, ST_GEOGPOINT(lon,lat) latlon_geo
  FROM `fh-bigquery.geocode.201806_geolite2_latlon_redux`
)
SELECT my_point, city_name, subdivision_1_name, country_name, continent_name
FROM (
  SELECT loc.*, my_point
  FROM (
    SELECT ST_ASTEXT(my_point) my_point, ANY_VALUE(my_point) geop
      , ARRAY_AGG( # get the closest city
          STRUCT(city_name, subdivision_1_name, country_name, continent_name)
          ORDER BY ST_DISTANCE(my_point, b.latlon_geo) LIMIT 1
        )[SAFE_OFFSET(0)] loc
    FROM a, b
    WHERE ST_DWITHIN(my_point, b.latlon_geo, 100000) # filter to only close cities
    GROUP BY my_point
  )
)
GROUP BY 1,2,3,4,5
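The core of the query above is the ARRAY_AGG(... ORDER BY ST_DISTANCE(...) LIMIT 1) pattern combined with the ST_DWITHIN pre-filter, whose third argument is a distance in meters. A minimal sketch of just that pattern on inline data follows. The CTE names, the alias point_wkt, and the approximate city coordinates are assumptions made up for illustration and are not taken from the geolite2 table.

```sql
#standardSQL
WITH points AS (
  # hypothetical probe point (lon, lat)
  SELECT ST_GEOGPOINT(-70, -33) AS my_point
), cities AS (
  # hypothetical sample rows; coordinates are approximate
  SELECT 'Santiago' AS city_name, ST_GEOGPOINT(-70.65, -33.45) AS latlon_geo UNION ALL
  SELECT 'Valparaiso', ST_GEOGPOINT(-71.62, -33.05)
)
SELECT
  # GEOGRAPHY values cannot be grouped directly, so group on the WKT text
  ST_ASTEXT(my_point) AS point_wkt,
  # keep only the nearest of the candidate cities that survive the filter
  ARRAY_AGG(city_name ORDER BY ST_DISTANCE(my_point, latlon_geo) LIMIT 1)[SAFE_OFFSET(0)] AS closest_city
FROM points, cities
WHERE ST_DWITHIN(my_point, latlon_geo, 100000) # 100,000 m = 100 km
GROUP BY point_wkt
```

Because the filter sits in the WHERE clause of a cross join, a point with no city within 100 km produces no output row at all. That is the trade-off for keeping the join cheap.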
I have a huge collection of points ...
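Felipe's solution is perfect in many ways, but I found that when you really have more than a few points to find the closest city for, and you cannot limit yourself to a distance of about 60 miles (the 100,000 m ST_DWITHIN filter), the solution below works better: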
#standardSQL
WITH a AS (
  # a table with points around the world
  SELECT ST_GEOGPOINT(lon,lat) my_point
  FROM `fh-bigquery.geocode.201806_geolite2_latlon_redux`
), b AS (
  # any table with cities world locations
  SELECT *, ST_GEOGPOINT(lon,lat) latlon_geo, ST_ASTEXT(ST_GEOGPOINT(lon,lat)) hsh
  FROM `fh-bigquery.geocode.201806_geolite2_latlon_redux`
)
SELECT AS VALUE
  ARRAY_AGG(
    STRUCT(my_point, city_name, subdivision_1_name, country_name, continent_name)
    LIMIT 1
  )[OFFSET(0)]
FROM (
  SELECT my_point, ST_ASTEXT(closest) hsh
  FROM a, (SELECT ST_UNION_AGG(latlon_geo) arr FROM b),
  UNNEST([ST_CLOSESTPOINT(arr, my_point)]) closest
)
JOIN b
USING(hsh)
GROUP BY ST_ASTEXT(my_point)
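Notes:
- I am using the ST_CLOSESTPOINT function.
- To mimic the "not just few points ..." case, I use the same table as in b for the input points, so there are 100K points to find the closest city for, with no limit on how far away or how close the matched city can be. (In this scenario the query in the original answer ends with the famous "Query exceeded resource limits" error. Otherwise it shows much better, if not the best, performance, as that answer rightly states.)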
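To see in isolation what the ST_UNION_AGG plus ST_CLOSESTPOINT step does, here is a minimal sketch on inline data. The CTE name, the alias closest_wkt, and the approximate coordinates are assumptions for illustration only.

```sql
#standardSQL
WITH cities AS (
  # hypothetical sample rows; coordinates are approximate
  SELECT 'Santiago' AS city_name, ST_GEOGPOINT(-70.65, -33.45) AS latlon_geo UNION ALL
  SELECT 'San Francisco', ST_GEOGPOINT(-122.42, 37.77) UNION ALL
  SELECT 'Sydney', ST_GEOGPOINT(151.21, -33.87)
)
SELECT
  # ST_UNION_AGG folds all city points into a single geography;
  # ST_CLOSESTPOINT returns the point on that geography nearest to the probe point
  ST_ASTEXT(
    ST_CLOSESTPOINT(ST_UNION_AGG(latlon_geo), ST_GEOGPOINT(-70, -33))
  ) AS closest_wkt
FROM cities
```

Since the union contains only the city points themselves, the returned point is the location of one of the cities. The query above then recovers that city's attributes by joining back on the WKT text (hsh). This relies on ST_ASTEXT producing identical text for the closest point and for the city's own point, and if several cities share the exact same coordinates, the ARRAY_AGG ... LIMIT 1 keeps an arbitrary one of them.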