Reverse geocoding: How to determine the city closest to a (lat,lon) with BigQuery SQL?

I have a huge collection of points, and I want to determine the city closest to each one. How can I do this with BigQuery?

This is the best-performing query we have worked out so far:

WITH a AS (
  # a table with points around the world
  SELECT * FROM UNNEST([ST_GEOGPOINT(-70, -33), ST_GEOGPOINT(-122,37), ST_GEOGPOINT(151,-33)]) my_point
), b AS (
  # any table with city locations around the world
  SELECT *, ST_GEOGPOINT(lon,lat) latlon_geo
  FROM `fh-bigquery.geocode.201806_geolite2_latlon_redux` 
)

SELECT my_point, city_name, subdivision_1_name, country_name, continent_name
FROM (
  SELECT loc.*, my_point
  FROM (
    SELECT ST_ASTEXT(my_point) my_point, ANY_VALUE(my_point) geop
      , ARRAY_AGG( # get the closest city
           STRUCT(city_name, subdivision_1_name, country_name, continent_name) 
           ORDER BY ST_DISTANCE(my_point, b.latlon_geo) LIMIT 1
        )[SAFE_OFFSET(0)] loc
    FROM a, b 
    WHERE ST_DWITHIN(my_point, b.latlon_geo, 100000)  # filter to only close cities
    GROUP BY my_point
  )
)
GROUP BY 1,2,3,4,5
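
One thing to keep in mind about the ST_DWITHIN filter: its distance argument is in meters, so 100000 keeps only cities within 100 km (about 62 miles) of each point, and a point with no city inside that radius produces no row at all. The following is a minimal sketch of that behavior using the same ARRAY_AGG ... ORDER BY ... LIMIT 1 pattern. The inline points and city coordinates are made-up illustrative values, not rows from the table above.

```sql
#standardSQL
WITH points AS (
  # hypothetical test points: one near a city, one far from any city
  SELECT ST_GEOGPOINT(-70.6, -33.4) AS my_point   # close to Santiago
  UNION ALL
  SELECT ST_GEOGPOINT(-140, 0)                    # middle of the Pacific
), cities AS (
  # hypothetical stand-in for the cities table (approximate coordinates)
  SELECT 'Santiago' AS city_name, ST_GEOGPOINT(-70.6693, -33.4489) AS latlon_geo
  UNION ALL
  SELECT 'Valparaiso', ST_GEOGPOINT(-71.6127, -33.0472)
)
SELECT
  ST_ASTEXT(my_point) AS point_wkt,
  ARRAY_AGG(  # keep only the closest city that survived the filter
    STRUCT(city_name, ST_DISTANCE(my_point, latlon_geo) AS meters)
    ORDER BY ST_DISTANCE(my_point, latlon_geo) LIMIT 1
  )[SAFE_OFFSET(0)] AS closest
FROM points, cities
WHERE ST_DWITHIN(my_point, latlon_geo, 100000)  # radius in meters
GROUP BY point_wkt
```

With these inputs the mid-Pacific point should be missing from the result, because the WHERE clause removes every candidate pair for it before the aggregation runs. A larger radius would bring it back, at the cost of more candidate pairs to compare.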

I have a huge collection of points ...

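Felipe's solution is perfect in many ways, but I found that when you really have more than just a few points to find the closest city for, and you cannot limit yourself to a distance of 60 miles, the solution below works better: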

#standardSQL
WITH a AS (
  # a table with points around the world
  SELECT ST_GEOGPOINT(lon,lat) my_point
  FROM `fh-bigquery.geocode.201806_geolite2_latlon_redux`  
), b AS (
  # any table with city locations around the world
  SELECT *, ST_GEOGPOINT(lon,lat) latlon_geo, ST_ASTEXT(ST_GEOGPOINT(lon,lat)) hsh 
  FROM `fh-bigquery.geocode.201806_geolite2_latlon_redux` 
)
SELECT AS VALUE 
  ARRAY_AGG(
    STRUCT(my_point, city_name, subdivision_1_name, country_name, continent_name) 
    LIMIT 1
  )[OFFSET(0)]
FROM (
  SELECT my_point, ST_ASTEXT(closest) hsh 
  FROM a, (SELECT ST_UNION_AGG(latlon_geo) arr FROM b),
  UNNEST([ST_CLOSESTPOINT(arr, my_point)]) closest
)
JOIN b 
USING(hsh)
GROUP BY ST_ASTEXT(my_point)

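Notes:

  • I am using the ST_CLOSESTPOINT function.
  • To mimic the "not just a few points ..." case, I use the same table for the points as for b, so there are 100K points to find the closest city for, and there is no limit on how near or far the matched city can be. In this scenario the query in the original answer ends with the famous Query exceeded resource limits error. Otherwise that query shows much better, if not the best, performance, as that answer rightly states.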
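
To see the ST_CLOSESTPOINT approach in isolation, here is a small self-contained sketch that needs no public table. The three cities and their approximate coordinates are illustrative assumptions. As in the query above, it merges all city locations into one geography with ST_UNION_AGG, asks for the closest point of that geography, and joins back to the city row through the WKT text of the point. That join assumes ST_CLOSESTPOINT returns a city's coordinates unchanged.

```sql
#standardSQL
WITH cities AS (
  # hypothetical stand-in for the cities table (approximate coordinates)
  SELECT 'Santiago' AS city_name, ST_GEOGPOINT(-70.6693, -33.4489) AS latlon_geo
  UNION ALL
  SELECT 'San Francisco', ST_GEOGPOINT(-122.4194, 37.7749)
  UNION ALL
  SELECT 'Sydney', ST_GEOGPOINT(151.2093, -33.8688)
), points AS (
  # the same three sample points used in the first answer
  SELECT my_point
  FROM UNNEST([ST_GEOGPOINT(-70, -33), ST_GEOGPOINT(-122, 37), ST_GEOGPOINT(151, -33)]) my_point
), all_cities AS (
  # a single geography holding every city location
  SELECT ST_UNION_AGG(latlon_geo) AS geo FROM cities
)
SELECT
  ST_ASTEXT(my_point) AS point_wkt,
  city_name
FROM points
CROSS JOIN all_cities
JOIN cities
  # match the closest point back to its city row through its WKT text
  ON ST_ASTEXT(latlon_geo) = ST_ASTEXT(ST_CLOSESTPOINT(geo, my_point))
```

Each sample point should pair with the city on its own continent. No distance threshold appears anywhere, which is why this variant can return a city however far away the nearest one is.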