How to get the historical weather for any city with BigQuery?

BigQuery has NOAA's GSOD data loaded as a public dataset, with records starting in 1929: https://www.reddit.com/r/bigquery/comments/2ts9wo/noaa_gsod_weather_data_loaded_into_bigquery/

How can I retrieve the historical data for any city?

2019 update: for convenience, you can query this table directly:

SELECT * 
FROM `fh-bigquery.weather_gsod.all`
WHERE name='SAN FRANCISCO INTERNATIONAL A'
ORDER BY date DESC

The table is updated daily. If it stops updating, please report it here.

For example, to get the hottest days recorded at the San Francisco stations since 1980:

SELECT name, state, ARRAY_AGG(STRUCT(date,temp) ORDER BY temp DESC LIMIT 5) top_hot, MAX(date) active_until
FROM `fh-bigquery.weather_gsod.all` 
WHERE name LIKE 'SAN FRANC%'
AND date > '1980-01-01'
GROUP BY 1,2
ORDER BY active_until DESC

Note that this query processed only 28 MB, thanks to the clustered table.

Similarly, but using a location and a table clustered by location instead of the station name:

WITH city AS (SELECT ST_GEOGPOINT(-122.465, 37.807))

SELECT name, state, ARRAY_AGG(STRUCT(date,temp) ORDER BY temp DESC LIMIT 5) top_hot, MAX(date) station_until
FROM `fh-bigquery.weather_gsod.all_geoclustered`  
WHERE EXTRACT(YEAR FROM date) > 1980
AND ST_DISTANCE(point_gis, (SELECT * FROM city)) < 40000
GROUP BY name, state
HAVING EXTRACT(YEAR FROM station_until)>2018
ORDER BY ST_DISTANCE(ANY_VALUE(point_gis), (SELECT * FROM city)) 
LIMIT 5
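
The 40,000 m distance filter in the query above can be sanity-checked outside BigQuery. Below is a minimal Python sketch, assuming a spherical Earth with a mean radius of 6,371 km, so results may differ slightly from what ST_DISTANCE reports. The station names and coordinates are made up for illustration; only the city point comes from the query. Note that, as in ST_GEOGPOINT, longitude comes first.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; an assumption of this sketch


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two (lon, lat) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def stations_within(stations, city, radius_m=40_000):
    """Return (name, distance_m) pairs for stations closer than radius_m, nearest first."""
    city_lon, city_lat = city
    hits = [
        (name, haversine_m(city_lon, city_lat, lon, lat))
        for name, lon, lat in stations
    ]
    return sorted((h for h in hits if h[1] < radius_m), key=lambda h: h[1])


CITY = (-122.465, 37.807)  # same (lon, lat) point as in the query
STATIONS = [  # hypothetical stations, for illustration only
    ("STATION A", -122.375, 37.619),
    ("STATION B", -121.929, 37.359),
]

NEARBY = stations_within(STATIONS, CITY)
```

Here `NEARBY` keeps only the station that falls inside the 40 km radius, mirroring the WHERE and ORDER BY clauses of the query.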


2017 update: standard SQL and up-to-date tables:

SELECT TIMESTAMP(CONCAT(year,'-',mo,'-',da)) day, AVG(min) min, AVG(max) max, AVG(IF(prcp=99.99,0,prcp)) prcp
FROM `bigquery-public-data.noaa_gsod.gsod2016`
WHERE stn='722540' AND wban='13904'
GROUP BY 1
ORDER BY day

An additional example, showing the coldest days in Chicago this decade:

#standardSQL
SELECT year, FORMAT('%s%s',mo,da) day ,min
FROM `fh-bigquery.weather_gsod.stations` a
JOIN `bigquery-public-data.noaa_gsod.gsod201*` b
ON a.usaf=b.stn AND a.wban=b.wban
WHERE name='CHICAGO/O HARE ARPT'
AND min!=9999.9
AND mo<'03'
ORDER BY 1,2

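To retrieve the historical weather for any city, we first need to find the weather stations that report for that city. The table [fh-bigquery:weather_gsod.stations] contains the names of the known stations, their state (if in the US), country, and other details.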

So, to find all the stations in Austin, TX, we would use a query like this:

SELECT state, name, lat, lon
FROM [fh-bigquery:weather_gsod.stations] 
WHERE country='US' AND state='TX' AND name CONTAINS 'AUST'
LIMIT 10

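This approach has 2 problems that need to be solved:

  • Not every known station is present in this table. I need to get an updated version of this file, so if you don't find the station you are looking for here, don't give up.
  • Not every station found in this file was operating every year, so we need to find the stations that have data for the year we are looking for.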

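To solve the second problem, we need to join the stations table with the actual data we are looking for. The following query looks for stations around Austin, and column c shows how many days in 2015 have actual data: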

SELECT state, name, FIRST(a.wban) wban, FIRST(a.stn) stn, COUNT(*) c, INTEGER(SUM(IF(prcp=99.99,0,prcp))) rain, FIRST(lat) lat, FIRST(lon) long
FROM [fh-bigquery:weather_gsod.gsod2015] a
JOIN [fh-bigquery:weather_gsod.stations] b 
ON a.wban=b.wban
AND a.stn=b.usaf
WHERE country='US' AND state='TX' AND name CONTAINS 'AUST'
GROUP BY 1,2
LIMIT 10

Great! We found 4 stations with data for Austin during 2015.
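
The join above boils down to matching each daily row to a station on the (stn, wban) pair and counting rows per station. A minimal Python sketch of that logic follows. The station names and observation rows are hypothetical; only the 722540/13904 identifiers appear in this thread.

```python
from collections import Counter

# Hypothetical station metadata, keyed like the stations table (usaf, wban).
STATIONS = [
    {"usaf": "722540", "wban": "13904", "name": "AUSTIN STATION A"},
    {"usaf": "999999", "wban": "00001", "name": "AUSTIN STATION B"},
]

# Hypothetical daily rows, keyed like the gsod tables (stn, wban).
OBSERVATIONS = [
    {"stn": "722540", "wban": "13904", "mo": "01", "da": "01"},
    {"stn": "722540", "wban": "13904", "mo": "01", "da": "02"},
    {"stn": "888888", "wban": "00002", "mo": "01", "da": "01"},  # no matching station
]


def days_with_data(stations, observations):
    """Count daily rows per station name, joining on (usaf == stn, wban == wban)."""
    names = {(s["usaf"], s["wban"]): s["name"] for s in stations}
    counts = Counter()
    for row in observations:
        key = (row["stn"], row["wban"])
        if key in names:  # inner join: rows without a known station are dropped
            counts[names[key]] += 1
    return counts


DAY_COUNTS = days_with_data(STATIONS, OBSERVATIONS)
```

A station that never appears in the daily rows simply has no entry in `DAY_COUNTS`, which is how the query weeds out stations that were not operating that year.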

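Note that we had to treat "rain" in a special way: when a station doesn't monitor rain, it records 99.99 instead of null. Our query counts those values as 0 so they don't inflate the total.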
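
The sentinel handling can be illustrated outside BigQuery too. This is a minimal Python sketch, assuming precipitation arrives as a list of floats and that 99.99 is the only missing-value marker to worry about; the readings are made up.

```python
MISSING_PRCP = 99.99  # sentinel a station records when it did not monitor rain


def total_rain(prcp_values, missing=MISSING_PRCP):
    """Sum precipitation, counting sentinel readings as 0 (like IF(prcp=99.99,0,prcp))."""
    return sum(0.0 if p == missing else p for p in prcp_values)


def reported_days(prcp_values, missing=MISSING_PRCP):
    """Number of days with a real precipitation reading."""
    return sum(1 for p in prcp_values if p != missing)


READINGS = [0.0, 0.25, 99.99, 1.5, 99.99]  # made-up daily values

TOTAL = total_rain(READINGS)
DAYS = reported_days(READINGS)
```

Summing the raw list instead would add almost 200 units of phantom rain from the two sentinel days, which is why the queries in this thread special-case 99.99.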

Now that we know the stn and wban numbers for these stations, we can pick any of them and visualize the results:

SELECT TIMESTAMP('2015'+mo+da) day, AVG(min) min, AVG(max) max, AVG(IF(prcp=99.99,0,prcp)) prcp
FROM [fh-bigquery:weather_gsod.gsod2015]
WHERE stn='722540' AND wban='13904'
GROUP BY 1
ORDER BY day

Thanks for pulling the data in and making it a public table. Here is a BigQuery query that returns the total rainfall for 2014 at each station in Texas:

SELECT FIRST(name) AS station_name, stn, SUM(prcp) AS annual_precip
FROM [fh-bigquery:weather_gsod.gsod2014] gsod
JOIN [fh-bigquery:weather_gsod.stations] stations
ON gsod.wban=stations.wban AND gsod.stn=stations.usaf
WHERE state='TX' AND prcp != 99.99
GROUP BY stn

Which returns: [result table not included]

To also pull in the number of rainy days at each location and sort the results by it:

SELECT FIRST(name) AS station_name, stn, SUM(prcp) AS annual_precip, COUNT(prcp) AS rainy_days
FROM [fh-bigquery:weather_gsod.gsod2014] gsod
JOIN [fh-bigquery:weather_gsod.stations] stations
ON gsod.wban=stations.wban AND gsod.stn=stations.usaf
WHERE state='TX' AND prcp != 99.99 AND prcp > 0
GROUP BY stn
ORDER BY rainy_days DESC

Which yields: [result table not included]

There is now an official set of the NOAA data on BigQuery, in addition to Felipe's "official" public dataset. There's a blog post describing it.

An example that gets the minimum temperatures for August 15, 2016:

SELECT
  name, 
  value/10 AS min_temperature,
  latitude,
  longitude
FROM
  [bigquery-public-data:ghcn_d.ghcnd_stations] AS stn
JOIN
  [bigquery-public-data:ghcn_d.ghcnd_2016] AS wx
ON
  wx.id = stn.id
WHERE
  wx.element = 'TMIN'
  AND wx.qflag IS NULL
  AND STRING(wx.date) = '2016-08-15'

Which returns: [result table not included]

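Using station names is unreliable. It is also hard to use the new BigQuery geospatial features for this, because city boundaries don't have a clean shape (such as a circle or a rectangle).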

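So the best solution I found for your problem is reverse geocoding: ask the Google Maps API to produce the address, state, city, and county for each station from its lat/lon coordinates.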

Here is the resulting CSV for the US (StationNumber,Lat,Lon,Address,State,City,County,Zip). You'll notice that 98% of the stations are present there: https://gist.github.com/orcaman/a3e23c47489705dff93aace2e35f57d3

If you want to re-run it for stations outside the US, use this code (golang): https://gist.github.com/orcaman/8de55f14f1c70ef5b0c124cf2fb7d9d1
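
Once the stations are geocoded, looking them up by city is a plain filter. Here is a minimal Python sketch, assuming the CSV has no header row, uses the column order listed above, and stores the state as a two-letter abbreviation. I have not verified these details against the gist, and the sample rows are invented for illustration.

```python
import csv
import io

# Column order as listed for the CSV above.
COLUMNS = ["StationNumber", "Lat", "Lon", "Address", "State", "City", "County", "Zip"]

# Invented rows for illustration; real station numbers and addresses will differ.
SAMPLE = '''000001,30.30,-97.70,"1 Example Rd, Austin, TX 78700, USA",TX,Austin,Travis County,78700
000002,29.76,-95.37,"2 Example St, Houston, TX 77000, USA",TX,Houston,Harris County,77000
000003,30.19,-97.67,"3 Example Ave, Austin, TX 78701, USA",TX,Austin,Travis County,78701
'''


def stations_in_city(csv_text, city, state):
    """Return the StationNumber of every row whose City and State match (case-insensitive)."""
    reader = csv.DictReader(io.StringIO(csv_text), fieldnames=COLUMNS)
    return [
        row["StationNumber"]
        for row in reader
        if (row["City"] or "").strip().lower() == city.lower()
        and (row["State"] or "").strip().lower() == state.lower()
    ]


AUSTIN = stations_in_city(SAMPLE, "austin", "TX")
```

The resulting station numbers can then be used in the WHERE clause of the queries above instead of matching on station names.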