Python: download a NetCDF file from a website that provides the file after a button is clicked

Question

If you go to this website: https://ruc.noaa.gov/raobs/Data_request.cgi?byr=2010&bmo=5&bdy=26&bhr=12&eyr=2010&emo=5&edy=27&ehr=15&shour=All+Times&ltype=All+Levels&wunits=Knots&access=WMO+Station+Identifier

type "72632" into the box, change "Format" to "NetCDF format (Binary)", and click "Continue Data Access". A NetCDF file will then download to your computer.

If I track the network activity with Chrome Developer Tools after clicking this button, I can see the "Request URL" that triggers the download: https://ruc.noaa.gov/raobs/GetRaobs.cgi?shour=All+Times&ltype=All+Levels&wunits=Knots&bdate=2010052612&edate=2010052715&access=WMO+Station+Identifier&view=NO&StationIDs=72632&osort=Station+Series+Sort&oformat=NetCDF+format+%28Binary%29
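Each query parameter in that Request URL corresponds to a choice made in the form, so one way to request other stations or date ranges from Python is to rebuild the query string with `urllib.parse.urlencode`. This is a minimal sketch, assuming GetRaobs.cgi accepts any WMO station ID and `YYYYMMDDHH` dates in the same layout (only the single URL above has actually been observed); `build_raobs_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

# Endpoint taken from the Request URL captured in Chrome Developer Tools.
BASE_URL = "https://ruc.noaa.gov/raobs/GetRaobs.cgi"


def build_raobs_url(station_id: str, bdate: str, edate: str) -> str:
    """Rebuild the GetRaobs.cgi request URL for one station and date range.

    Assumption: bdate/edate follow the YYYYMMDDHH layout of the captured URL,
    and the remaining fields mirror the form choices made in the browser.
    """
    params = {
        "shour": "All Times",
        "ltype": "All Levels",
        "wunits": "Knots",
        "bdate": bdate,
        "edate": edate,
        "access": "WMO Station Identifier",
        "view": "NO",
        "StationIDs": station_id,
        "osort": "Station Series Sort",
        "oformat": "NetCDF format (Binary)",
    }
    # urlencode quotes with quote_plus: spaces become "+", "(" and ")" become %28/%29.
    return f"{BASE_URL}?{urlencode(params)}"
```

For station 72632 and the dates above, this reproduces the captured URL character for character.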

If you copy and paste that URL into a web browser, the file downloads.

What I want to do is use Python to take a URL in the format above and retrieve the associated NetCDF file.
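A minimal sketch of that retrieval step, assuming the CGI returns the NetCDF file directly in the HTTP response body (as the browser download suggests): stream the response into a local file with a `.nc` name, then open that path with xarray. `download_file` is a hypothetical helper that uses only the standard library.

```python
from __future__ import annotations

import shutil
import urllib.request
from pathlib import Path


def download_file(url: str, dest: str | Path, timeout: float = 60.0) -> Path:
    """Stream the response body of `url` into `dest` and return the path."""
    dest = Path(dest)
    # urlopen follows redirects and raises HTTPError for 4xx/5xx responses.
    with urllib.request.urlopen(url, timeout=timeout) as resp, dest.open("wb") as out:
        # Copy in chunks so large files are never held fully in memory.
        shutil.copyfileobj(resp, out)
    return dest
```

Usage would then be something like `ds = xr.open_dataset(download_file(url, "raobs_72632.nc"))`, with `url` set to the GetRaobs.cgi Request URL above.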

In the past I have had success with something like:

import xarray as xr

url = 'https://ruc.noaa.gov/raobs/GetRaobs.cgi?shour=All+Times&ltype=All+Levels&wunits=Knots&bdate=2010052612&edate=2010052715&access=WMO+Station+Identifier&view=NO&StationIDs=72632&osort=Station+Series+Sort&oformat=NetCDF+format+%28Binary%29'
da = xr.open_dataset(url)

but that does not work in this case:

OSError: [Errno -75] NetCDF: Malformed or unexpected Constraint: b'https://ruc.noaa.gov/raobs/GetRaobs.cgi?shour=All+Times&ltype=All+Levels&wunits=Knots&bdate=2010052612&edate=2010052715&access=WMO+Station+Identifier&view=NO&StationIDs=72632&osort=Station+Series+Sort&oformat=NetCDF+format+%28Binary%29'
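The most likely cause: when the netCDF-C library behind xarray's netCDF4 engine is given an `http(s)` URL, it treats it as an OPeNDAP (DAP) endpoint and parses everything after `?` as a DAP constraint expression. GetRaobs.cgi is an ordinary CGI script, not a DAP server, so that query string is rejected, and -75 is netCDF's "malformed or unexpected constraint" error code. One workaround is to fetch the bytes with a plain HTTP client and hand xarray an in-memory file object instead of the URL. This sketch assumes the whole file fits comfortably in memory; `fetch_into_memory` is a hypothetical helper name.

```python
import io
import urllib.request


def fetch_into_memory(url: str, timeout: float = 60.0) -> io.BytesIO:
    """Download the full response body and wrap it in a seekable buffer."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        data = resp.read()
    # BytesIO gives xarray a seekable file-like object; the live HTTP
    # response object is not seekable.
    return io.BytesIO(data)
```

Something like `ds = xr.open_dataset(fetch_into_memory(url))` should then work, provided an engine that accepts file-like objects is installed: `scipy` for netCDF-3 files or `h5netcdf` for netCDF-4/HDF5 files, depending on which format the server actually returns.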

I have also tried running wget on the URL, but that only downloads a ".cgi" file, which I don't think is useful.
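wget typically names the saved file after the URL path (and query string), so that ".cgi" file may in fact contain the NetCDF data under a misleading name; `wget -O raobs.nc "<URL>"` sets the output name explicitly. Its first bytes show what it really is: netCDF-3 files start with `CDF` plus a version byte (1 = classic, 2 = 64-bit offset, 5 = CDF-5), while netCDF-4 files are HDF5 files and begin with the 8-byte HDF5 signature. A small sketch of that check; `sniff_netcdf` is a hypothetical helper, and it only looks for the HDF5 signature at offset 0.

```python
from __future__ import annotations

from pathlib import Path

# Version byte that follows the "CDF" magic in netCDF-3 style files.
_CLASSIC_VERSIONS = {
    1: "netCDF-3 classic",
    2: "netCDF-3 64-bit offset",
    5: "CDF-5 (64-bit data)",
}
# netCDF-4 files are HDF5 files; HDF5 starts with this signature.
_HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def sniff_netcdf(path: str | Path) -> str | None:
    """Return a description of the netCDF flavour, or None if unrecognised."""
    with open(path, "rb") as fh:
        head = fh.read(8)
    if len(head) >= 4 and head[:3] == b"CDF":
        return _CLASSIC_VERSIONS.get(head[3])
    # Only offset 0 is checked; HDF5 allows the signature at later offsets too.
    if head == _HDF5_SIGNATURE:
        return "netCDF-4 / HDF5"
    # Anything else (for example an HTML error page) is not recognised.
    return None
```

If this reports a netCDF format for the downloaded ".cgi" file, renaming it to `.nc` and passing it to `xr.open_dataset` should be enough.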

Thanks for any help!

Answer 1

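You can use my package nctoolkit to download the file and then export it to xarray. The file is saved to a temporary directory, which is deleted when the session finishes.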

import nctoolkit as nc
import xarray as xr
# Download the file behind the GetRaobs.cgi request URL (stored in a temporary directory)
ds = nc.open_url("https://ruc.noaa.gov/raobs/GetRaobs.cgi?shour=All+Times&ltype=All+Levels&wunits=Knots&bdate=2010052612&edate=2010052715&access=WMO+Station+Identifier&view=NO&StationIDs=72632&osort=Station+Series+Sort&oformat=NetCDF+format+%28Binary%29")
# Convert the nctoolkit dataset to an xarray Dataset
ds_xr = ds.to_xarray()
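For comparison, here is a rough standard-library analogue of the temporary-file pattern described in this answer. It is not nctoolkit's implementation, and `with_temporary_download` is a hypothetical name: the file is downloaded into a `tempfile.TemporaryDirectory`, a caller-supplied reader function gets the path, and the directory is removed as soon as the reader returns rather than at the end of the session.

```python
import shutil
import tempfile
import urllib.request
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def with_temporary_download(url: str, reader: Callable[[Path], T],
                            filename: str = "download.nc",
                            timeout: float = 60.0) -> T:
    """Download `url` into a temporary directory, pass the file to `reader`,
    and delete the directory once `reader` returns."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / filename
        with urllib.request.urlopen(url, timeout=timeout) as resp, path.open("wb") as out:
            shutil.copyfileobj(resp, out)
        # The file only exists inside this block, so `reader` must pull
        # everything it needs into memory before returning.
        return reader(path)
```

With xarray, the reader should load eagerly, e.g. `ds = with_temporary_download(url, xr.load_dataset)`; `xr.load_dataset` reads the data into memory and closes the file, whereas a lazily opened dataset would point at a file that no longer exists.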

