Read in an authorized OPeNDAP URL using xarray
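I would like to open an OPeNDAP URL with xarray. It requires authorization because it is hosted at the UCAR RDA:
https://rda.ucar.edu/datasets/ds084.1/#!description
The URL for one file is 'https://rda.ucar.edu/thredds/dodsC/files/g/ds084.1/2020/20200101/gfs.0p25.2020010100.f000.grib2'
I'm not sure whether I can pass the authorization as a backend_kwarg?
The code below fails with the following error: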
import xarray as xr
url = "https://rda.ucar.edu/thredds/dodsC/files/g/ds084.1/2020/20200101/gfs.0p25.2020010100.f000.grib2"
ds = xr.open_dataset(url)
Traceback (most recent call last):
File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/xarray/backends/file_manager.py", line 199, in _acquire_with_cache_info
file = self._cache[self._key]
File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/xarray/backends/lru_cache.py", line 53, in __getitem__
value = self._cache[key]
KeyError: [<class 'netCDF4._netCDF4.Dataset'>, ('https://rda.ucar.edu/thredds/dodsC/files/g/ds084.1/2020/20200101/gfs.0p25.2020010100.f000.grib2',), 'r', (('clobber', True), ('diskless', False), ('format', 'NETCDF4'), ('persist', False))]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/xarray/backends/api.py", line 572, in open_dataset
store = opener(filename_or_obj, **extra_kwargs, **backend_kwargs)
File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/xarray/backends/netCDF4_.py", line 364, in open
return cls(manager, group=group, mode=mode, lock=lock, autoclose=autoclose)
File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/xarray/backends/netCDF4_.py", line 314, in __init__
self.format = self.ds.data_model
File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/xarray/backends/netCDF4_.py", line 373, in ds
return self._acquire()
File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/xarray/backends/netCDF4_.py", line 367, in _acquire
with self._manager.acquire_context(needs_lock) as root:
File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/contextlib.py", line 113, in __enter__
return next(self.gen)
File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/xarray/backends/file_manager.py", line 187, in acquire_context
file, cached = self._acquire_with_cache_info(needs_lock)
File "/Users/ray.bell/miniconda/envs/test_env/lib/python3.8/site-packages/xarray/backends/file_manager.py", line 205, in _acquire_with_cache_info
file = self._opener(*self._args, **kwargs)
File "netCDF4/_netCDF4.pyx", line 2357, in netCDF4._netCDF4.Dataset.__init__
File "netCDF4/_netCDF4.pyx", line 1925, in netCDF4._netCDF4._ensure_nc_success
OSError: [Errno -78] NetCDF: Authorization failure: b'https://rda.ucar.edu/thredds/dodsC/files/g/ds084.1/2020/20200101/gfs.0p25.2020010100.f000.grib2'
Siphon's session_manager may hint at what the auth could look like: https://unidata.github.io/siphon/latest/examples/Basic_Usage.html#sphx-glr-examples-basic-usage-py / https://github.com/Unidata/siphon/blob/master/siphon/http_util.py#L52
Thanks to Ryan May for pointing me to https://publicwiki.deltares.nl/display/OET/Accessing+netCDF+data+via+OPeNDAP+on+password+protected+servers
Creating dotfiles in my home directory allowed me to read the URL. It is probably not the cleanest approach, and I imagine it could cause problems on VMs/clusters, but it works. I am still hoping for a backend_kwargs approach.
Create a file .netrc in your home directory that looks like this:
machine rda.ucar.edu
login USR
password PWD
Also have a file .dodsrc in your home directory that looks like this:
HTTP.COOKIEJAR=<HOME_DIR>/.cookies
HTTP.NETRC=<HOME_DIR>/.netrc
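For VMs or cluster nodes, where editing dotfiles by hand is awkward, the two files above could be generated from a script. The following is only a minimal sketch: the function name `write_opendap_credentials` and its parameters are made up for illustration, and it assumes it is acceptable to overwrite any existing .netrc and .dodsrc.

```python
import os
from pathlib import Path


def write_opendap_credentials(username, password, home=None, machine="rda.ucar.edu"):
    """Write .netrc and .dodsrc under `home` (defaults to the user's home directory).

    Hypothetical helper, not part of xarray or netCDF4.
    """
    home = Path(home) if home is not None else Path.home()
    netrc_path = home / ".netrc"
    dodsrc_path = home / ".dodsrc"

    # Note: this overwrites an existing .netrc; merge by hand if you already have one.
    netrc_path.write_text(
        f"machine {machine}\nlogin {username}\npassword {password}\n"
    )
    # Keep the password file readable and writable by the owner only.
    os.chmod(netrc_path, 0o600)

    # Point the netCDF/DAP client at the cookie jar and the .netrc file.
    dodsrc_path.write_text(
        f"HTTP.COOKIEJAR={home / '.cookies'}\nHTTP.NETRC={netrc_path}\n"
    )
    return netrc_path, dodsrc_path
```

Calling `write_opendap_credentials("USR", "PWD")` once per machine should leave the same two files as described above.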
You can now pass URLs that require authentication:
import xarray as xr
url = "https://rda.ucar.edu/thredds/dodsC/files/g/ds084.1/2020/20200101/gfs.0p25.2020010100.f000.grib2"
ds = xr.open_dataset(url)
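If you would rather avoid dotfiles, one possible alternative is xarray's pydap backend, which accepts a requests session. This is a sketch only: it needs the optional pydap package, the helper name `open_with_basic_auth` is made up for illustration, and I have not verified that the RDA THREDDS server accepts basic authentication this way.

```python
def open_with_basic_auth(url, username, password):
    """Open an OPeNDAP URL via pydap, using a requests session with basic auth.

    Hypothetical helper; whether the server accepts basic auth is an assumption.
    """
    # Imported here so the function can be defined without the optional
    # dependencies (requests, xarray, pydap) being installed.
    import requests
    import xarray as xr

    session = requests.Session()
    session.auth = (username, password)

    # PydapDataStore.open takes the session and uses it for its HTTP requests.
    store = xr.backends.PydapDataStore.open(url, session=session)
    return xr.open_dataset(store)
```

Usage would be `ds = open_with_basic_auth(url, "USR", "PWD")` with the same `url` as above.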
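If you don't absolutely need to use OPeNDAP, and just want something OPeNDAP-like that interfaces with xarray, you can use THREDDS's CDMRemote protocol instead. In that case, we can leverage Siphon's support for basic HTTP authentication via requests: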
from siphon.catalog import TDSCatalog
from siphon.http_util import session_manager
# Set options for Siphon's HTTP session manager--in this case user/password
session_manager.set_session_options(auth=('MYUSER', 'MYPASSWORD'))
cat = TDSCatalog('https://rda.ucar.edu/thredds/catalog/files/g/ds084.1/2020/20200101/catalog.xml')
selected_dataset = cat.datasets[0]
ds = selected_dataset.remote_access(service='CDMRemote', use_xarray=True)