Best function for pulling multiple NetCDF files from a server, indexing, looping, and saving to file?

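I'm pretty new to programming. I'm trying to adapt a script that pulled .txt files containing data so that it now pulls NetCDF files from an HTTP server, downloads them, renames them, and saves them locally (and to another server location). I've pasted the basic code, including the actual buoy data file names for the NetCDF files. I believe the problem is in the URL request line. I've tried urllib.request.urlopen and urllib.request.urlretrieve, and both raise errors.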

    import os
    import urllib
    import urllib.request
    import shutil
    import netCDF4
    import requests
           
    # Weblink for location of spectra and wave data
    webSpectra = 'https://dods.ndbc.noaa.gov/thredds/fileServer/data/swden/41004/41004w9999.nc'
    
    webWave = 'https://dods.ndbc.noaa.gov/thredds/fileServer/data/stdmet/41004/41004h9999.nc'
       
    #set save location for each
    saveloc = 'saveSpectra41004w9999.nc'
    saveloc2 = 'saveWave41004h9999.nc'
    
    # perform pull
    try:
        urllib.request.urlopen(webSpectra, saveloc)
    except urllib.error.HTTPError as exception:
        print('Station: 41004 spectra file not available')
        print(exception)

    try:
        urllib.request.urlopen(webWave, saveloc2)
    except urllib.error.HTTPError as exception:
        print('Station: 41004 wave file not available')
        print(exception)
    print('Pulling data for 41004')
    # count and stationIndex are defined earlier in the full script (not shown)
    print('Percent complete ' + str(round(100*(count/len(stationIndex)))))

    print('Done')

My error:

    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-13-5e5ebd26fe46> in <module>
         59     # perform pull
         60     try:
    ---> 61         urllib.request.urlopen(webSpectra, saveloc)
         62     except urllib.error.HTTPError as exception:
         63         print('Station: 41004 spectra file not available')

    /work/anaconda3/envs/aoes/lib/python3.6/urllib/request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
        221     else:
        222         opener = _opener
    --> 223     return opener.open(url, data, timeout)
        224 
        225 def install_opener(opener):

    /work/anaconda3/envs/aoes/lib/python3.6/urllib/request.py in open(self, fullurl, data, timeout)
        522         for processor in self.process_request.get(protocol, []):
        523             meth = getattr(processor, meth_name)
    --> 524             req = meth(req)
        525 
        526         response = self._open(req, data)

    /work/anaconda3/envs/aoes/lib/python3.6/urllib/request.py in do_request_(self, request)
       1277                 msg = "POST data should be bytes, an iterable of bytes, " \
       1278                       "or a file object. It cannot be of type str."
    -> 1279                 raise TypeError(msg)
       1280             if not request.has_header('Content-type'):
       1281                 request.add_unredirected_header(

    TypeError: POST data should be bytes, an iterable of bytes, or a file object. It cannot be of type str.

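> ...and save locally.

As I understand it, you should open() a local file and write the response into it, rather than pass the file name to urlopen. The second positional argument of urlopen is data, so the file name string is sent as POST data to the URL, which is what the TypeError is complaining about.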
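A minimal sketch of that idea, assuming the goal is simply to copy the response body into a local file. The helper name `fetch_to_file` is made up for illustration; only standard-library calls are used.

```python
import shutil
import urllib.request


def fetch_to_file(url, dest):
    """Download url and write the raw bytes to the local path dest."""
    # urlopen(url) with no data argument issues a GET; the response
    # behaves like a binary file object.
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        # Stream the body to disk in chunks rather than reading it all at once.
        shutil.copyfileobj(response, out)
    return dest
```

With the names from the question this would be called as `fetch_to_file(webSpectra, saveloc)`, wrapped in the same `try`/`except urllib.error.HTTPError` as before.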

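By the look of it, you just want to download the files. You can do this with nctoolkit (https://nctoolkit.readthedocs.io/en/latest/). This downloads the file to a temporary location. You can then export to xarray or pandas, etc., or just save the file.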

The following code works for one file:

    import nctoolkit as nc
    ds = nc.open_url('https://dods.ndbc.noaa.gov/thredds/fileServer/data/stdmet/41004/41004h9999.nc')
    # convert to xarray dataset
    ds_xr = ds.to_xarray()
    # convert to pandas dataframe
    df = ds.to_dataframe()
    # save to location
    ds.to_nc("outfile.nc")

If the above does not work, because of dependency issues for example, you can use urllib:

    import urllib.request
    url = 'https://dods.ndbc.noaa.gov/thredds/fileServer/data/stdmet/41004/41004h9999.nc'
    # the second argument is the local file name to save to
    urllib.request.urlretrieve(url, '/tmp/temp.nc')
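Since the question asks about looping over several files, here is a rough sketch of how the urlretrieve call might be wrapped in a loop. The helper names `station_jobs` and `download_all` are hypothetical, and the URL pattern is only an assumption: it is copied from the two 41004 links in the question and may not hold for every station.

```python
import urllib.error
import urllib.request

# Base URL taken from the question; the assumption is that other stations
# follow the same directory and file-name pattern as 41004.
BASE = "https://dods.ndbc.noaa.gov/thredds/fileServer/data"


def station_jobs(station):
    """Return (url, local_name) pairs for one station's spectra and wave files."""
    return [
        (f"{BASE}/swden/{station}/{station}w9999.nc", f"saveSpectra{station}w9999.nc"),
        (f"{BASE}/stdmet/{station}/{station}h9999.nc", f"saveWave{station}h9999.nc"),
    ]


def download_all(jobs):
    """Download each (url, dest) pair in the list jobs; return the saved paths."""
    saved = []
    for count, (url, dest) in enumerate(jobs, start=1):
        try:
            urllib.request.urlretrieve(url, dest)
            saved.append(dest)
        except urllib.error.URLError as exc:
            # HTTPError is a subclass of URLError, so a missing file lands here too.
            print(f"{url} not available: {exc}")
        print(f"Percent complete {round(100 * count / len(jobs))}")
    return saved
```

For the single buoy in the question this would be `download_all(station_jobs("41004"))`; for several stations, concatenate the lists returned by `station_jobs` before passing them in.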