Best function for pulling multiple NetCDF files from a server, indexing, looping, and saving to file?
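I'm pretty new to programming. I am trying to adapt a script that used to pull .txt data files so that it now pulls NetCDF files from an HTTP server, downloads them, renames them, and saves them locally (and to another server location). I have pasted the basic code, including the actual buoy data file names for the NetCDF files. I believe the problem is in the urlrequest line. I have tried urllib.request.open and url.request.retrieve, and both throw errors.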
import os
import urllib
import urllib.request
import shutil
import netCDF4
import requests
# Weblink for location of spectra and wave data
webSpectra = 'https://dods.ndbc.noaa.gov/thredds/fileServer/data/swden/41004/41004w9999.nc'
webWave = 'https://dods.ndbc.noaa.gov/thredds/fileServer/data/stdmet/41004/41004h9999.nc'
#set save location for each
saveloc = 'saveSpectra41004w9999.nc'
saveloc2 = 'saveWave41004h9999.nc'
# perform pull
try:
    urllib.request.urlopen(webSpectra, saveloc)
except urllib.error.HTTPError as exception:
    print('Station: 41004 spectra file not available')
    print(exception)
try:
    urllib.request.urlopen(webWave, saveloc2)
except urllib.error.HTTPError as exception:
    print('Station: 41004 wave file not available')
    print(exception)
print('Pulling data for 41004')
print('Percent complete '+ str(round(100*(count/len(stationIndex)))))
print('Done')
My error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-13-5e5ebd26fe46> in <module>
59 # perform pull
60 try:
---> 61 urllib.request.urlopen(webSpectra, saveloc)
62 except urllib.error.HTTPError as exception:
63 print('Station: 41004 spectra file not available')
/work/anaconda3/envs/aoes/lib/python3.6/urllib/request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
221 else:
222 opener = _opener
--> 223 return opener.open(url, data, timeout)
224
225 def install_opener(opener):
/work/anaconda3/envs/aoes/lib/python3.6/urllib/request.py in open(self, fullurl, data, timeout)
522 for processor in self.process_request.get(protocol, []):
523 meth = getattr(processor, meth_name)
--> 524 req = meth(req)
525
526 response = self._open(req, data)
/work/anaconda3/envs/aoes/lib/python3.6/urllib/request.py in do_request_(self, request)
1277 msg = "POST data should be bytes, an iterable of bytes, " \
1278 "or a file object. It cannot be of type str."
-> 1279 raise TypeError(msg)
1280 if not request.has_header('Content-type'):
1281 request.add_unredirected_header(
TypeError: POST data should be bytes, an iterable of bytes, or a file object. It cannot be of type str.
...and save locally.
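From my understanding, you should open() the local file yourself rather than POST to the URL. The second positional argument of urlopen is data, the request body, so passing the save path there turns the call into a POST and raises the TypeError shown in the traceback.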
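A minimal sketch of that idea, assuming only the standard library: call urlopen with the URL alone, open() the local file in binary mode, and copy the response into it. The helper name download is hypothetical and not part of the original script.

```python
import shutil
import urllib.request


def download(url, dest):
    """Stream the body of `url` into the local file `dest` (hypothetical helper)."""
    # Only the URL goes to urlopen; a second positional argument would be
    # treated as POST data, which is what triggered the TypeError.
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        # Copy in chunks so a large NetCDF file is never held fully in memory.
        shutil.copyfileobj(response, out)
    return dest
```

With the names from the question this would be called as download(webSpectra, saveloc), still wrapped in the existing try/except for urllib.error.HTTPError.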
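By the looks of it, you just want to download the files. You can do this with nctoolkit (https://nctoolkit.readthedocs.io/en/latest/). It downloads the file to a temporary location; you can then export to xarray or pandas, or simply save the file.
The following code works for one file: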
import nctoolkit as nc
ds = nc.open_url('https://dods.ndbc.noaa.gov/thredds/fileServer/data/stdmet/41004/41004h9999.nc')
# convert to xarray dataset
ds_xr = ds.to_xarray()
# convert to pandas dataframe
df = ds.to_dataframe()
# save to location
ds.to_nc("outfile.nc")
If the above does not work, for example because of dependency issues, you can use urllib instead:
import urllib.request
url = 'https://dods.ndbc.noaa.gov/thredds/fileServer/data/stdmet/41004/41004h9999.nc'
urllib.request.urlretrieve(url, '/tmp/temp.nc')
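For several stations, the same urlretrieve call can sit inside a loop. The following is only a sketch under stated assumptions: the URL pattern is extrapolated from the two 41004 links in the question, and build_urls, pull_stations, and the retrieve parameter are hypothetical names introduced for illustration.

```python
import urllib.error
import urllib.request

# Base path taken from the two URLs in the question.
BASE = "https://dods.ndbc.noaa.gov/thredds/fileServer/data"


def build_urls(station):
    """Map local save names to remote URLs for one station (pattern is an assumption)."""
    return {
        f"saveSpectra{station}w9999.nc": f"{BASE}/swden/{station}/{station}w9999.nc",
        f"saveWave{station}h9999.nc": f"{BASE}/stdmet/{station}/{station}h9999.nc",
    }


def pull_stations(stations, retrieve=urllib.request.urlretrieve):
    """Download the spectra and wave files for each station; return the saved paths."""
    saved = []
    for count, station in enumerate(stations, start=1):
        for dest, url in build_urls(station).items():
            try:
                retrieve(url, dest)
                saved.append(dest)
            except urllib.error.URLError as exc:
                # HTTPError is a subclass of URLError, so a missing file lands here too.
                print(f"Station {station}: {url} not available ({exc})")
        print(f"Percent complete {round(100 * count / len(stations))}")
    return saved
```

Calling pull_stations(['41004']) would fetch the two files from the question; a longer list such as the script's stationIndex can be passed the same way, and a failed download is reported without stopping the loop.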