Reading Time Series from netCDF with python
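I'm trying to create a time series with python from a netCDF file (accessed through a Thredds server). The code I'm using seems to be correct, but the values read for the variable are all 'masked'. I'm new to python and not familiar with these formats. Any idea how I can read the data?
This is the code I'm using: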
import netCDF4
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime, timedelta
# Yesterday's date, used to build the URL of the 00z NAM run
dayFile = datetime.now() - timedelta(days=1)
dayFile = dayFile.strftime("%Y%m%d")
url='http://nomads.ncep.noaa.gov:9090/dods/nam/nam%s/nam1hr_00z' %(dayFile)
# NetCDF4-Python can open OPeNDAP dataset just like a local NetCDF file
nc = netCDF4.Dataset(url)
varsInFile = nc.variables.keys()
lat = nc.variables['lat'][:]
lon = nc.variables['lon'][:]
time_var = nc.variables['time']
dtime = netCDF4.num2date(time_var[:],time_var.units)
first = netCDF4.num2date(time_var[0],time_var.units)
last = netCDF4.num2date(time_var[-1],time_var.units)
print(first.strftime('%Y-%b-%d %H:%M'))
print(last.strftime('%Y-%b-%d %H:%M'))
# determine what longitude convention is being used
print(lon.min(), lon.max())
# Specify desired station time series location
# note we add 360 because of the lon convention in this dataset
#lati = 36.605; loni = -121.85899 + 360. # west of Pacific Grove, CA
lati = 41.4; loni = -100.8 + 360.0  # 41.4 N, 100.8 W (central Nebraska)
# Function to find index to nearest point
def near(array, value):
    idx = (abs(array - value)).argmin()
    return idx
# Find nearest point to desired location (no interpolation)
ix = near(lon, loni)
iy = near(lat, lati)
print(ix, iy)
# Extract desired times.
# Select the time range to extract (here the whole run, first to last time step)
start = netCDF4.num2date(time_var[0],time_var.units)
stop = netCDF4.num2date(time_var[-1],time_var.units)
istart = netCDF4.date2index(start,time_var,select='nearest')
istop = netCDF4.date2index(stop,time_var,select='nearest')
print(istart, istop)
# Get all time records of variable [vname] at indices [iy,ix]
vname = 'dswrfsfc'
var = nc.variables[vname]
hs = var[istart:istop,iy,ix]
tim = dtime[istart:istop]
# Create Pandas time series object
ts = pd.Series(hs,index=tim,name=vname)
The var data is not read as I expected, apparently because the data are masked:
>>> hs
masked_array(data = [-- -- -- ..., -- -- --],
mask = [ True True True ..., True True True],
fill_value = 9.999e+20)
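netCDF4-python returns variable slices as numpy masked arrays, masking elements that match the variable's fill or missing value, and pandas turns masked elements into NaN. A minimal sketch of how to check how much of such an array is valid and convert it to a plain array with NaN, using a simulated all-fill read-back (the helper name `describe_masked` is hypothetical, and the data are made up to mimic `hs`):

```python
import numpy as np

def describe_masked(a):
    """Summarise how many elements of a (possibly masked) array are valid."""
    a = np.ma.asarray(a)
    return {
        'size': int(a.size),
        'masked': int(np.ma.count_masked(a)),
        'valid': int(a.count()),  # count() ignores masked elements
    }

# Simulated read-back where every value equals the fill value, like `hs` above
raw = np.full(36, 9.999e20, dtype=np.float32)
hs = np.ma.masked_values(raw, 9.999e20)
print(describe_masked(hs))

# Plain float array with NaN wherever the data are missing
values = hs.filled(np.nan)
```

A result where `valid` is 0 means the selected point or time range has no data at all, which points to an indexing problem rather than a reading problem.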
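The var name, the time series and the rest of the script are all correct. The only thing that doesn't work is the retrieved var data. This is the time series I get: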
>>> ts
2016-10-25 00:00:00.000000 NaN
2016-10-25 01:00:00.000000 NaN
2016-10-25 02:00:00.000006 NaN
2016-10-25 03:00:00.000000 NaN
2016-10-25 04:00:00.000000 NaN
... ... ... ... ...
2016-10-26 10:00:00.000000 NaN
2016-10-26 11:00:00.000006 NaN
Name: dswrfsfc, dtype: float32
Any help would be much appreciated!
Well, this code looks familiar. ;-)
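You are getting NaNs because the NAM model you are trying to access now uses longitudes in the range [-180, 180] instead of [0, 360]. So if you request loni = -100.8 instead of loni = -100.8 + 360.0, I believe your code will return non-NaN values.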
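Rather than hard-coding `+ 360`, one option is to convert the requested longitude to whatever convention the grid uses, judged from its range (which the script already prints with `lon.min(), lon.max()`). A minimal sketch, assuming a 1-D longitude array; the helper name `to_dataset_lon` and the sample grids are hypothetical:

```python
import numpy as np

def to_dataset_lon(lon_query, lon_grid):
    """Express lon_query in the longitude convention of lon_grid.

    Assumption: a grid with any value above 180 uses [0, 360);
    otherwise it is treated as [-180, 180).
    """
    if np.max(lon_grid) > 180.0:
        return lon_query % 360.0
    return (lon_query + 180.0) % 360.0 - 180.0

# Hypothetical 1-D grids covering the same region in both conventions
grid_0_360 = np.arange(200.0, 310.0, 0.5)
grid_pm180 = grid_0_360 - 360.0

for grid in (grid_0_360, grid_pm180):
    loni = to_dataset_lon(-100.8 + 360.0, grid)
    ix = np.abs(grid - loni).argmin()  # same nearest-index idea as near()
    print(loni, grid[ix])
```

The same requested point then lands on the nearest grid cell whichever convention the server happens to use.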
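It's worth noting, though, that extracting a time series from multidimensional gridded data is now much easier with xarray: you can simply select the dataset point closest to a given lon,lat and then plot any variable. Data are only loaded when you actually need them, not when you open the dataset object. So basically all you need now is: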
import xarray as xr
import matplotlib.pyplot as plt
# url: the same NetCDF file path or OPeNDAP URL as in the question
ds = xr.open_dataset(url)
lati = 41.4; loni = -100.8  # 41.4 N, 100.8 W; longitude in [-180, 180], no +360
# Extract a dataset closest to specified point
dsloc = ds.sel(lon=loni, lat=lati, method='nearest')
# select a variable to plot
dsloc['dswrfsfc'].plot()
plt.show()
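Here is the full notebook: http://nbviewer.jupyter.org/gist/rsignell-usgs/d55b37c6253f27c53ef0731b610b81b4
I checked your approach with xarray. It works great for extracting solar radiation data! I'd add that the first point is undefined (NaN) because the model starts its calculation there, so there is no accumulated radiation data yet (the hourly global radiation is computed from the accumulation). That's why it is masked.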
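If a field is stored as a running total since the start of the forecast, per-step values come from differencing consecutive records, and the first record has nothing to difference against. A minimal sketch under that assumption (whether `dswrfsfc` in a given NAM file is a running total or an interval average should be checked against the variable's metadata; `deaccumulate` is a hypothetical helper):

```python
import numpy as np

def deaccumulate(acc):
    """Convert a running accumulation into per-step increments.

    Assumes acc[k] is the total since forecast start at step k.
    The first step has no predecessor, so its increment is NaN.
    """
    acc = np.asarray(acc, dtype=float)
    inc = np.full(acc.shape, np.nan)
    inc[1:] = np.diff(acc)
    return inc

# Hypothetical running totals at four consecutive hourly steps
print(deaccumulate([0.0, 1.0, 3.0, 6.0]))
```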
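What everyone overlooks is that the output is incorrect. It does look plausible (noon = sunshine, midnight = 0, dark), but the day length is wrong! I checked at 52° N, 5.6° E (in November) and the daylight period comes out at least 2 hours too long! (NOAA's Panoply viewer for NetCDF data gives similar results.)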
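To test a claim like this, the number of hours with non-zero shortwave flux can be compared with the astronomical day length for the same latitude and date. A minimal sketch using Cooper's approximation for solar declination and ignoring refraction (the function name is hypothetical). Keep in mind that if each record is an hourly average or accumulation, an hour that is only partly sunlit still gets a non-zero value, so a count of non-zero hours can exceed the astronomical day length by up to about two hours, which may account for part of such a discrepancy.

```python
import math

def day_length_hours(lat_deg, day_of_year):
    """Approximate sunrise-to-sunset duration in hours.

    Uses Cooper's declination formula and the sunset hour angle
    cos(w0) = -tan(lat) * tan(decl); refraction and the size of the
    solar disc are ignored, so real daylight is a few minutes longer.
    """
    decl = 23.45 * math.sin(math.radians(360.0 * (284 + day_of_year) / 365.0))
    x = -math.tan(math.radians(lat_deg)) * math.tan(math.radians(decl))
    x = max(-1.0, min(1.0, x))  # clamp: polar day (-1) or polar night (+1)
    w0 = math.degrees(math.acos(x))  # sunset hour angle in degrees
    return 2.0 * w0 / 15.0  # 15 degrees of hour angle per hour

# 52 N on 15 November (day of year 319 in a non-leap year)
print(round(day_length_hours(52.0, 319), 2))
```

Longitude does not enter the day length, only the clock times of sunrise and sunset (and NAM times are in UTC).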