Combining a large number of netCDF files
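I have a large folder of netCDF (.nc) files, all with similar names. The data files contain the variables time, longitude, latitude and monthly precipitation. The goal is to get the average precipitation for each calendar month over X years, so in the end I would have 12 values per latitude/longitude point, each the mean precipitation for that month over X years. Every file covers the same location across the years.
Each file name starts the same way and ends with "date.SUB.nc", for example: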
'data1.somthing.somthing1.avg_2d_Ind_Nx.200109.SUB.nc'
'data1.somthing.somthing1.avg_2d_Ind_Nx.200509.SUB.nc'
'data2.somthing.somthing1.avg_2d_Ind_Nx.201104.SUB.nc'
'data2.somthing.somthing1.avg_2d_Ind_Nx.201004.SUB.nc'
'data2.somthing.somthing1.avg_2d_Ind_Nx.201003.SUB.nc'
'data2.somthing.somthing1.avg_2d_Ind_Nx.201103.SUB.nc'
'data1.somthing.somthing1.avg_2d_Ind_Nx.201203.SUB.nc'
That is, every name ends in YearMonth.SUB.nc (YYYYMM.SUB.nc).
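Since the month is always the two digits just before .SUB.nc, the files can be grouped by calendar month before any netCDF library is involved. A minimal standard-library sketch, assuming every name ends in YYYYMM.SUB.nc; `group_by_month` and `DATE_RE` are hypothetical names used only for illustration:

```python
import re
from collections import defaultdict

# Hypothetical pattern: assumes names end in .<YYYYMM>.SUB.nc as in the examples above
DATE_RE = re.compile(r'\.(\d{4})(\d{2})\.SUB\.nc$')


def group_by_month(filenames):
    """Return {month: [file names sorted by year]} for names ending in YYYYMM.SUB.nc."""
    groups = defaultdict(list)
    for name in filenames:
        match = DATE_RE.search(name)
        if match is None:
            continue  # skip anything that does not follow the naming pattern
        year, month = int(match.group(1)), int(match.group(2))
        groups[month].append((year, name))
    # Sort each month's files chronologically, then drop the year keys
    return {month: [name for _, name in sorted(items)]
            for month, items in sorted(groups.items())}


example_files = [
    'data1.somthing.somthing1.avg_2d_Ind_Nx.200109.SUB.nc',
    'data2.somthing.somthing1.avg_2d_Ind_Nx.201104.SUB.nc',
    'data1.somthing.somthing1.avg_2d_Ind_Nx.201203.SUB.nc',
    'data2.somthing.somthing1.avg_2d_Ind_Nx.201003.SUB.nc',
]
by_month = group_by_month(example_files)
```

Sorting by the parsed year rather than by the full name matters here, because the data1/data2 prefixes would otherwise put the files out of chronological order.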
What I have so far:
import netCDF4 as nc

array = []
f = nc.MFDataset('data*.nc')
precp = f.variables['prectot']
time = f.variables['time']
array = f.variables['time','longitude','latitude','prectot']  # this line raises the KeyError
I get a KeyError: ('time', 'longitude', 'latitude', 'prectot'). Is there a way to combine all of this data so that I can work with it?
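The KeyError comes from the lookup itself: in netCDF4, `f.variables` is a dictionary-like mapping from a single variable name to a single Variable, so `f.variables['time', 'longitude', 'latitude', 'prectot']` passes the whole tuple as one key. The sketch below reproduces this with a plain dict standing in for `f.variables`; the values are made-up placeholders, not real data. With a real MFDataset, each `variables[name]` returns a Variable that you then slice with `[:]`.

```python
# Stand-in for f.variables (hypothetical placeholder values, not netCDF data)
variables = {
    'time': [0, 1],
    'longitude': [10.0, 10.5],
    'latitude': [45.0],
    'prectot': [[[1.0, 2.0]], [[3.0, 4.0]]],
}

# Indexing with several names passes ONE tuple key, which is not in the mapping
try:
    variables['time', 'longitude', 'latitude', 'prectot']
except KeyError as err:
    missing_key = err.args[0]  # the tuple ('time', 'longitude', 'latitude', 'prectot')

# Look each variable up by its own name instead
names = ['time', 'longitude', 'latitude', 'prectot']
time_var, lon, lat, precip = (variables[name] for name in names)
```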
Using NCO:
ncra *01.SUB.nc pcp_avg_01.nc
ncra *02.SUB.nc pcp_avg_02.nc
...
ncra *12.SUB.nc pcp_avg_12.nc
ncrcat pcp_avg_??.nc pcp_avg.nc
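Of course, the first twelve commands can be written as a Bash loop, bringing the whole thing down to fewer than five lines. If you prefer to script it in Python, you can use this to check your answer. ncra documentation here.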
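The twelve ncra calls can also be generated from Python instead of a Bash loop. The sketch below only builds the argument lists and does not run NCO. Because the month digits follow the year directly (e.g. 200109.SUB.nc), it matches `*MM.SUB.nc` with `fnmatch`, the same rule a shell glob would apply; `build_nco_commands` is a hypothetical helper, and `-O` is NCO's overwrite-output option.

```python
import fnmatch


def build_nco_commands(filenames):
    """Build one ncra call per calendar month plus a final ncrcat, as argument lists."""
    commands = []
    outputs = []
    for month in range(1, 13):
        # Same match as the shell glob *MM.SUB.nc: the last two digits before .SUB.nc
        inputs = sorted(fnmatch.filter(filenames, '*{:02d}.SUB.nc'.format(month)))
        if not inputs:
            continue  # no files for this calendar month
        out = 'pcp_avg_{:02d}.nc'.format(month)
        # ncra averages over the record (time) dimension of all input files
        commands.append(['ncra', '-O'] + inputs + [out])
        outputs.append(out)
    if outputs:
        # Join the monthly means along the record dimension into one file
        commands.append(['ncrcat', '-O'] + outputs + ['pcp_avg.nc'])
    return commands


example_files = [
    'data2.somthing.somthing1.avg_2d_Ind_Nx.201003.SUB.nc',
    'data2.somthing.somthing1.avg_2d_Ind_Nx.201103.SUB.nc',
    'data2.somthing.somthing1.avg_2d_Ind_Nx.201104.SUB.nc',
]
commands = build_nco_commands(example_files)
```

To actually run them, pass each list to `subprocess.run(cmd, check=True)` from the directory that holds the files, with the input names taken from `glob.glob('*.SUB.nc')`.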
As @CharlieZender mentioned, ncra is the way to go here; I'll add more detail on integrating it into a Python script. (PS: you can install NCO easily with Homebrew, e.g. http://alejandrosoto.net/blog/2014/01/22/setting-up-my-mac-for-scientific-research/)
import glob
import subprocess

import netCDF4
import numpy as np

for month in range(1, 13):
    # Gather all the files for this month (names end in YYYYMM.SUB.nc)
    month_files = sorted(glob.glob('/path/to/files/*{0:0>2d}.SUB.nc'.format(month)))

    # Using NCO functions ---------------
    avg_file = './precip_avg_{0:0>2d}.nc'.format(month)
    # Concatenate the files using ncrcat
    subprocess.call(['ncrcat'] + month_files + ['-O', avg_file])
    # Take the time (record) average using ncra
    subprocess.call(['ncra', avg_file, '-O', avg_file])
    # Read in the monthly precip climatology file and do whatever now
    ncfile = netCDF4.Dataset(avg_file, 'r')
    pr = ncfile.variables['prectot'][:,:,:]
    # ... (your own processing of pr goes here)
    ncfile.close()

    # Using only Python -------------
    # Read the grid size from the first file; assumes prectot is (time, lat, lon)
    with netCDF4.Dataset(month_files[0], 'r') as first:
        nlat, nlon = first.variables['prectot'].shape[-2:]
    # Initialize an array to store monthly-mean precip for all years
    nyears = len(month_files)
    pr_arr = np.zeros([nyears, nlat, nlon], dtype='f4')
    # Populate pr_arr with each file's monthly-mean precip
    for idx, filename in enumerate(month_files):
        ncfile = netCDF4.Dataset(filename, 'r')
        pr = ncfile.variables['prectot'][:,:,:]
        pr_arr[idx,:,:] = np.mean(pr, axis=0)
        ncfile.close()
    # Take the average along all years for a monthly climatology
    pr_clim = np.mean(pr_arr, axis=0)  # 2D now [lat,lon]
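If all records are already in one (time, lat, lon) array (for example read through MFDataset), the 12-month climatology can also be computed in one step by splitting the time axis into (year, month). A minimal numpy sketch, assuming the records are in chronological order, start in January and cover whole years; `monthly_climatology` is a hypothetical helper and the data below are synthetic. Note that a wildcard passed to MFDataset is expanded into an alphabetically sorted file list, which with mixed data1/data2 prefixes is not chronological, so the files would need to be ordered by their YYYYMM part first.

```python
import numpy as np


def monthly_climatology(precip):
    """Average a (time, lat, lon) series of monthly values into (12, lat, lon).

    Assumes chronological order, a January start and whole years,
    so that time index i is calendar month i % 12.
    """
    ntime, nlat, nlon = precip.shape
    if ntime % 12 != 0:
        raise ValueError('expected a whole number of years of monthly data')
    # Reshape time into (year, month), then average over the years
    return precip.reshape(ntime // 12, 12, nlat, nlon).mean(axis=0)


# Synthetic example: 2 years of monthly values on a 1 x 2 grid
precip = np.arange(24 * 2, dtype='f4').reshape(24, 1, 2)
clim = monthly_climatology(precip)
```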
In CDO, the ymonmean operator computes the mean for each calendar month across all years, so the task can be done in two lines:
cdo mergetime data*.SUB.nc merged.nc # put files together into one series
cdo ymonmean merged.nc annual_cycle.nc # mean of all Jan,Feb etc.
cdo can also compute the annual cycle of other statistics (ymonstd, ymonmax, etc.), and the time unit can be days or pentads as well as months (e.g. ydaymean).
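What ymonmean and its siblings compute can be illustrated for a single grid point: group all records by calendar month, then apply one statistic per group. A standard-library sketch under that simplification (one value per record, dates given as YYYYMM integers); `ymon_stat` is a hypothetical name, not part of CDO. The standard deviation here uses `statistics.pstdev`, which divides by n; check the CDO documentation for the normalization ymonstd uses.

```python
import statistics
from collections import defaultdict


def ymon_stat(records, stat=statistics.mean):
    """Apply `stat` to all values that share a calendar month.

    records: iterable of (yyyymm, value) pairs for one grid point.
    Returns {month: statistic}, loosely mirroring CDO's ymon* operators.
    """
    by_month = defaultdict(list)
    for yyyymm, value in records:
        by_month[yyyymm % 100].append(value)  # last two digits are the month
    return {month: stat(values) for month, values in sorted(by_month.items())}


# Hypothetical values at one grid point
records = [(201003, 1.0), (201103, 3.0), (201203, 5.0), (201004, 2.0), (201104, 4.0)]
means = ymon_stat(records)                       # analogous to ymonmean
maxima = ymon_stat(records, max)                 # analogous to ymonmax
spreads = ymon_stat(records, statistics.pstdev)  # population std (divides by n)
```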