Concatenation of several HDF5 files having a dimension common to two groups
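I would like to concatenate several HDF5 files.
Here is the header as given by Panoply:
I would like to concatenate along the npixels dimension.
However, if I run ncrcat on npixels, it tells me 'variable unknown'.
Indeed, if I run ncdump -c, I do not see an npixels dimension. Instead I see phony_dim_0 in the Data_Fields group and phony_dim_4 in the Geolocation_Fields group, each of size 655 (the number of pixels).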
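As far as I know, the phony_dim_N names are not stored in the file itself: the netCDF library generates them when it reads an HDF5 file whose datasets carry no dimension scales, which is presumably what happens here. A minimal sketch for looking at the raw dataset shapes with h5py, to see which axis holds the 655 pixels in each variable. The helper name list_dataset_shapes is hypothetical, not part of h5py.

```python
import h5py


def list_dataset_shapes(filename):
    """Return a dict mapping each dataset path in an HDF5 file to its shape."""
    shapes = {}

    def visitor(name, obj):
        # visititems() calls this for every group and dataset; keep datasets only
        if isinstance(obj, h5py.Dataset):
            shapes[name] = obj.shape

    with h5py.File(filename, 'r') as h5:
        h5.visititems(visitor)
    return shapes
```

Calling it on one of the monthly files and printing the result shows, for every variable, whether the pixel axis is the first or the second one.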
I made these dimensions unlimited (record dimensions):
ncks --mk_rec_dmn phony_dim_0 ${file} ${file}
ncks -O --mk_rec_dmn phony_dim_4 ${file} ${file}
If I then run:
ncrcat Valid_CO_SOFRID-v4.0_200???.he5 Valid_CO_SOFRID-v4.0_200801-200907.he5 -v Latitude,Longitude,Day,Hour,Minute,"CO Total Column"
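(one-dimensional variables only), it seems to work for the Geolocation_Fields variables. For the Data_Fields variables, I get the expected number of elements, but they all have the same value (possibly the mean).
The output is the same if I keep only one variable: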
ncrcat -d phony_dim_0,0, Valid_CO_SOFRID-v4.0_200???.he5 Valid_CO_SOFRID-v4.0_200801-200907_dim0.he5 -v "CO Total Column"
I actually also need a two-dimensional variable, but that does not work:
ERROR: nco_put_vara() failed to nc_put_vara() variable "CO"
nco_err_exit(): ERROR Short NCO-generated message (usually name of function that triggered error): nco_put_vara()
nco_err_exit(): ERROR Error code is -40. Translation into English with nc_strerror(-40) is "NetCDF: Index exceeds dimension bound"
Thanks

I managed to do it with h5py \o/
import collections
import glob
import os

import h5py
import numpy as np


def nested_dict():
    # defaultdict that creates missing nested levels on first access
    return collections.defaultdict(nested_dict)


path = '/home/loip/Documents/tech/sofrid/data/windhoekSameDayOnly/'
# sorted() so the files are appended in a deterministic (file name) order
listOfAllFiles = sorted(glob.glob(path + 'Valid_CO_SOFRID-v2.2_200???.he5'))
os.chdir(path)

dico = nested_dict()
with h5py.File('Valid_CO_SOFRID-v2.2_200801-200907.he5', 'w') as h5w:
    for iFile, eachFile in enumerate(listOfAllFiles):
        with h5py.File(eachFile, 'r') as h5r:
            for group in ['HDFEOS']:
                for subgroup1 in ['SWATHS']:
                    for subgroup2 in ['CO']:
                        # subgroup3: Data Fields & Geolocation Fields
                        for subgroup3 in h5r[group][subgroup1][subgroup2].keys():
                            for varName in h5r[group][subgroup1][subgroup2][subgroup3].keys():
                                dataThisTime = h5r[group][subgroup1][subgroup2][subgroup3][varName][:]
                                target = dico[group][subgroup1][subgroup2][subgroup3]
                                if iFile == 0:
                                    target[varName] = dataThisTime
                                else:
                                    # 2-D variables are appended along axis 1, 1-D variables along axis 0
                                    axis = 1 if target[varName].ndim == 2 else 0
                                    target[varName] = np.append(target[varName], dataThisTime, axis=axis)
                                if iFile == len(listOfAllFiles) - 1:
                                    # last file read: write the fully concatenated variable
                                    h5w.create_dataset(f'{group}/{subgroup1}/{subgroup2}/{subgroup3}/{varName}',
                                                       data=target[varName])
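The same idea can probably be written more compactly. Below is a sketch, not a drop-in replacement: it assumes that the pixel axis is always the last axis of every dataset (true for the 1-D and 2-D cases handled above), that every dataset has at least one dimension, and that all files contain the same variables. The function name concatenate_swath and its swath_path parameter are made up for this example. It gathers the pieces in lists and calls np.concatenate once per variable instead of growing arrays with np.append at every file.

```python
import h5py
import numpy as np


def concatenate_swath(input_files, output_file, swath_path='HDFEOS/SWATHS/CO'):
    """Concatenate every dataset found under swath_path along its last axis."""
    pieces = {}  # dataset path relative to swath_path -> list of arrays

    for filename in input_files:
        with h5py.File(filename, 'r') as h5r:

            def collect(name, obj):
                # Read each dataset fully into memory and remember it
                if isinstance(obj, h5py.Dataset):
                    pieces.setdefault(name, []).append(obj[...])

            h5r[swath_path].visititems(collect)

    with h5py.File(output_file, 'w') as h5w:
        for name, arrays in pieces.items():
            # axis=-1 is axis 0 for 1-D variables and axis 1 for 2-D variables
            h5w.create_dataset(f'{swath_path}/{name}',
                               data=np.concatenate(arrays, axis=-1))
```

It would be called with the sorted list of monthly files and the name of the output file. Like the script above, it copies only the data, not the attributes or the HDF-EOS metadata.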