NetCDF file is larger after attempting to reduce its size with the MATLAB netcdf package
I am trying to reduce the size of a NetCDF file by halving the time resolution of its variables, using the following code:
infilename = 'original_file.nc4';
outfilename = 'new_file.nc4';
%% CREATE OUTPUT NETCDF FILE
ncid_out = netcdf.create(outfilename,'NETCDF4');
%% OPEN THE INPUT NETCDF FILE
ncid_in = netcdf.open(infilename,'NOWRITE'); % open original file in read-only mode
[ndims,nvars] = netcdf.inq(ncid_in);
%% DEFINE NEW DIMENSIONS
for d = 0 : ndims-1
[dimname,dimlen] = netcdf.inqDim(ncid_in,d); % get dimension from input file
if strcmp(dimname,'time')
netcdf.defDim(ncid_out,dimname,ceil(dimlen/2)); % new time dimension with half the resolution; ceil matches 1:2:dimlen when dimlen is odd
else netcdf.defDim(ncid_out,dimname,dimlen); % other dimensions remain unchanged
end
end
%% DEFINE NEW VARIABLES AND ATTRIBUTES
for v = 0 : nvars-1
[varname,xtype,dimids,natts] = netcdf.inqVar(ncid_in,v); % xtype and natts are needed below
out_varid = netcdf.defVar(ncid_out,varname,xtype,dimids);
for attnum = 0 : natts-1
attname = netcdf.inqAttName(ncid_in,v,attnum);
netcdf.copyAtt(ncid_in,v,attname,ncid_out,out_varid);
end
end
%% LEAVE DEFINE MODE AND ENTER DATA MODE
netcdf.endDef(ncid_out);
for v = 0 : nvars-1
[varname,xtype,dimids,natts] = netcdf.inqVar(ncid_in,v);
var = netcdf.getVar(ncid_in,v);
out_varid = netcdf.inqVarID(ncid_out,varname);
if ~isempty(find(dimids==netcdf.inqDimID(ncid_in,'time'),1)) % if time is one of the dimensions
indt = knnsearch(dimids',netcdf.inqDimID(ncid_in,'time')); % find which one it is
S = cell(1,length(dimids));
for f = dimids
[~,dimlen] = netcdf.inqDim(ncid_in,f); % length of the dimension
if netcdf.inqDimID(ncid_in,netcdf.inqDim(ncid_out,f)) == dimids(indt) % if this dimension is time
S{indt} = 1:2:dimlen; % reduce this dimension
else S{knnsearch(dimids',netcdf.inqDimID(ncid_in,netcdf.inqDim(ncid_in,f)))} = 1:dimlen;
end
end
netcdf.putVar(ncid_out,out_varid,var(S{1:end})); % assign reduced variable
else netcdf.putVar(ncid_out,out_varid,var); % assign full variable
end
end
%% CLOSE INPUT AND OUTPUT NETCDF FILES
netcdf.close(ncid_in);
netcdf.close(ncid_out);
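The code runs without errors, and the new file does contain variables whose time dimension is half as long as in the original file.
The original file is 1.1 GB and the new file is 1.4 GB. Since I halved the time resolution, I expected a file roughly half the size of the original. I am not sure how this happened.
Can you explain?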
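NetCDF4 files can use deflate (lossless compression) to reduce their size. Your original file was probably written with compression, while the new file you wrote was not. You need to specify deflate with netcdf.defVarDeflate:
netcdf.defVarDeflate(ncid,varid,shuffle,deflate,deflateLevel)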
So try adding this line after the defVar call; it gives you a deflate level of 7 with the shuffle filter enabled:
out_varid = netcdf.defVar(ncid_out,varname,xtype,dimids);
netcdf.defVarDeflate(ncid_out,out_varid, true, true, 7);
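To confirm that the original file really was compressed, the per-variable settings can be read back with netcdf.inqVarDeflate. The sketch below is only an illustration: it first builds a small compressed file so that it runs on its own, and the file name, the variable name 'temp' and the level 5 are assumptions standing in for original_file.nc4.

```matlab
% Build a small compressed file so the sketch is self-contained
% (a stand-in for original_file.nc4; all names here are illustrative).
fname = [tempname '.nc'];
ncid = netcdf.create(fname, 'NETCDF4');
dimid = netcdf.defDim(ncid, 'time', 100);
varid = netcdf.defVar(ncid, 'temp', 'NC_DOUBLE', dimid);
netcdf.defVarDeflate(ncid, varid, true, true, 5);  % shuffle on, deflate on, level 5
netcdf.endDef(ncid);
netcdf.putVar(ncid, varid, zeros(100, 1));
netcdf.close(ncid);

% Read back the compression settings of every variable in the file
ncid = netcdf.open(fname, 'NOWRITE');
[~, nvars] = netcdf.inq(ncid);
for v = 0 : nvars-1
    varname = netcdf.inqVar(ncid, v);
    [shuffle, deflate, level] = netcdf.inqVarDeflate(ncid, v);
    fprintf('%s: shuffle=%d deflate=%d level=%d\n', varname, shuffle, deflate, level);
end
netcdf.close(ncid);
delete(fname);
```

In the define loop from the question, the values returned for each input variable could probably be passed straight to netcdf.defVarDeflate(ncid_out,out_varid,shuffle,deflate,level) instead of the fixed level 7, so that the output mirrors the original settings.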
For more information, see:
https://www.mathworks.com/help/matlab/ref/netcdf.defvardeflate.html?requestedDomain=www.mathworks.com
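As a rough illustration of how much deflate can matter, the following sketch writes the same array twice, once without compression and once at level 7 with shuffle, and compares the file sizes. The test data is deliberately repetitive (an assumption made for the demo), so the ratio it prints says nothing about what real data would achieve.

```matlab
% Highly compressible test data: 1000 x 500 single-precision values (about 2 MB)
data = repmat(single(1:1000)', 1, 500);

fnames = {[tempname '.nc'], [tempname '.nc']};  % temporary output files
levels = [0 7];                                 % 0 = no deflate, 7 = deflate level 7
sizes  = zeros(1, 2);

for k = 1:2
    ncid = netcdf.create(fnames{k}, 'NETCDF4');
    dx = netcdf.defDim(ncid, 'x', size(data, 1));
    dt = netcdf.defDim(ncid, 'time', size(data, 2));
    varid = netcdf.defVar(ncid, 'v', 'NC_FLOAT', [dx dt]);
    if levels(k) > 0
        % Must be set in define mode, before any data is written
        netcdf.defVarDeflate(ncid, varid, true, true, levels(k));
    end
    netcdf.endDef(ncid);
    netcdf.putVar(ncid, varid, data);
    netcdf.close(ncid);

    info = dir(fnames{k});
    sizes(k) = info.bytes;
end

fprintf('uncompressed: %d bytes, deflate level 7: %d bytes\n', sizes(1), sizes(2));
delete(fnames{:});
```

The same effect in reverse explains the question: reading a compressed variable and writing it back without deflate stores it at full size, so halving the time dimension can still produce a larger file.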