Size of netcdf file is larger after attempt to reduce it with matlab netcdf package

I tried to reduce the size of a netCDF file by halving the time resolution of its variables, using the following code:

infilename  = 'original_file.nc4';
outfilename = 'new_file.nc4';
%% CREATE OUTPUT NETCDF FILE
ncid_out = netcdf.create(outfilename,'NETCDF4');
%% OPEN THE INPUT NETCDF FILE
ncid_in  = netcdf.open(infilename,'NOWRITE'); % open original file in read-only mode
[ndims,nvars] = netcdf.inq(ncid_in);
timeid = netcdf.inqDimID(ncid_in,'time');     % ID of the time dimension
[~,tlen] = netcdf.inqDim(ncid_in,timeid);     % original number of time steps
%% DEFINE NEW DIMENSIONS (same order, so dimension IDs match the input file)
for d = 0 : ndims-1
    [dimname,dimlen] = netcdf.inqDim(ncid_in,d); % get dimension from input file
    if d == timeid
        netcdf.defDim(ncid_out,dimname,ceil(dimlen/2)); % half the resolution: length of 1:2:dimlen
    else
        netcdf.defDim(ncid_out,dimname,dimlen); % other dimensions remain unchanged
    end
end
%% DEFINE NEW VARIABLES AND ATTRIBUTES
for v = 0 : nvars-1
    [varname,xtype,dimids,natts] = netcdf.inqVar(ncid_in,v); % xtype and natts are needed below
    out_varid = netcdf.defVar(ncid_out,varname,xtype,dimids);
    for attnum = 0 : natts-1
        attname = netcdf.inqAttName(ncid_in,v,attnum);
        netcdf.copyAtt(ncid_in,v,attname,ncid_out,out_varid);
    end
end
%% LEAVE DEFINE MODE AND ENTER DATA MODE
netcdf.endDef(ncid_out);
for v = 0 : nvars-1
    [varname,~,dimids] = netcdf.inqVar(ncid_in,v);
    data = netcdf.getVar(ncid_in,v);
    out_varid = netcdf.inqVarID(ncid_out,varname);
    indt = find(dimids == timeid); % position of time among this variable's dimensions
    if ~isempty(indt)
        S = repmat({':'},1,numel(dimids)); % keep all other dimensions whole
        S{indt} = 1:2:tlen;                % keep every other time step
        netcdf.putVar(ncid_out,out_varid,data(S{:})); % assign reduced variable
    else
        netcdf.putVar(ncid_out,out_varid,data); % assign full variable
    end
end
%% CLOSE INPUT AND OUTPUT NETCDF FILES
netcdf.close(ncid_in);
netcdf.close(ncid_out);

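The code runs without errors, and the new file does contain the variables with a time dimension half as long as in the original file.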
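Subsampling with `1:2:dimlen` keeps `ceil(dimlen/2)` time steps, which is why the output time dimension is safest defined with `ceil(dimlen/2)`: plain `dimlen/2` is not an integer when `dimlen` is odd. A small illustration, with made-up array sizes:

```matlab
% Minimal sketch: keep every other time step with a stride-2 index.
% nt and the array size are hypothetical, chosen to show an odd length.
nt      = 7;                       % hypothetical number of time steps
data    = rand(4, 3, nt);          % hypothetical lon x lat x time array
keep    = 1:2:nt;                  % time indices 1, 3, 5, 7
reduced = data(:, :, keep);        % every other time step

% The new time dimension needs numel(keep) == ceil(nt/2) entries;
% nt/2 = 3.5 would not be a valid dimension length.
fprintf('kept %d of %d time steps\n', numel(keep), nt);
```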

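The original file is 1.1 GB and the new file is 1.4 GB. Since I halved the time resolution, I expected the new file to be roughly half the size of the original. I don't understand how this happened.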
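The half-size expectation only holds if both files store their data the same way. One way to check is to compare the raw (uncompressed) size of all variables, computed from `ncinfo`, with the size on disk. The sketch below is illustrative: it builds a small compressed demo file with `nccreate` so it runs on its own, and the byte-size table is an assumption covering the common numeric types only. For the real data you would set `filename = 'original_file.nc4'` and skip the `nccreate`/`ncwrite` lines.

```matlab
% Sketch: compare the uncompressed size of a netCDF file's variables
% with the file's size on disk. Demo file created here (assumption);
% for the question's data use filename = 'original_file.nc4' instead.
filename = [tempname '.nc'];
nccreate(filename, 'x', 'Dimensions', {'time', 100000}, ...
    'Datatype', 'double', 'Format', 'netcdf4', 'DeflateLevel', 7);
ncwrite(filename, 'x', zeros(100000, 1));   % highly compressible data

% Bytes per element for numeric types as reported by ncinfo (assumed table).
bytesPer = containers.Map( ...
    {'int8','uint8','char','int16','uint16','int32','uint32','single', ...
     'int64','uint64','double'}, ...
    {1, 1, 1, 2, 2, 4, 4, 4, 8, 8, 8});

info = ncinfo(filename);
raw = 0;
for k = 1:numel(info.Variables)
    v = info.Variables(k);
    if isKey(bytesPer, v.Datatype)          % skip string/user-defined types
        raw = raw + prod(v.Size) * bytesPer(v.Datatype);
    end
end
f = dir(filename);
% A disk size far below the raw total means the variables are compressed.
fprintf('uncompressed: %.0f bytes, on disk: %.0f bytes\n', raw, f.bytes);
```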

Can you explain this?

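NetCDF-4 files can use deflate (lossless) compression to reduce their size. Your original file was probably written with compression, while the new file you wrote is not compressed. You need to use netcdf.defVarDeflate: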

To specify the deflate settings:
netcdf.defVarDeflate(ncid,varid,shuffle,deflate,deflateLevel) 

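So try adding this line right after the defVar call (it has to run in define mode, i.e. before netcdf.endDef). It sets deflate level 7 and enables the shuffle filter: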

out_varid = netcdf.defVar(ncid_out,varname,xtype,dimids);
netcdf.defVarDeflate(ncid_out,out_varid, true, true, 7);
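Put together, the copy loop with compression enabled might look like the following minimal sketch. It assumes a flat NETCDF4 file (no groups) with a dimension named `time`; a small uncompressed demo input is generated with `nccreate` so the snippet runs standalone, and the variable name `temp` and its sizes are hypothetical. Scalar variables are skipped for `defVarDeflate`, since compression applies to chunked (dimensioned) variables.

```matlab
% Sketch: copy a netCDF-4 file, halving the time dimension and enabling
% deflate (shuffle on, level 7). Demo input generated here (assumption);
% for the real data set infilename = 'original_file.nc4' and drop it.
infilename  = [tempname '.nc'];
outfilename = [tempname '.nc'];
nccreate(infilename, 'temp', 'Dimensions', {'lon', 50, 'time', 2000}, ...
    'Datatype', 'single', 'Format', 'netcdf4');   % stored uncompressed
ncwrite(infilename, 'temp', single(repmat(1:2000, 50, 1)));

ncid_in  = netcdf.open(infilename, 'NOWRITE');
ncid_out = netcdf.create(outfilename, 'NETCDF4');
[ndims, nvars] = netcdf.inq(ncid_in);
timeid = netcdf.inqDimID(ncid_in, 'time');

% Define dimensions in the same order so dimension IDs match.
for d = 0:ndims-1
    [dimname, dimlen] = netcdf.inqDim(ncid_in, d);
    if d == timeid
        dimlen = ceil(dimlen/2);              % length of 1:2:dimlen
    end
    netcdf.defDim(ncid_out, dimname, dimlen);
end

% Define variables, enable compression, copy attributes (define mode).
for v = 0:nvars-1
    [varname, xtype, dimids, natts] = netcdf.inqVar(ncid_in, v);
    out_varid = netcdf.defVar(ncid_out, varname, xtype, dimids);
    if ~isempty(dimids)                       % scalars are not compressed
        netcdf.defVarDeflate(ncid_out, out_varid, true, true, 7);
    end
    for attnum = 0:natts-1
        attname = netcdf.inqAttName(ncid_in, v, attnum);
        netcdf.copyAtt(ncid_in, v, attname, ncid_out, out_varid);
    end
end
netcdf.endDef(ncid_out);

% Write the data, keeping every other time step.
for v = 0:nvars-1
    [varname, ~, dimids] = netcdf.inqVar(ncid_in, v);
    data = netcdf.getVar(ncid_in, v);
    idx  = repmat({':'}, 1, max(numel(dimids), 1));
    k = find(dimids == timeid);
    if ~isempty(k)
        [~, tlen] = netcdf.inqDim(ncid_in, timeid);
        idx{k} = 1:2:tlen;
    end
    netcdf.putVar(ncid_out, netcdf.inqVarID(ncid_out, varname), data(idx{:}));
end
netcdf.close(ncid_in);
netcdf.close(ncid_out);

in = dir(infilename); out = dir(outfilename);
fprintf('input: %d bytes, output: %d bytes\n', in.bytes, out.bytes);
```

Setting deflate per variable right after `defVar` keeps the whole define phase in one loop; the shuffle filter usually helps deflate on numeric arrays, but the best level is data-dependent.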

For details, see: https://www.mathworks.com/help/matlab/ref/netcdf.defvardeflate.html?requestedDomain=www.mathworks.com
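To confirm that compression explains the difference, the deflate settings of each variable can be read back with `netcdf.inqVarDeflate`. The sketch below inspects a small demo file created with `nccreate` (an assumption made so it runs standalone). Pointing `filename` at `original_file.nc4` instead would show whether the original variables are compressed, and the same loop on the uncompressed `new_file.nc4` should report `deflate=0`.

```matlab
% Sketch: report shuffle/deflate settings of every variable in a file.
% Demo file created here (assumption); use your own filename instead.
filename = [tempname '.nc'];
nccreate(filename, 'x', 'Dimensions', {'time', 1000}, ...
    'Format', 'netcdf4', 'DeflateLevel', 7, 'Shuffle', true);
ncwrite(filename, 'x', zeros(1000, 1));

ncid = netcdf.open(filename, 'NOWRITE');
[~, nvars] = netcdf.inq(ncid);
levels = zeros(1, nvars);                  % deflate level per variable
for v = 0:nvars-1
    varname = netcdf.inqVar(ncid, v);
    [shuffle, deflate, levels(v+1)] = netcdf.inqVarDeflate(ncid, v);
    fprintf('%-12s shuffle=%d deflate=%d level=%d\n', ...
        varname, shuffle, deflate, levels(v+1));
end
netcdf.close(ncid);
```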