DataArray deletes attributes in simple computations

I noticed that if you have an xarray DataArray and perform a simple (!) computation on it, the attributes get 'deleted'.

Example:

import numpy as np
import xarray as xr
example            = xr.DataArray(np.array([1,2,3]), attrs={'one':1})
without_Attributes = example*3

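On the other hand, if you use a numpy-specific function (e.g. .round(x), ...), the attributes are kept. Is there a reasonable explanation for this? Is there a way to multiply a DataArray without losing its attributes?

From the xarray documentation, under "What is your approach to metadata?":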

We are firm believers in the power of labeled data! In addition to dimensions and coordinates, xarray supports arbitrary metadata in the form of global (Dataset) and variable specific (DataArray) attributes (attrs).

Automatic interpretation of labels is powerful but also reduces flexibility. With xarray, we draw a firm line between labels that the library understands (dims and coords) and labels for users and user code (attrs). For example, we do not automatically interpret and enforce units or CF conventions. (An exception is serialization to and from netCDF files.)

An implication of this choice is that we do not propagate attrs through most operations unless explicitly flagged (some methods have a keep_attrs option, and there is a global flag for setting this to be always True or False). Similarly, xarray does not check for conflicts between attrs when combining arrays and datasets, unless explicitly requested with the option compat='identical'. The guiding principle is that metadata should not be allowed to get in the way.
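As a minimal sketch of the per-method flag mentioned in the quote, the snippet below passes keep_attrs=True to a single reduction. The choice of sum is only for illustration; the array is the one from the question.

```python
import numpy as np
import xarray as xr

example = xr.DataArray(np.array([1, 2, 3]), attrs={"one": 1})

# Per-call flag: ask this one reduction to carry the attrs along,
# without touching any global option.
total = example.sum(keep_attrs=True)

print(total.item())
print(total.attrs)
```

This only affects the one call, so other operations on the same array keep whatever behavior is configured elsewhere.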

You can use xr.set_options to set the global option in xarray:
In [14]: xr.set_options(keep_attrs=True)
Out[14]: <xarray.core.options.set_options at 0x133ef58e0>

Now the attributes are preserved:

In [15]: example * 3
Out[15]:
<xarray.DataArray (dim_0: 3)>
array([3, 6, 9])
Dimensions without coordinates: dim_0
Attributes:
    one:      1
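If changing the option for the whole session feels too broad, xr.set_options can also be used as a context manager, so the setting applies only inside the with block. A short sketch, reusing the array from the question:

```python
import numpy as np
import xarray as xr

example = xr.DataArray(np.array([1, 2, 3]), attrs={"one": 1})

# The option is active only inside the with block and is
# restored to its previous value on exit.
with xr.set_options(keep_attrs=True):
    kept = example * 3

print(kept.attrs)
```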

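Note that xarray does not do anything "smart" with these attributes, which is why the default behavior is to drop them in computations. For example, a simple case with units shows how setting keep_attrs=True can go off the rails: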

In [17]: dist = xr.DataArray(np.array([1,2,3]), attrs={'units': 'm'})
    ...: dist
Out[17]:
<xarray.DataArray (dim_0: 3)>
array([1, 2, 3])
Dimensions without coordinates: dim_0
Attributes:
    units:    m

In [18]: rate = xr.DataArray(np.array([2, 2, 2]), attrs={'units': 'm/s'})
    ...: rate
Out[18]:
<xarray.DataArray (dim_0: 3)>
array([2, 2, 2])
Dimensions without coordinates: dim_0
Attributes:
    units:    m/s

In [19]: dist / rate
Out[19]:
<xarray.DataArray (dim_0: 3)>
array([0.5, 1. , 1.5])
Dimensions without coordinates: dim_0
Attributes:
    units:    m
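The result above is labeled 'm' even though metres divided by metres per second gives seconds. One way to avoid the stale label, sketched here on the assumption that you know the correct unit yourself, is to drop the attrs for the operation and then write the unit by hand:

```python
import numpy as np
import xarray as xr

dist = xr.DataArray(np.array([1, 2, 3]), attrs={"units": "m"})
rate = xr.DataArray(np.array([2, 2, 2]), attrs={"units": "m/s"})

# Explicitly drop attrs for this operation so no stale unit is inherited.
with xr.set_options(keep_attrs=False):
    time = dist / rate

# xarray does not derive units; the 's' label is supplied manually.
time.attrs["units"] = "s"

print(time.attrs)
```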

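If you want to handle units explicitly in computations with xarray, check out pint-xarray, which is an effort to integrate the pint project's explicit unit handling with xarray. The project is experimental and its API is unstable, but the pint-xarray team and the xarray core team have recently been moving in the same direction, so I don't expect this coordination to go away.

Workaround (or maybe the best of all worlds?)

Note that since Dataset and DataArray attributes are just dictionaries, it is easy to carry them over: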

In [22]: result = example * 3
    ...: result.attrs.update(example.attrs)

In [23]: result
Out[23]:
<xarray.DataArray (dim_0: 3)>
array([3, 6, 9])
Dimensions without coordinates: dim_0
Attributes:
    one:      1
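The same copy can be written as a single expression with assign_attrs, which returns a new object with the given attributes added instead of mutating the result in place. A brief sketch with the array from the question:

```python
import numpy as np
import xarray as xr

example = xr.DataArray(np.array([1, 2, 3]), attrs={"one": 1})

# assign_attrs returns a new DataArray; `example` itself is left untouched.
result = (example * 3).assign_attrs(example.attrs)

print(result.attrs)
```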

You can even work with them independently of the DataArray or Dataset:


In [25]: ds = xr.open_dataset('my_well_documented_file.nc')

In [26]: source_attrs = ds.attrs

In [27]: result = xr.Dataset({'new_var': ds.varname * 3})

In [28]: result.attrs.update(
    ...:     # custom new attrs
    ...:     method='multiplied varname by 3',
    ...:     updated=pd.Timestamp.now(tz='US/Pacific').strftime('%c'),
    ...:     # carry forward attrs from input file
    ...:     **{k: source_attrs[k] for k in ['author', 'contact']},
    ...: )
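Since my_well_documented_file.nc is not available here, the following is a self-contained sketch of the same pattern using an in-memory Dataset. The attribute values ('A. Author', the example.com address, 'raw') are made-up placeholders, and the standard library's datetime stands in for pandas to keep the imports minimal.

```python
from datetime import datetime, timezone

import numpy as np
import xarray as xr

# Stand-in for the opened file; all attribute values are placeholders.
ds = xr.Dataset(
    {"varname": ("x", np.array([1, 2, 3]))},
    attrs={"author": "A. Author", "contact": "author@example.com", "history": "raw"},
)
source_attrs = ds.attrs

result = xr.Dataset({"new_var": ds["varname"] * 3})
result.attrs.update(
    # custom new attrs
    method="multiplied varname by 3",
    updated=datetime.now(timezone.utc).isoformat(),
    # carry forward only the selected attrs from the input
    **{k: source_attrs[k] for k in ["author", "contact"]},
)

print(sorted(result.attrs))
```

Only the keys listed explicitly are carried forward, so 'history' from the input does not end up on the result.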

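So the approach I usually take is to explicitly copy over the attributes I want at the end of a computation. And, if needed, you can use pint-xarray to handle units explicitly and treat the rest of the metadata as plain dictionaries.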