Creating xarray from multiple numpy arrays - time series

Question

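I want to create an xarray DataArray with the following coordinates from a series of numpy arrays holding annual time series data (assume it is temperature on a uniform 1500x1500 grid).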

('time', 'lon', 'lat') coordinates:

time     (time) datetime64[ns] 2000-12-31 2001-12-31 ... 2020-12-31
lon      (lon) float64 -19.98 -19.93 -19.88 -19.82 ... 54.88 54.93 54.98
lat      (lat) float64 39.97 39.92 39.87 39.82 ... -34.88 -34.93 -34.98

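The raw time series data I am using is stored as separate files, with the file name indicating each year in the series (i.e. the data files themselves carry no time information, it is only in the name: temp2000.xxx, temp2001.xxx, etc.). I import each of these data files into its own numpy array. These arrays have the spatial dimensions (corresponding to the lat/lon above) but no time dimension, other than the variable name I assign.

I am trying to figure out how to combine all of these numpy arrays into one multidimensional xarray DataArray, with lat/lon taken from the numpy arrays and time defined by a time variable (taken from the file names).

This is probably simple, but I can't wrap my head around it.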

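import numpy as np
import pandas as pd
import xarray as xr

temp2000 = np.random.rand(1500, 1500)

xll = -20.0
xur = 55.0
yll = -35.0
yur = 40.0
cellsize = 0.05  # 75 degrees / 1500 cells, matching the coordinates listed above

# cell-centre coordinates; latitude runs north to south
lon_tup = np.arange(xll, xur, cellsize) + (cellsize / 2)
lat_tup = np.arange(yll, yur, cellsize)
lat_tup = lat_tup[::-1] + (cellsize / 2)
# year-end dates; pandas >= 2.2 spells this freq="YE"
time2 = pd.date_range("2000-01-01", freq="Y", periods=21)

ds = xr.DataArray(
    coords=[time2, lat_tup, lon_tup], dims=["time", "lat", "lon"])

# this is the step that fails: temp2000 is 2-D and has no "time" axis
ds["Temperature_2000"] = (["time", "lat", "lon"], temp2000)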

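The DataArray is created fine, but obviously the numpy array can't be added because it lacks a "time" dimension. Can I force a time dimension in a separate step? The example covers only one time step (2000), with dummy data for illustration.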

Answer 1

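You can only initialize a DataArray with dimensions that reflect the actual shape of the data. So you can either reshape your numpy array to include an extra dimension (e.g. with reshape or np.expand_dims), or create the DataArray as (lat, lon) and then add the extra dimension afterwards (e.g. with da.expand_dims), as in this example: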

da = xr.DataArray(
    temp2000,
    # (lat, lon) order, matching the dims used in the question
    coords=[lat_tup, lon_tup],
    dims=["lat", "lon"],
)

# expand the array to include a length-1 time dimension
# corresponding to the file's time indicator
da = da.expand_dims(time=pd.Index([2000], name="time"))
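
For the first option mentioned above (adding the axis on the numpy side before building the DataArray), a minimal sketch could look like the following. The 4 x 5 grid, the coordinate values and the names lat, lon and temp2000_3d are assumptions made to keep the example small; they are not from the question.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Small dummy grid (assumption: 4 x 5 instead of 1500 x 1500, to keep it light)
lat = np.array([39.5, 38.5, 37.5, 36.5])
lon = np.array([-19.5, -18.5, -17.5, -16.5, -15.5])
temp2000 = np.random.rand(lat.size, lon.size)

# Add a leading length-1 axis on the numpy side: (lat, lon) -> (1, lat, lon)
temp2000_3d = np.expand_dims(temp2000, axis=0)

# The time coordinate needs exactly one value to match that new axis
time = pd.to_datetime(["2000-12-31"])

da = xr.DataArray(
    temp2000_3d,
    coords=[time, lat, lon],
    dims=["time", "lat", "lon"],
)
```

Either route should end up with the same (time, lat, lon) layout; the difference is only whether the extra axis is added before or after the data is wrapped in xarray.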

Alternatively, you can leave out the time dim until you are ready to concatenate the data:

arrays = []

# name the index so that the concatenated dimension is called "time"
time = pd.date_range("2000-01-01", freq="Y", periods=21, name="time")
years = time.year

for y in years:
    # read in your data as (lat, lon)
    ...

    arrays.append(da)

# concat using a full DatetimeIndex to give the values of time as well as the name
result = xr.concat(arrays, dim=time)
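
To make the concatenation route concrete, here is a self-contained sketch under some assumptions: read_year is a hypothetical stand-in for whatever actually reads temp2000.xxx and friends, the 3 x 4 grid is only there to keep things small, and the year-end timestamps are built directly from the years rather than with a frequency alias.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Small dummy grid (assumption, not the 1500 x 1500 grid from the question)
lat = np.array([39.5, 38.5, 37.5])
lon = np.array([-19.5, -18.5, -17.5, -16.5])


def read_year(year):
    """Hypothetical loader: returns dummy (lat, lon) data filled with the year."""
    return np.full((lat.size, lon.size), float(year))


years = range(2000, 2021)  # 2000..2020 inclusive, i.e. 21 files

# One 2-D (lat, lon) DataArray per file, with no time dimension yet
arrays = [
    xr.DataArray(read_year(y), coords=[lat, lon], dims=["lat", "lon"])
    for y in years
]

# Year-end timestamps taken from the years in the file names;
# the index name becomes the name of the new dimension
time = pd.DatetimeIndex([f"{y}-12-31" for y in years], name="time")

result = xr.concat(arrays, dim=time)

# Pull a single year back out by label
temp_2005 = result.sel(time="2005-12-31")
```

Because each dummy array is filled with its own year, it is easy to check that the time labels line up with the data they came from.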

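Also note the difference between xarray Datasets (usually abbreviated ds) and DataArrays (usually abbreviated da). Datasets are essentially dictionaries of DataArrays, while DataArrays are the basic array unit in xarray. Datasets are useful for interacting with storage and organizing your workflow, and they help when applying the same operation across multiple arrays, but in most cases when doing math you want to work with the arrays. See the xarray documentation on data structures for more info.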
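
As a small illustration of that distinction, the sketch below wraps one DataArray in a Dataset and then pulls it back out to do arithmetic. The grid, the variable name "temperature" and the Celsius-to-Kelvin conversion are assumptions chosen for the example only.

```python
import numpy as np
import xarray as xr

# Small dummy grid (assumption)
lat = np.array([39.5, 38.5, 37.5])
lon = np.array([-19.5, -18.5, -17.5, -16.5])

# A DataArray: a single labelled array
temperature = xr.DataArray(
    np.full((lat.size, lon.size), 20.0),
    coords=[lat, lon],
    dims=["lat", "lon"],
    name="temperature",
)

# A Dataset: a dict-like container of named DataArrays
ds = xr.Dataset({"temperature": temperature})

# Indexing the Dataset by name gives back a DataArray, which is what you do math on
temp_kelvin = ds["temperature"] + 273.15
```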

Answer 2

Thanks to Michael Delgado for the guidance. Here is my solution:

import numpy as np
import pandas as pd
import xarray as xr

xll = -20.0
xur = 55.0
yll = -35.0
yur = 40.0
cellsize = 0.05  # 75 degrees / 1500 cells

lon_tup = np.arange(xll, xur, cellsize) + (cellsize / 2)
lat_tup = np.arange(yll, yur, cellsize)
lat_tup = lat_tup[::-1] + (cellsize / 2)

StartYear = 2000
EndYear = 2020
# EndYear + 1 so that 2020 is included (21 years in total)
for x in range(StartYear, EndYear + 1):
    # filein would be the data read in from the external file
    filein = np.random.rand(1500, 1500)
    # add a leading length-1 axis: (1500, 1500) -> (1, 1500, 1500)
    temp = np.resize(filein, (1, 1500, 1500))
    # overwrites cell [0, 0] with the year; only useful as an ordering
    # check on dummy data, so drop this line for real data
    temp[:, 0, 0] = x
    if x == StartYear:
        array_wbm = temp
    else:
        array_wbm = np.concatenate(([array_wbm, temp]), axis=0)

time = pd.date_range("2000-01-01", freq="Y", periods=21)
years = time.year
da = xr.DataArray(data=array_wbm,
                  coords=[years, lat_tup, lon_tup],
                  dims=["year", "lat", "lon"]
                  )
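
A possible variation on the loop above, offered only as a sketch: collect the per-year arrays in a list and stack them once with np.stack, instead of concatenating inside the loop. The small grid, the per_year list and the dummy random data are assumptions standing in for the real files.

```python
import numpy as np
import xarray as xr

# Small dummy grid (assumption)
lat = np.array([39.5, 38.5, 37.5])
lon = np.array([-19.5, -18.5, -17.5, -16.5])
years = np.arange(2000, 2021)  # 2000..2020 inclusive

# Dummy stand-in for the arrays read from file: one (lat, lon) array per year
per_year = [np.random.rand(lat.size, lon.size) for _ in years]

# Stack once along a new leading axis: (n_years, lat, lon)
stacked = np.stack(per_year, axis=0)

da = xr.DataArray(
    stacked,
    coords=[years, lat, lon],
    dims=["year", "lat", "lon"],
)

# Select a single year by its label
first = da.sel(year=2000)
```

Stacking once avoids re-copying the growing array on every iteration, which may matter with 1500 x 1500 grids.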
