Change time variable in netCDF to datetime with pandas, Python

Question

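I have a netCDF file in which time is stored as a number of days, counted (I guess) from 1 January of year 0000. However, the values are plain integer-valued numbers (see below), and xarray cannot decode the time units when the file is loaded.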

<xarray.DataArray 'days' (time: 87600)>
array([679352., 679353., 679354., ..., 766949., 766950., 766951.])
Dimensions without coordinates: time
Attributes:
    units:      days_since_Jan11900
    long_name:  calendar_days

I want to turn them into datetimes with pandas. So far I have done this:

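import pandas as pd
import xarray as xr

# `path` is a base directory string defined elsewhere.
# decode_times=False keeps the raw day counts instead of failing on the unit string.
ds = xr.open_dataset(path + '09_future_predictions/Fire weather/temp/tmax_HadGEM2-CC.nc', decode_times=False)

tmax = ds.tmax.data
time = ds.days
# No unit is given here, so pandas falls back to its default for bare numbers.
time = pd.to_datetime(time.data)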

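However, pandas ends up interpreting the numbers as nanoseconds since the Unix epoch (1970-01-01) rather than as days, see below: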

DatetimeIndex(['1970-01-01 00:00:00.000679352',
               '1970-01-01 00:00:00.000679353',
               '1970-01-01 00:00:00.000679354',
               '1970-01-01 00:00:00.000679360',
               '1970-01-01 00:00:00.000679361',
               ...
               '1970-01-01 00:00:00.000766942',
        
              dtype='datetime64[ns]', length=87600, freq=None)

What can I do to convert these so that the first value becomes 1860-01-01?

Answer 1

This is not specific to netCDF, but it may help.

import pandas as pd
import matplotlib.dates as md

df_1 = pd.DataFrame({'c1': [679352., 679353., 679354., 766949., 766950., 766951.]})

df_1.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 1 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   c1      6 non-null      float64
dtypes: float64(1)
memory usage: 176.0 bytes

df_1.c1.apply(md.num2date)
0   1861-01-01 00:00:00+00:00
1   1861-01-02 00:00:00+00:00
2   1861-01-03 00:00:00+00:00
3   2100-11-01 00:00:00+00:00
4   2100-11-02 00:00:00+00:00
5   2100-11-03 00:00:00+00:00
Name: c1, dtype: datetime64[ns, UTC]
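
A pandas-only variant can be sketched as follows. It assumes, as the question states, that the first raw value (679352) corresponds to 1860-01-01; the variable names are illustrative. Passing `unit="D"` tells `pd.to_datetime` to read the numbers as days rather than nanoseconds, and `origin` sets the reference date. Note also that the `num2date` output above depends on Matplotlib's date epoch; as far as I know, releases from 3.3 onward default to 1970-01-01, so the same call may return different dates there.

```python
import numpy as np
import pandas as pd

# A few raw values copied from the question's `days` variable.
raw_days = np.array([679352.0, 679353.0, 679354.0])

# Assumption: the first raw value is 1860-01-01, so shift the series to start at 0.
offsets = raw_days - raw_days[0]

# unit="D" reads the numbers as days; origin is the date that offset 0 maps to.
dates = pd.to_datetime(offsets, unit="D", origin="1860-01-01")
print(dates)
```

pandas works in the standard Gregorian calendar, which has leap days. If the model output uses a 365-day calendar, these dates will fall one day behind for every 29 February that pandas counts and the model does not, starting in 1860 itself, which is a leap year.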

Answer 2

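I figured it out myself by doing the following. Note that I am using a 365-day calendar, and that in the second line of code I shift the origin from 1 January 0000 to 1 January 1860.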

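# Load without decoding, since the original unit string is not CF-compliant.
ds = xr.open_dataset(path + 'temp/tmax_HadGEM2-CC.nc', decode_times=False)
# Shift the day counts so that the first value (679352) becomes day 0.
ds['days'] = ds['days'] - 679352
# Describe the shifted values with CF-style units and a 365-day calendar.
ds['days'].attrs['units'] = 'days since 1860-01-01'
ds['days'].attrs['calendar'] = 'noleap'
# Let xarray decode the day counts using the new attributes.
ds = xr.decode_cf(ds)
# Keep only time steps 66430 to 73000.
ds = ds.loc[dict(time=slice(66430, 73000))]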
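
The 365-day arithmetic behind this answer can be checked by hand. The sketch below is a standalone illustration, not part of xarray or cftime; the helper `noleap_date` and its `start_year` parameter are hypothetical names introduced here. It assumes day offset 0 is 1 January 1860 and that every year has exactly 365 days.

```python
# Month lengths in a calendar without leap days.
DAYS_PER_MONTH = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)


def noleap_date(offset_days, start_year=1860):
    """Map a day offset from 1 January of start_year to (year, month, day)
    in a 365-day calendar. Hypothetical helper for illustration only."""
    years, day_of_year = divmod(int(offset_days), 365)
    month = 1
    for length in DAYS_PER_MONTH:
        if day_of_year < length:
            break
        day_of_year -= length
        month += 1
    return start_year + years, month, day_of_year + 1


first = noleap_date(679352 - 679352)   # first value in the file
last = noleap_date(766951 - 679352)    # last value in the file
slice_start = noleap_date(66430)       # lower bound of the slice
slice_stop = noleap_date(73000)        # upper bound of the slice
print(first, last, slice_start, slice_stop)
```

Under these assumptions the 87600 values span exactly 240 years of 365 days, from 1860-01-01 to 2099-12-31, and the slice bounds 66430 and 73000 fall on 2042-01-01 and 2060-01-01.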
