Why does my date column change when I convert it to an ndarray?

Here is my DataFrame:

import pandas as pd
from pandas import Timestamp

df = pd.DataFrame({'Year': [Timestamp('2023-03-14 00:00:00'), Timestamp('2063-03-15 00:00:00'), Timestamp('2043-03-21 00:00:00'), Timestamp('2053-10-09 00:00:00')],
                   'offset': [1, 9, 8, 1]
})

When I convert the 'Year' column with to_list(), the values are kept as Timestamp objects:

>>> df['Year'].to_list()
[Timestamp('2023-03-14 00:00:00'),
 Timestamp('2063-03-15 00:00:00'),
 Timestamp('2043-03-21 00:00:00'),
 Timestamp('2053-10-09 00:00:00')]

However, when I use .values, they are stored as datetime64:

>>> df['Year'].values
array(['2023-03-14T00:00:00.000000000', '2063-03-15T00:00:00.000000000',
       '2043-03-21T00:00:00.000000000', '2053-10-09T00:00:00.000000000'],
      dtype='datetime64[ns]')

How can I get my array as Timestamp objects (rather than in datetime64 format)?

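The column is converted to datetime64 because a NumPy array with a native dtype can only hold certain data types, and Timestamp objects are not one of them. This has to do with how NumPy arrays are stored in memory as one contiguous block of fixed-size elements and processed by NumPy's C backend.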

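Starting with NumPy v1.7, the core data types datetime64 and timedelta64 were added to support datetime functionality, but they still store the data in memory as integers [citation needed].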
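The integer storage can be checked directly. The following is a minimal sketch, not an official recipe: it assumes a datetime64[ns] column (the explicit astype call forces nanosecond resolution regardless of pandas version), and the variable names are only illustrative. Reinterpreting the buffer with view("int64") exposes the underlying counts of nanoseconds since the Unix epoch.

```python
import numpy as np
import pandas as pd

# Two of the dates from the question, forced to nanosecond resolution.
years = pd.Series(pd.to_datetime(["2023-03-14", "2063-03-15"])).astype("datetime64[ns]")

raw = years.to_numpy()        # NumPy array with dtype datetime64[ns]
as_int = raw.view("int64")    # same memory, reinterpreted as 64-bit integers

print(raw.dtype)              # datetime64[ns]
print(raw.itemsize)           # 8 bytes per element
print(as_int)                 # nanoseconds since 1970-01-01T00:00:00
```

Each element occupies eight bytes, which is why the array can live in one contiguous block, while a Timestamp is a full Python object and cannot.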

You can create a NumPy array of Timestamp objects with np.array(df.Year.to_list()), but the resulting array will have dtype=object:

array([Timestamp('2023-03-14 00:00:00'), Timestamp('2063-03-15 00:00:00'),
       Timestamp('2043-03-21 00:00:00'), Timestamp('2053-10-09 00:00:00')],
      dtype=object)

For more on what this means, see:

Creating an array with dtype=object is different. The memory taken by the array now is filled with pointers to Python objects which are being stored elsewhere in memory (much like a Python list is really just a list of pointers to objects, not the objects themselves).
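As a sketch of the two directions discussed above (the shortened DataFrame and the variable names are illustrative assumptions, not part of the original question): Series.to_numpy(dtype=object) is an alternative way to get an object array of Timestamp values, and a single datetime64 element can be wrapped back into a Timestamp when only a few values are needed.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Year": [pd.Timestamp("2023-03-14"), pd.Timestamp("2063-03-15")],
    "offset": [1, 9],
})

# Object array: each slot is a pointer to a pandas Timestamp.
obj_arr = df["Year"].to_numpy(dtype=object)
print(obj_arr.dtype)          # object
print(type(obj_arr[0]))       # pandas Timestamp class

# Same idea as in the answer: build the array from a Python list.
from_list = np.array(df["Year"].to_list())
print(from_list.dtype)        # object

# Native array: compact datetime64 values; wrap one element on demand.
native = df["Year"].to_numpy()
first = pd.Timestamp(native[0])
print(first)                  # 2023-03-14 00:00:00
```

The object array keeps the Timestamp API on every element but gives up the contiguous, fixed-size storage that makes vectorized NumPy operations fast, so it is usually worth keeping the datetime64 array and converting individual values only where Timestamp methods are needed.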