Changing column datatype from Timestamp to datetime64

Question

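I have a database that I am reading from Excel into a pandas DataFrame. The dates come in with the Timestamp dtype, but I need them as np.datetime64 so that I can do calculations.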

I know that the function pd.to_datetime() and the astype(np.datetime64[ns]) method do work. However, for whatever reason, I am unable to update my DataFrame to this datatype using the code above.
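Both conversions mentioned above can be checked on a small frame. In this sketch the string column is a hypothetical stand-in for what reading the Excel file might produce, and astype is given the dtype as the string 'datetime64[ns]':

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a column read from Excel: dates stored as strings
df = pd.DataFrame({'dates': ['2000-01-01', '2000-01-02', '2000-01-03']})

# Both conversions produce a native datetime64 column
via_to_datetime = pd.to_datetime(df['dates'])
via_astype = df['dates'].astype('datetime64[ns]')

print(via_to_datetime.dtype)
print(via_astype.dtype)
```

Both results report a datetime64 dtype, so the conversion itself is not the step that fails.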

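I have also tried creating an auxiliary DataFrame from the original one, containing only the dates I want to retype, converting it to np.datetime64, and inserting it back into the original DataFrame: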

dfi = df['dates']
dfi = pd.to_datetime(dfi)
df['dates'] = dfi

But it still doesn't work. I have also tried updating the values one by one:

arr_i = df.index
for i in range(len(arr_i)):
    # convert each Timestamp to a NumPy datetime64 and write it back
    df.at[arr_i[i], 'dates'] = df.at[arr_i[i], 'dates'].to_datetime64()

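EDIT: The root problem seems to be that the column's dtype does get updated to np.datetime64, but somehow, when I retrieve single values from it, they still have dtype = Timestamp.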
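The behaviour described in the edit can be reproduced with a minimal, made-up frame (only the column name dates is borrowed from the question). The column's dtype is datetime64, yet scalar access through df.at returns a pandas Timestamp; Timestamp.to_datetime64() and Series.to_numpy() are two existing ways to get NumPy values back:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'dates': pd.date_range('2000-01-01', periods=3)})

# The column itself is stored with a native datetime64 dtype
print(df['dates'].dtype)

# ...but pulling out a single element boxes it as a pandas Timestamp
single = df.at[0, 'dates']
print(type(single))

# Timestamp.to_datetime64() returns the underlying NumPy scalar
raw = single.to_datetime64()
print(type(raw))

# Taking the whole column as a NumPy array keeps numpy.datetime64 elements
arr = df['dates'].to_numpy()
print(arr.dtype, type(arr[0]))
```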

Does anyone have a suggestion for a reasonably fast workaround?

Answer 1

Pandas tries to standardize all forms of datetimes by storing them as NumPy datetime64[ns] values when you assign them to a DataFrame. But when you try to access individual datetime64 values, they are returned as Timestamps.
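A quick way to see this normalization is to assign three kinds of datetime values (Python datetime.datetime, NumPy datetime64 and pandas Timestamp) to separate columns of a small example frame. The column names are illustrative only, and the exact unit reported by df.dtypes may vary between pandas versions:

```python
import datetime

import numpy as np
import pandas as pd

df = pd.DataFrame(index=range(2))

# Three different kinds of datetime values for the same two dates
df['py_datetime'] = [datetime.datetime(2000, 1, 1), datetime.datetime(2000, 1, 2)]
df['np_datetime64'] = np.array(['2000-01-01', '2000-01-02'], dtype='datetime64[ns]')
df['timestamp'] = [pd.Timestamp('2000-01-01'), pd.Timestamp('2000-01-02')]

# Every column ends up with a native datetime64 dtype
print(df.dtypes)

# ...and every single element comes back out as a Timestamp
print({col: type(df.at[0, col]).__name__ for col in df.columns})
```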

There is a way to prevent this automatic conversion from happening: wrap the list of values in a Series of dtype object:

import numpy as np
import pandas as pd

# create some dates, merely for example
dates = pd.date_range('2000-1-1', periods=10)
# convert the dates to a *list* of datetime64s
arr = list(dates.to_numpy())
# wrap the values you wish to protect in a Series of dtype object.
ser = pd.Series(arr, dtype='object')

# assignment with `df['datetime64s'] = ser` would also work
df = pd.DataFrame({'timestamps': dates,
                   'datetime64s': ser})

df.info()
# <class 'pandas.core.frame.DataFrame'>
# RangeIndex: 10 entries, 0 to 9
# Data columns (total 2 columns):
# timestamps     10 non-null datetime64[ns]
# datetime64s    10 non-null object
# dtypes: datetime64[ns](1), object(1)
# memory usage: 240.0+ bytes

print(type(df['timestamps'][0]))
# <class 'pandas._libs.tslibs.timestamps.Timestamp'>

print(type(df['datetime64s'][0]))
# <class 'numpy.datetime64'>

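But be careful! Although you can work around Pandas' automatic conversion mechanism with a bit of effort, doing so may not be wise. For one thing, converting a NumPy array to a list is usually a sign that you are doing something wrong, since it is bad for performance. Using object arrays is another bad sign, because operations on object arrays are generally much slower than equivalent operations on arrays of native NumPy dtypes.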
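If you have already ended up with such an object column, one way to return to a native dtype is to pass it through pd.to_datetime. This sketch rebuilds the object Series from the example above; the names protected and restored are illustrative:

```python
import numpy as np
import pandas as pd

dates = pd.date_range('2000-1-1', periods=10)

# Same construction as the example above: an object column of numpy.datetime64 scalars
protected = pd.Series(list(dates.to_numpy()), dtype='object')
print(protected.dtype)  # object

# pd.to_datetime converts it back to a native datetime64 column
restored = pd.to_datetime(protected)
print(restored.dtype)

# Vectorized datetime tools such as the .dt accessor work on the native column
print(restored.dt.day.tolist())  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

Once the dtype is native again, whole-column operations run on the datetime64 data instead of looping over Python objects.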

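You may be looking at an XY problem. It may be more fruitful to find a way to (1) work with Pandas Timestamps instead of trying to force Pandas to return NumPy datetime64s, or (2) work with datetime64 array-likes (e.g. Series or NumPy arrays) instead of handling values individually (which is what causes the coercion to Timestamps).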
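A minimal sketch of both suggestions, again on a hypothetical dates column: option (1) does arithmetic on a Timestamp directly, and option (2) operates on the whole column, only dropping to a NumPy datetime64 array via to_numpy() when raw values are genuinely needed:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'dates': pd.date_range('2000-01-01', periods=3)})

# (1) Timestamps support date arithmetic on their own; no conversion needed
first = df.at[0, 'dates']
print(first + pd.Timedelta(days=1))

# (2) Operate on the whole column at once; the result is a timedelta64 Series
deltas = df['dates'] - df['dates'].iloc[0]
print(deltas.dt.days.tolist())  # [0, 1, 2]

# If raw NumPy datetime64 values are truly required, take the whole array
arr = df['dates'].to_numpy()
print(arr.dtype)
```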

