Why is pd.Timestamp converted to np.datetime64 when calling '.values'?

When accessing DataFrame.values, all pd.Timestamp objects are converted to np.datetime64 objects. Why? An np.ndarray containing pd.Timestamp objects can exist, so I don't understand why this automatic conversion always happens.

Do you know how to prevent it?

Minimal example:

import numpy as np
import pandas as pd
from datetime import datetime

# Let's create a list with a datetime.datetime object
values = [datetime.now()]
print(type(values[0]))
> <class 'datetime.datetime'>

# Clearly, the datetime.datetime objects became pd.Timestamp once moved to a pd.DataFrame
df = pd.DataFrame(values, columns=['A'])
print(type(df.iloc[0][0]))
> <class 'pandas._libs.tslibs.timestamps.Timestamp'>

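# Just to be sure, let's iterate over each datetime and manually convert them to pd.Timestamp
# (assign the result back, since apply returns a new Series and does not modify df in place)
df['A'] = df['A'].apply(lambda x: pd.Timestamp(x))
print(type(df.iloc[0][0]))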
> <class 'pandas._libs.tslibs.timestamps.Timestamp'>

# df.values (or series.values in this case) returns an np.ndarray
print(type(df.iloc[0].values))
> <class 'numpy.ndarray'>

# When we check what is the type of elements of the '.values' array, 
# it turns out the pd.Timestamp objects got converted to np.datetime64
print(type(df.iloc[0].values[0]))
> <class 'numpy.datetime64'>


# Just to double check, can an np.ndarray contain pd.Timestamps?
timestamp = pd.Timestamp(datetime.now())
timestamps = np.array([timestamp])
print(type(timestamps))
> <class 'numpy.ndarray'>

# Seems like it can. Why the above conversion then?
print(type(timestamps[0]))
> <class 'pandas._libs.tslibs.timestamps.Timestamp'>

python : 3.6.7.final.0

pandas : 0.25.3

numpy : 1.16.4

The whole idea behind .values is to:

Return a Numpy representation of the DataFrame. [docs]

I find it logical that a pd.Timestamp is then 'downgraded' to a dtype that is native to numpy. If it didn't do this, what would be the purpose of .values?
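To make the 'downgrade' concrete, here is a minimal sketch comparing dtypes. It assumes a reasonably recent pandas and numpy, and the variable names are just for illustration. As far as I understand it, a datetime column is stored as a datetime64 array and element access on the Series boxes each value into a pd.Timestamp, whereas np.array([pd.Timestamp(...)]) falls back to a generic object array.

```python
import numpy as np
import pandas as pd

# A datetime column is backed by a datetime64 dtype, not by Timestamp objects
df = pd.DataFrame({'A': pd.to_datetime(['2020-01-01', '2020-01-02'])})
column_dtype = df['A'].dtype
values_dtype = df['A'].values.dtype

# Element access on the Series returns a boxed pd.Timestamp,
# while the raw ndarray hands back np.datetime64 scalars
boxed = df['A'].iloc[0]
raw = df['A'].values[0]

# An ndarray built directly from Timestamp objects gets the generic object dtype
object_array = np.array([pd.Timestamp('2020-01-01')])

print(column_dtype, values_dtype, object_array.dtype)
print(type(boxed), type(raw), type(object_array[0]))
```

So the array in the question "contains" Timestamps only because it is an object array, which is a different thing from the datetime64 array that .values returns.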

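If you do want to keep the pd.Timestamp dtype, I would suggest working with the original Series (df.iloc[0]). I don't see any other way, since .values uses np.ndarray to do the conversion, according to the source on GitHub.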
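A small sketch of what working with the Series looks like, assuming the same kind of single-column frame as in the question. The last part goes beyond the suggestion above: requesting an object array via to_numpy(dtype=object) is one possibility I'm aware of (available since pandas 0.24, to my knowledge), so treat it as an option to try on your version.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': pd.to_datetime(['2020-01-01', '2020-01-02'])})

# Indexing or iterating the Series itself yields pd.Timestamp objects
first = df['A'].iloc[0]
all_from_series = list(df['A'])

# Assumption: to_numpy(dtype=object) asks pandas for an object ndarray,
# which keeps the elements boxed as pd.Timestamp instead of np.datetime64
as_objects = df['A'].to_numpy(dtype=object)

print(type(first))
print(type(as_objects), as_objects.dtype, type(as_objects[0]))
```

Keep in mind that an object array gives up the vectorized datetime64 operations, so this is only worth it if you really need the Timestamp objects themselves.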

Found a workaround: use .array instead of .values (docs)

print(type(df['A'].array[0]))
> <class 'pandas._libs.tslibs.timestamps.Timestamp'>

This prevents the conversion and gives me access to the objects I wanted to use.
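For completeness, a slightly fuller sketch of the .array workaround on a two-row frame (the frame and variable names here are just for illustration). On the versions I've tried, .array on a datetime column returns a pandas extension array rather than an np.ndarray, and both indexing and iterating it give pd.Timestamp objects.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': pd.to_datetime(['2020-01-01', '2020-01-02'])})

# .array returns a pandas extension array, not a numpy ndarray
arr = df['A'].array
array_type_name = type(arr).__name__

# Both indexing and iteration hand back pd.Timestamp objects
first = arr[0]
years = [ts.year for ts in arr]

print(array_type_name, isinstance(arr, np.ndarray))
print(type(first), years)
```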