Why is pd.Timestamp converted to np.datetime64 when calling '.values'?
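When accessing DataFrame.values, all pd.Timestamp objects are converted to np.datetime64 objects. Why? An np.ndarray containing pd.Timestamp objects can exist, so I don't understand why this automatic conversion always happens.
Do you know how to prevent it?
Minimal example: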
import numpy as np
import pandas as pd
from datetime import datetime
# Let's declare an array with a datetime.datetime object
values = [datetime.now()]
print(type(values[0]))
> <class 'datetime.datetime'>
# Clearly, the datetime.datetime objects became pd.Timestamp once moved to a pd.DataFrame
df = pd.DataFrame(values, columns=['A'])
print(type(df.iloc[0][0]))
> <class 'pandas._libs.tslibs.timestamps.Timestamp'>
# Just to be sure, let's iterate over each datetime and manually convert them to pd.Timestamp
# (the result has to be assigned back, otherwise the conversion is discarded)
df['A'] = df['A'].apply(lambda x: pd.Timestamp(x))
print(type(df.iloc[0][0]))
> <class 'pandas._libs.tslibs.timestamps.Timestamp'>
# df.values (or series.values in this case) returns an np.ndarray
print(type(df.iloc[0].values))
> <class 'numpy.ndarray'>
# When we check what is the type of elements of the '.values' array,
# it turns out the pd.Timestamp objects got converted to np.datetime64
print(type(df.iloc[0].values[0]))
> <class 'numpy.datetime64'>
# Just to double check, can an np.ndarray contain pd.Timestamps?
timestamp = pd.Timestamp(datetime.now())
timestamps = np.array([timestamp])
print(type(timestamps))
> <class 'numpy.ndarray'>
# Seems like it does. Why the above conversion then?
print(type(timestamps[0]))
> <class 'pandas._libs.tslibs.timestamps.Timestamp'>
python : 3.6.7.final.0
pandas : 0.25.3
numpy : 1.16.4
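The whole idea behind .values is to:
Return a Numpy representation of the DataFrame. [docs]
I find it logical that a pd.Timestamp is then 'downgraded' to a dtype that is native to numpy. If it didn't do this, what would be the purpose of .values?
If you do want to keep the pd.Timestamp dtype, I would suggest working with the original Series (df.iloc[0]). I don't see any other way, since .values uses np.ndarray to do the conversion, according to the source on GitHub.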
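A minimal sketch of the point above, assuming a tz-naive datetime column and a fixed timestamp chosen only for reproducibility (the variable names are illustrative). It suggests that the column is stored with a NumPy datetime64 dtype, that pandas boxes elements into pd.Timestamp on access, and that a hand-built np.array of Timestamps only keeps them because it falls back to dtype object:

```python
import numpy as np
import pandas as pd

# A fixed timestamp keeps the example reproducible (illustrative value).
df = pd.DataFrame({"A": [pd.Timestamp("2020-01-01 12:00:00")]})

# The column uses a NumPy datetime64 dtype, not Python objects.
column_dtype = df["A"].dtype

# Element access through pandas boxes the value into a pd.Timestamp.
boxed = df["A"].iloc[0]

# .values returns the NumPy representation, so elements are np.datetime64.
raw = df["A"].values[0]

# Building an array by hand from Timestamps yields dtype object,
# which is why the Timestamp instances survive in that case.
wrapped = np.array([boxed])

print(column_dtype.kind)  # 'M' is NumPy's kind code for datetime64
print(type(boxed))
print(type(raw))
print(wrapped.dtype)
```

Both representations describe the same instant; only the container type differs, so pd.Timestamp(raw) gives back an equal Timestamp.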
Found a workaround: use .array instead of .values (docs).
print(type(df['A'].array[0]))
> <class 'pandas._libs.tslibs.timestamps.Timestamp'>
This prevents the conversion and gives me access to the objects I wanted to use.
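As a hedged side note that is not part of the original answer: if an actual np.ndarray is still needed, Series.to_numpy(dtype=object) appears to be another option in pandas 0.24 and later. The sketch below compares it with .array, using a fixed illustrative timestamp and made-up variable names:

```python
import numpy as np
import pandas as pd

# Illustrative data; the timestamp value is arbitrary.
df = pd.DataFrame({"A": [pd.Timestamp("2020-01-01 12:00:00")]})

# .array returns a pandas extension array, not an np.ndarray;
# indexing it yields pd.Timestamp.
ext = df["A"].array

# Requesting dtype=object asks pandas to box every element,
# so the resulting ndarray holds pd.Timestamp instances.
as_objects = df["A"].to_numpy(dtype=object)

print(type(ext[0]))
print(type(as_objects), as_objects.dtype)
print(type(as_objects[0]))
```

The trade-off is that an object array gives up NumPy's vectorized datetime64 operations, so it is probably best kept for cases where the Timestamp API itself is required.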