Error with datetime column while serialising a DataFrame into an HDF5 store
I am trying to save a DataFrame into an HDF5 store using the pandas built-in function to_hdf,
but this raises the following exception:
  File "C:\python\lib\site-packages\pandas\io\pytables.py", line 3433, in create_axes
    raise e
TypeError: Cannot serialize the column [date] because
its data contents are [datetime] object dtype
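The DataFrame was built from a numpy array, with the correct type for each column.
I tried convert_objects(), as I read in other posts, but it still fails.
Here is my test code. I am obviously missing something in the data conversion, but I cannot figure out what.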
import numpy as np
import pandas as pd
from datetime import datetime, timedelta
columns = ['date', 'c1', 'c2']
# building a sample test numpy array with datetime, float and integer
dtype = np.dtype("datetime64, f8, i2")
np_data = np.empty((0, len(columns)), dtype=dtype)
for i in range(1, 3):
    line = [datetime(2015, 1, 1, 12, i), i/2, i*1000]
    np_data = np.append(np_data, np.array([line]), axis=0)
print('##### the numpy array')
print(np_data)
# creating DataFrame from numpy array
df = pd.DataFrame(np_data, columns=columns)
# trying to force object conversion
df.convert_objects()
print('##### the DataFrame array')
print(df)
# the following fails!
try:
    df.to_hdf('store.h5', 'data', append=True)
    print('worked')
except Exception as e:
    print('##### the error')
    print(e)
The code above produces the following output:
##### the numpy array
[[datetime.datetime(2015, 1, 1, 12, 1) 0 1000]
[datetime.datetime(2015, 1, 1, 12, 2) 1 2000]]
##### the DataFrame array
date c1 c2
0 2015-01-01 12:01:00 0 1000
1 2015-01-01 12:02:00 1 2000
##### the error
Cannot serialize the column [date] because
its data contents are [datetime] object dtype
Almost all pandas operations return a new object. Your .convert_objects()
call discarded its output.
In [20]: df2 = df.convert_objects()
In [21]: df.dtypes
Out[21]:
date object
c1 object
c2 object
dtype: object
In [22]: df2.dtypes
Out[22]:
date datetime64[ns]
c1 int64
c2 int64
dtype: object
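The same point in script form. This is a minimal sketch rather than the code from the question: convert_objects() is no longer available in current pandas releases, so the sketch assumes infer_objects(), pd.to_datetime and pd.to_numeric as stand-ins, and it forces object dtype on every column to mimic the frame above.

```python
from datetime import datetime

import pandas as pd

# Assumption: force object dtype on every column to mimic the frame in the question.
df = pd.DataFrame({
    "date": pd.Series([datetime(2015, 1, 1, 12, 1), datetime(2015, 1, 1, 12, 2)], dtype=object),
    "c1": pd.Series([0, 1], dtype=object),
    "c2": pd.Series([1000, 2000], dtype=object),
})

# The return value is discarded here, so df itself is left untouched.
df.infer_objects()
dtypes_after_discard = [str(t) for t in df.dtypes]

# Keeping the converted values is what makes the conversion take effect.
df2 = df.assign(
    date=pd.to_datetime(df["date"]),
    c1=pd.to_numeric(df["c1"]),
    c2=pd.to_numeric(df["c2"]),
)

print(dtypes_after_discard)
print(df2.dtypes)
```

Explicit per-column conversion is a little more verbose than a single inference call, but it makes the intended type of each column visible.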
Save/restore
In [23]: df2.to_hdf('store.h5', 'data', append=True)
In [25]: pd.read_hdf('store.h5','data')
Out[25]:
date c1 c2
0 2015-01-01 12:01:00 0 1000
1 2015-01-01 12:02:00 1 2000
In [26]: pd.read_hdf('store.h5','data').dtypes
Out[26]:
date datetime64[ns]
c1 int64
c2 int64
dtype: object
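Before writing to the store it can help to list which columns are still object dtype, since those are the ones the error message complains about. A small sketch, assuming a hypothetical helper name object_columns (it is not part of pandas) and the public pd.api.types functions:

```python
from datetime import datetime

import pandas as pd


def object_columns(df):
    """Map each object-dtype column to the kind of values it holds.

    Hypothetical helper for illustration; not a pandas function.
    """
    return {
        name: pd.api.types.infer_dtype(df[name])
        for name in df.columns
        if pd.api.types.is_object_dtype(df[name])
    }


# Assumption: one column deliberately left as object dtype for the demo.
frame = pd.DataFrame({
    "date": pd.Series([datetime(2015, 1, 1, 12, 1), datetime(2015, 1, 1, 12, 2)], dtype=object),
    "c1": [0, 1],
})

report = object_columns(frame)
print(report)
```

An empty result means every column already has a concrete dtype, which is the state df2 is in above.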
Finally, it is more idiomatic to construct the DataFrame directly. The types are inferred on construction.
In [32]: pd.DataFrame({'data': pd.date_range('20150101', periods=2, freq='s'), 'c1': [0, 1], 'c2': [1000, 2000]}, columns=['data', 'c1', 'c2']).dtypes
Out[32]:
data datetime64[ns]
c1 int64
c2 int64
dtype: object
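Applied to the values from the question, direct construction might look like the sketch below. It assumes plain Python lists as input and true division for c1 (so the column comes out as float, as the question's comment intended). The variable name df_direct is only illustrative.

```python
from datetime import datetime

import pandas as pd

# One list per column lets pandas infer a separate dtype for each of them,
# instead of squeezing mixed values into a single object array first.
df_direct = pd.DataFrame(
    {
        "date": [datetime(2015, 1, 1, 12, i) for i in range(1, 3)],
        "c1": [i / 2 for i in range(1, 3)],
        "c2": [i * 1000 for i in range(1, 3)],
    },
    columns=["date", "c1", "c2"],
)

print(df_direct.dtypes)
```

With every column holding a concrete dtype, there is no object column left for to_hdf to reject.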