Saving dataframe to disk loses numpy datatype
I have a large dataframe that I need to save to disk.
The columns have types such as numpy.int32 or numpy.floatxx:
int32Data ColumName ... float32Data otherTypeData
0 150294240 4260.0 ... 3.203908e+02 7960.0
1 150294246 4260.0 ... 0.000000e+00 7960.0
2 150294252 4280.0 ... 1.117543e+03 7960.0
3 150294258 4260.0 ... 5.117185e+01 7960.0
4 150294264 4260.0 ... 5.999993e+02 7960.0
... ... ... ... ...
1839311 161375508 54592.0 ... 8.990022e+05 0.0
1839312 161375514 54624.0 ... 2.097199e+06 0.0
1839313 161375520 54656.0 ... 1.192150e+06 0.0
1839314 161375526 54688.0 ... 1.249997e+06 0.0
1839315 161375532 54592.0 ... 8.949273e+05 0.0
Using the right data types saves a lot of space and processing power.
But when I save the dataframe df to disk with
np.save(FilePath, df)
and read it back with
ReadData = np.load(FilePath).tolist()
df = DataFrame(ReadData)
all the data is converted to numpy.float64 (and the column names are dropped).
Is it possible to save and load the dataframe while keeping the data type of each column (and the column names)?
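A minimal sketch reproducing the problem, assuming a small hypothetical frame with just two of the question's column types. np.save first converts its argument with np.asanyarray, and a plain ndarray has a single dtype and no column labels, so NumPy has to upcast int32 and float32 to a common type:

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame using two of the column types from the question.
df = pd.DataFrame({
    "int32Data": np.array([150294240, 150294246], dtype=np.int32),
    "float32Data": np.array([320.3908, 0.0], dtype=np.float32),
})

# np.save starts with np.asanyarray(df): the frame becomes ONE 2-D ndarray,
# so NumPy must pick a single dtype able to hold both int32 and float32.
arr = np.asanyarray(df)

print(df.dtypes)   # int32 and float32, one dtype per column
print(arr.dtype)   # float64
print(type(arr))   # a plain numpy.ndarray: the column labels are gone
```

Loading that file can only give back this float64 array, which is why DataFrame(ReadData) sees neither the original dtypes nor the names.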
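HDF5 storage may be exactly what you are looking for: it stores large amounts of data efficiently, keeps each column's data type and the column names, and lets you read the data back very quickly. In pandas it is available through pd.HDFStore, which requires the PyTables package (tables). You can find more details in the documentation.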
An example of how to use it:
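import pandas as pd
file_path = "data.h5"  # example path of the HDF5 file
key = "df"             # example name under which the dataframe is stored
with pd.HDFStore(file_path) as hdf:
    # to save the dataframe to the HDF file
    hdf.put(key, df)
    # and to retrieve it later
    df = hdf.get(key)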
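If you would rather keep using np.save (no PyTables dependency), a different technique from the HDF5 one is to save a NumPy structured array: DataFrame.to_records turns each column into a named field with its own dtype. A minimal sketch, assuming a hypothetical two-column frame and a hypothetical file name in a temporary directory:

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical frame with the same mix of column types as in the question.
df = pd.DataFrame({
    "int32Data": np.array([150294240, 150294246], dtype=np.int32),
    "float32Data": np.array([320.3908, 0.0], dtype=np.float32),
})

# to_records(index=False) builds a structured array: one named field per
# column, each keeping its own dtype, so nothing is upcast to float64.
records = df.to_records(index=False)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "frame.npy")  # hypothetical file name
    np.save(path, records)
    loaded = np.load(path)  # numeric fields only, so no pickle is needed

# Field names become column names and field dtypes become column dtypes.
restored = pd.DataFrame(loaded)
print(restored.dtypes)
```

to_records(index=False) drops the index; with index=True it is stored as an extra field. Object columns (e.g. strings) would be stored as Python objects, which np.load only reads back with allow_pickle=True.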