Making pydata handle string columns

Question

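I have a DataFrame in which some columns are floats and some are strings. All columns contain NaN. The string columns contain either strings or NaN, and the NaN values appear to have type float. When I try to store the DataFrame with 'df.to_hdf', I get the following error: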

your performance may suffer as PyTables will pickle object types that it cannot map directly to c-types [inferred_type->mixed,key->block0_values] [items->['operation', 'snl_datasource_period', 'ticker', 'cusip', 'end_fisca_perio_date', 'fiscal_period', 'finan_repor_curre_code', 'earni_relea_date', 'finan_perio_begin_on']]

How can I fix this?

Answer 1

You can fill each column with a missing-value placeholder appropriate to its type. For example:

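import numpy as np
import pandas as pd

col1 = [1.0, np.nan, 3.0]
col2 = ['one', np.nan, 'three']

df = pd.DataFrame(dict(col1=col1, col2=col2))

# Fill each column with a placeholder of its own type, so the string
# column no longer mixes str values with float NaN values.
df['col1'] = df['col1'].fillna(0.0)
df['col2'] = df['col2'].fillna('')

df.to_hdf('eg.hdf', key='eg')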
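
If there are many columns, the same idea can be applied per dtype rather than column by column. The following is only a sketch: fill_missing_by_dtype is a hypothetical helper name, and the placeholder values ('' for strings, 0.0 for floats) are assumptions you may want to change for your data. It uses pandas.api.types.infer_dtype with skipna=False to check whether the string column still mixes strings with float NaN values before and after filling. The string column is built with dtype=object explicitly so that it matches the situation in the question.

```python
import numpy as np
import pandas as pd
from pandas.api.types import infer_dtype, is_float_dtype


def fill_missing_by_dtype(df, string_fill="", float_fill=0.0):
    """Hypothetical helper: fill NaN in each column according to its dtype."""
    out = df.copy()
    for name in out.columns:
        if out[name].dtype == object:
            # Object columns are assumed to hold strings plus float NaN.
            out[name] = out[name].fillna(string_fill)
        elif is_float_dtype(out[name]):
            out[name] = out[name].fillna(float_fill)
    return out


df = pd.DataFrame({
    "col1": pd.Series([1.0, np.nan, 3.0]),
    "col2": pd.Series(["one", np.nan, "three"], dtype=object),
})

# Inspect the inferred type of the string column, counting NaN as a value.
before = infer_dtype(df["col2"], skipna=False)

filled = fill_missing_by_dtype(df)
after = infer_dtype(filled["col2"], skipna=False)

print("before:", before)
print("after:", after)
print(filled)
```

After filling, the frame can be written with filled.to_hdf('eg.hdf', key='eg') as in the example above. Note that replacing NaN with '' or 0.0 discards the distinction between a missing value and a real empty string or zero, so choose placeholders that cannot be confused with real data.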

pytables

pandas