Making pydata handle string columns

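I have a DataFrame in which some columns are floats and some are strings. Every column contains NaN. The string columns hold either strings or NaN, and the NaN values appear to have type float. When I try to store the DataFrame with `df.to_hdf`, I get the following warning: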

your performance may suffer as PyTables will pickle object types that it cannot map directly to c-types [inferred_type->mixed,key->block0_values] [items->['operation', 'snl_datasource_period', 'ticker', 'cusip', 'end_fisca_perio_date', 'fiscal_period', 'finan_repor_curre_code', 'earni_relea_date', 'finan_perio_begin_on']]

How can I fix this?

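You can fill the missing values in each column with a placeholder of that column's own type, so that the string columns contain only strings rather than a mix of strings and float NaN. For example: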

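import numpy as np
import pandas as pd

# One float column and one string column, each with a missing value
col1 = [1.0, np.nan, 3.0]
col2 = ['one', np.nan, 'three']

df = pd.DataFrame(dict(col1=col1, col2=col2))

# Fill each column with a placeholder of its own type
df['col1'] = df['col1'].fillna(0.0)
df['col2'] = df['col2'].fillna('')  # the column now holds only strings

# Writing HDF5 requires the PyTables package ("tables") to be installed
df.to_hdf('eg.hdf', key='eg')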
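
With many string columns, filling them one by one gets tedious. The following is a minimal sketch, not a definitive fix: `fill_object_columns` is a hypothetical helper name, and the choice of `''` as the placeholder is an assumption carried over from the example above. It fills only the object-dtype columns and uses `pandas.api.types.infer_dtype` to check whether a column is still a mix of strings and NaN, without needing PyTables installed.

```python
import numpy as np
import pandas as pd
from pandas.api.types import infer_dtype


def fill_object_columns(df, fill_value=""):
    """Return a copy of df with NaN in object-dtype columns replaced by fill_value.

    Hypothetical helper for illustration; numeric columns are left untouched.
    """
    out = df.copy()
    for name in out.columns:
        if out[name].dtype == object:
            out[name] = out[name].fillna(fill_value)
    return out


df = pd.DataFrame({
    "col1": [1.0, np.nan, 3.0],
    # dtype=object is set explicitly so the column mixes str and float NaN
    "col2": pd.Series(["one", np.nan, "three"], dtype=object),
})

# skipna=False makes infer_dtype take the NaN entries into account
before = infer_dtype(df["col2"], skipna=False)

filled = fill_object_columns(df)
after = infer_dtype(filled["col2"], skipna=False)

print(before, after)
```

The float column keeps its NaN here, since the warning in the question lists only the string columns. If an empty string is a legitimate value in your data, pick a different placeholder.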