Dataframe's column conversion from type object to int / float using Pandas Python

Question

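Scenario

I have 2 CSV files, (1) u.Data and (2) prediction_matrix, which I need to read and write into a single DataFrame. Once that is done, the combined data will be clustered based on the int / float values it contains.

Problem

I have combined the 2 CSVs into one DataFrame, saved as AllData.csv, but the columns holding the values now have a different type (object), as shown below (along with a warning):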

sys:1: DtypeWarning: Columns (0,1,2) have mixed types. Specify dtype option on import or set low_memory=False.
UDATA -------------
uid    int64
iid    int64
rat    int64
dtype: object
PRED_MATRIX -------
uid      int64
iid      int64
rat    float64
dtype: object
AllDATA -----------
uid    object
iid    object
rat    object
dtype: object

P.S. I know how to use low_memory=False, and it only suppresses the warning.

Possible cause

with open('AllData.csv', 'w') as handle:
    udata_df.to_csv(handle, index=False)
    pred_matrix.to_csv(handle, index=False)

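Because I need to write 2 CSVs into a single DataFrame, a handle object is used, and it probably converts all the values to its own type. Is there any way to preserve the data types while applying the same logic?

References that have not helped so far: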

This one
This two
This too!

Answer 1

The header of your second DataFrame is being written to the file as well, so you need the parameter header=False:

with open('AllData.csv', 'w') as handle:
    udata_df.to_csv(handle, index=False)
    pred_matrix.to_csv(handle, index=False, header=False)

Another solution is to append the second DataFrame with mode='a':

f = 'AllData.csv'
udata_df.to_csv(f, index=False)
pred_matrix.to_csv(f,header=False, index=False, mode='a')

Or use concat:

f = 'AllData.csv'
pd.concat([udata_df, pred_matrix]).to_csv(f, index=False)

Sample:

udata_df = pd.DataFrame({'uid':[1,2],
                         'iid':[8,9],
                         'rat':[0,3]})

pred_matrix = udata_df * 10

Without header=False, the third data row (index 2) is the second header:

with open('AllData.csv', 'w') as handle:
    udata_df.to_csv(handle, index=False)
    pred_matrix.to_csv(handle, index=False)

f = 'AllData.csv'
df = pd.read_csv(f)
print (df)
   iid  rat  uid
0    8    0    1
1    9    3    2
2  iid  rat  uid
3   80    0   10
4   90   30   20
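
To connect this output back to the dtypes in the question, here is a minimal sketch that reproduces the stray header row in memory and inspects the result. It assumes the same sample frames as above and uses io.StringIO in place of a real file, which is a substitution made only for illustration.

```python
import io

import pandas as pd

# Same sample frames as in the answer.
udata_df = pd.DataFrame({"uid": [1, 2], "iid": [8, 9], "rat": [0, 3]})
pred_matrix = udata_df * 10

# Write both frames to one in-memory "file"; the header is written twice.
buf = io.StringIO()
udata_df.to_csv(buf, index=False)
pred_matrix.to_csv(buf, index=False)
buf.seek(0)

df = pd.read_csv(buf)

# Every column now holds text because of the embedded header row,
# so pandas cannot infer a numeric dtype for any of them.
print(df.dtypes)

# The offending row is the one whose values equal the column names.
stray = df[df["uid"] == "uid"]
print(stray)
```

The second header line is parsed as ordinary data, so each column contains at least one string and none of them can be read as int or float.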

After adding the parameter header=False it works correctly:

with open('AllData.csv', 'w') as handle:
    udata_df.to_csv(handle, index=False)
    pred_matrix.to_csv(handle, index=False, header=False)

f = 'AllData.csv'
df = pd.read_csv(f)
print (df)
   iid  rat  uid
0    8    0    1
1    9    3    2
2   80    0   10
3   90   30   20

Solution with append mode:

f = 'AllData.csv'
udata_df.to_csv(f, index=False)
pred_matrix.to_csv(f,header=False, index=False, mode='a')
df = pd.read_csv(f)
print (df)
   iid  rat  uid
0    8    0    1
1    9    3    2
2   80    0   10
3   90   30   20

Solution with concat:

f = 'AllData.csv'
pd.concat([udata_df, pred_matrix]).to_csv(f, index=False)
df = pd.read_csv(f)
print (df)
   iid  rat  uid
0    8    0    1
1    9    3    2
2   80    0   10
3   90   30   20
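
If a file with the embedded header has already been written and cannot easily be regenerated, the columns can also be repaired after loading. The following is a hedged sketch rather than part of the original answer: it assumes the only non-numeric rows are repeated header lines, and the inline CSV text is made up for illustration.

```python
import io

import pandas as pd

# Hypothetical file content with a second header line in the middle.
csv_text = (
    "uid,iid,rat\n"
    "1,8,0\n"
    "2,9,3\n"
    "uid,iid,rat\n"
    "10,80,0.5\n"
    "20,90,3.0\n"
)

raw = pd.read_csv(io.StringIO(csv_text))

# Convert each column to numbers; anything unparseable becomes NaN.
converted = raw.apply(pd.to_numeric, errors="coerce")

# The repeated header is now a row of NaN in every column, so drop it.
cleaned = converted.dropna(how="all").reset_index(drop=True)

# The NaN forced uid and iid to float; cast them back to integers.
cleaned = cleaned.astype({"uid": "int64", "iid": "int64"})

print(cleaned.dtypes)
```

Note that errors="coerce" silently turns any unparseable value into NaN, so it is worth checking how many rows were dropped before trusting the result.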

Answer 2

The with open approach is unnecessary in your case, because you can simply concatenate the two matrices and then save the result to CSV using pandas alone, like this:

df = pd.concat([udata_df, pred_matrix], axis=0)  # axis=0 stacks the rows; axis=1 would place the frames side by side
df.to_csv('AllData.csv', index=False, encoding='utf-8')
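
As a check on the dtype question itself, here is a minimal sketch of a row-wise concat followed by a CSV round trip. The float ratings in pred_matrix are invented sample values, ignore_index=True is an optional choice made here to get a clean index, and io.StringIO stands in for AllData.csv.

```python
import io

import pandas as pd

# Integer ratings in one frame, float ratings in the other (sample values).
udata_df = pd.DataFrame({"uid": [1, 2], "iid": [8, 9], "rat": [0, 3]})
pred_matrix = pd.DataFrame({"uid": [1, 2], "iid": [8, 9], "rat": [0.5, 2.5]})

# Stack the rows; int64 + float64 in "rat" is upcast to float64.
combined = pd.concat([udata_df, pred_matrix], ignore_index=True)
print(combined.dtypes)

# Round trip through CSV with a single header line.
buf = io.StringIO()
combined.to_csv(buf, index=False)
buf.seek(0)
reloaded = pd.read_csv(buf)
print(reloaded.dtypes)
```

With only one header in the file, the id columns come back as integers and the rating column as float, so no object-to-number conversion is needed afterwards.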
