Pandas: Drop rows with missing data and apply binary encoding in a UDF

Question

I'm preprocessing data and binary-encoding t and f values as 1 and 0. Originally, this was my function:

def binary_encoding(df):
    encode = df.replace({"t":1, "f":0})
    return encode

This returns floats. I then changed the encoding line to

encode = df.replace({"t":1, "f":0}).astype(int)

but I get an error:

ValueError: Cannot convert non-finite values (NA or inf) to integer

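Of the 4 columns I am binary-encoding, 3 have 55 missing entries out of 18,500 and have dtype float64. The remaining column is encoded successfully, is recognized as int64, and is fully mapped as expected.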
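A minimal reproduction of this behaviour, assuming a small hypothetical frame in place of the real 18,500-row data. It uses `Series.map` rather than `replace`: with a dict, `map` turns any value not in the dict (NaN included) into NaN, so a column with a missing entry comes back as float64, and casting that to a plain NumPy int fails with the same ValueError.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: column "a" has one blank (NaN) entry, column "b" has none
df = pd.DataFrame({"a": ["t", "f", np.nan], "b": ["t", "f", "f"]})

# With a dict, Series.map returns NaN for anything not in the dict (NaN included),
# so the column with a missing entry is upcast to float64
encoded = df.apply(lambda col: col.map({"t": 1, "f": 0}))
print(encoded.dtypes)  # a: float64, b: int64

# NaN has no NumPy integer representation, so a plain int cast raises ValueError
cast_failed = False
try:
    encoded.astype(int)
except ValueError as err:
    cast_failed = True
    print(err)
```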

How can I write a function that drops the missing entries (they are blank inputs) and then applies the mapping I originally set up?

Answer 1

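To reach the end goal of converting the float values in the columns containing `NaN` to integers, you can use the integer with N/A support data type (pandas' nullable `Int64`):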

Suppose you have the following 4 columns, 3 of which contain NaN values and one of which does not:

df = pd.DataFrame({'Col1': ['f', 't', np.nan], 'Col2': [np.nan, 'f', 't'], 'Col3': ['f', np.nan, 't'], 'Col4': ['f', 't', 'f']})


  Col1 Col2 Col3 Col4
0    f  NaN    f    f
1    t    f  NaN    t
2  NaN    t    t    f

Now, after binary encoding with your function:

def binary_encoding(df):
    return df.replace({"t":1, "f":0})

new_df = binary_encoding(df)

print(new_df)


   Col1  Col2  Col3  Col4
0   0.0   NaN   0.0     0
1   1.0   0.0   NaN     1
2   NaN   1.0   1.0     0

Data types of new_df:

new_df.dtypes

Col1    float64
Col2    float64
Col3    float64
Col4      int64
dtype: object

Convert the data types using the integer with N/A support data type:

new_df_int = new_df.astype('Int64')


print(new_df_int)


   Col1  Col2  Col3  Col4
0     0  <NA>     0     0
1     1     0  <NA>     1
2  <NA>     1     1     0

Data types of new_df_int:

new_df_int.dtypes

Col1    Int64
Col2    Int64
Col3    Int64
Col4    Int64
dtype: object

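You now have integer data types, and the values display as integers, with missing entries shown as <NA>. You no longer need to drop the missing entries/rows.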
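If you still want the literal behaviour asked for in the question (drop the rows, then encode to plain int), a minimal sketch follows. It assumes the blank inputs were read in as NaN (which the float64 dtype in the question suggests, since empty strings would have kept the column as object) and that the encoded columns otherwise contain only t and f; the helper name `binary_encoding_dropna` and its `cols` parameter are hypothetical.

```python
import numpy as np
import pandas as pd

def binary_encoding_dropna(df, cols):
    """Hypothetical helper: drop rows with a missing value in any of `cols`,
    then map 't'/'f' to 1/0 as plain int64.

    Assumes blanks were read as NaN and that `cols` otherwise hold only 't'/'f';
    any other value would map to NaN and make the int64 cast fail.
    """
    # Keep only rows where every encoded column has a value
    out = df.dropna(subset=cols).copy()
    for col in cols:
        # No NaN is left, so the mapped values can be cast to a NumPy integer dtype
        out[col] = out[col].map({"t": 1, "f": 0}).astype("int64")
    return out

# Hypothetical toy data: row 1 has a blank in column "a"
df = pd.DataFrame({"a": ["t", np.nan, "f"], "b": ["f", "t", "t"]})
clean = binary_encoding_dropna(df, ["a", "b"])
print(clean)
```

On the question's data, where 3 columns each have 55 missing entries, this would discard between 55 and 165 of the 18,500 rows depending on how the gaps overlap; that data loss is what the `Int64` conversion avoids.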

You can also apply the data type conversion to a single column instead of the whole DataFrame, for example:

new_df['Col1'] = new_df['Col1'].astype('Int64')
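Folding the encoding and the per-column conversion into the UDF from the question, a minimal sketch could map and cast in one pass. The `cols` parameter is a hypothetical addition so that only the t/f columns are touched, and `Series.map` stands in for `replace`; this assumes every entry in those columns is t, f, or missing, since any other value would also end up as <NA>.

```python
import numpy as np
import pandas as pd

def binary_encoding(df, cols):
    """Map 't'/'f' to 1/0 in `cols`, keeping missing entries as <NA> (nullable Int64).

    `cols` is a hypothetical parameter listing the columns to encode.
    """
    out = df.copy()
    for col in cols:
        # map() leaves NaN as NaN (float64 column); casting to Int64 turns it into <NA>
        out[col] = out[col].map({"t": 1, "f": 0}).astype("Int64")
    return out

df = pd.DataFrame({'Col1': ['f', 't', np.nan], 'Col2': [np.nan, 'f', 't'],
                   'Col3': ['f', np.nan, 't'], 'Col4': ['f', 't', 'f']})
encoded = binary_encoding(df, ["Col1", "Col2", "Col3", "Col4"])
print(encoded.dtypes)
```

No rows are dropped: all three rows survive, and each column keeps its own missing entries as <NA>.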
