Machine Learning: How do I apply one-hot encoding on a pandas dataframe with both categorical and numerical features?
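Some of the features are numerical, such as "graduation rate from school", while others are categorical, such as the school name. I used a label encoder on the categorical features to convert them to integers.
I now have a dataframe with floats and ints, representing the numerical features and the categorical features (transformed with a label encoder) respectively.
I'm not sure how to proceed with learning. Do I need to use one-hot encoding? If so, how should I do it? As I currently understand it, I can't simply pass the dataframe to sklearn's OneHotEncoder because it contains floats. Would simply applying a label encoder to all features solve the problem?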
Sample data from my dataframe. OPEID and opeid6 were transformed using a label encoder.
Thanks a lot!
Just use the categorical_features parameter of OneHotEncoder to select which features are categorical:
categorical_features: “all” or array of indices or mask
Specify what features are treated as categorical.
- ‘all’ (default): All features are treated as categorical.
- array of indices: Array of categorical feature indices.
- mask: Array of length n_features and with dtype=bool.
Non-categorical features are always stacked to the right of the matrix.
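Note that the categorical_features parameter was deprecated in scikit-learn 0.20 and removed in 0.22, so the call described above only works on older versions. On newer versions, one way to get the same effect is to wrap OneHotEncoder in a ColumnTransformer, which is a different mechanism from the one quoted above. The following is a minimal sketch under that assumption; the toy dataframe and its column names (school, grad_rate) are hypothetical stand-ins for the data in the question. Recent OneHotEncoder versions also accept string categories directly, so the label-encoding step may not be needed at all.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "school": ["A", "B", "A", "C"],          # categorical feature
    "grad_rate": [0.91, 0.75, 0.88, 0.60],   # numerical feature
})

# Assumption: we know up front which columns are categorical.
categorical_cols = ["school"]

encoder = ColumnTransformer(
    transformers=[
        # One-hot encode only the listed categorical columns.
        ("onehot", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    remainder="passthrough",  # leave the numerical columns untouched
    sparse_threshold=0,       # always return a dense NumPy array
)

# One-hot columns come first; passed-through columns are stacked to the right.
X = encoder.fit_transform(df)
print(X.shape)
```

As with the old parameter, the non-categorical columns end up on the right of the resulting matrix, so downstream code that indexes columns by position should account for the reordering.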