Why does df[0] not return the first column after concatenation, and how do I work with ohe.categories_?

Question

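import pandas as pd
from sklearn.preprocessing import OneHotEncoder

dfcars = pd.read_excel('cars.xlsx')
ohe = OneHotEncoder()
# One-hot encode 'Car Model'; the new frame gets default integer column labels
temp1 = pd.DataFrame(ohe.fit_transform(dfcars[['Car Model']]).toarray())
ohe.categories_
dfcars = pd.concat([dfcars, temp1], axis=1)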

dfcars

dfcars.columns

Although dfcars has been concatenated with temp1, dfcars[0] returns the 4th column and dfcars[4] raises an error.

dfcars[0]

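Why is this happening?

Also, I have searched everywhere but cannot find how to use categories_ in OneHotEncoder, so please explain what it does and the correct syntax for using it.

I tried the following, but, possibly because of the problem above, the original columns are gone and all values are filled with NaN.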

df_drop=dfcars.drop(['Car Model'],axis=1)
df_drop = pd.DataFrame(data= df_drop, columns= ohe.categories_)

df_drop

Answer 1

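This is probably because dfcars[0] uses the df[column_label] syntax, which selects a column by its label rather than by its position: you have a column labeled 0, but no column labeled 4. You can name the columns before concatenating: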

temp1= pd.DataFrame(ohe.fit_transform(dfcars[['Car Model']]).toarray(),columns=['Category_0', 'Category_1', 'Category_2'])
dfcars = pd.concat([dfcars,temp1], axis=1)
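
To make the label-versus-position distinction concrete, here is a minimal sketch. The small frame below is a hypothetical stand-in for cars.xlsx (the column names other than 'Car Model' and all values are assumptions, not the asker's data), and temp1 is built by hand instead of with the encoder.

```python
import pandas as pd

# Hypothetical stand-in for cars.xlsx: four named columns.
dfcars = pd.DataFrame({
    "Car Model": ["A", "B", "C"],
    "Mileage": [10, 20, 30],
    "Sell Price": [1, 2, 3],
    "Age": [4, 5, 6],
})

# A frame built from a bare array gets the integer labels 0, 1, 2.
temp1 = pd.DataFrame([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])

combined = pd.concat([dfcars, temp1], axis=1)

by_label = combined[0]             # the column whose label is the integer 0
by_position = combined.iloc[:, 0]  # the first column, selected by position
has_label_4 = 4 in combined.columns  # False: the integer labels stop at 2
```

If you want the first column regardless of its name, use .iloc; df[...] with a single key always looks the key up among the column labels.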

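For the categories_ attribute of OneHotEncoder, see the sklearn docs. After fitting, it holds a list of arrays, one per encoded input column, listing the categories in the same order as the output columns.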
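
As a rough sketch of how categories_ can supply the column names, assuming a single encoded column and made-up data (the car names and the 'Mileage' column are hypothetical): take the first array from the list and pass it as columns when building the encoded frame, rather than wrapping the already-built dfcars in a new DataFrame. The NaN values in the question most likely come from pd.DataFrame(data=df_drop, columns=...), which selects columns by label from the existing frame and fills labels it cannot find with NaN.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data standing in for cars.xlsx.
dfcars = pd.DataFrame({
    "Car Model": ["BMW", "Audi", "Tesla", "BMW"],
    "Mileage": [10, 20, 30, 40],
})

ohe = OneHotEncoder()
encoded = ohe.fit_transform(dfcars[["Car Model"]]).toarray()

# categories_ is a list with one array per encoded input column;
# only 'Car Model' was encoded, so take element 0.
labels = ohe.categories_[0]

# Name the encoded columns after the categories and keep the original index
# so that concat aligns the rows correctly.
temp1 = pd.DataFrame(encoded, columns=labels, index=dfcars.index)

result = pd.concat([dfcars.drop(columns=["Car Model"]), temp1], axis=1)
```

Recent scikit-learn versions (1.0 and later) also provide ohe.get_feature_names_out(), which returns prefixed names such as 'Car Model_BMW' and may be more convenient when several columns are encoded at once.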

python-3.x

pandas

one-hot-encoding