OneHotEncoding : TypeError: cannot perform reduce with flexible type
I am trying to fit a OneHotEncoder on X_train and then transform both X_train and X_test.
However, this results in an error:
# One hot encoding
from sklearn.preprocessing import OneHotEncoder
encode_columns = ['borough','building_class_category', 'commercial_units','residential_units']
enc = OneHotEncoder(handle_unknown='ignore')
enc.fit(X_train[encode_columns])
X_train = enc.transform(X_train[encode_columns])
X_test = enc.transform(X_test[encode_columns])
X_train.head()
Error:
4
5 enc = OneHotEncoder(handle_unknown='ignore')
----> 6 enc.fit(X_train[encode_columns])
7 X_train = enc.transform(X_train[encode_columns])
8 X_test = enc.transform(X_test[encode_columns])
TypeError: cannot perform reduce with flexible type
Sample rows of X_train:
TL;DR: You probably ran the cell that does the fit and transform more than once, and .transform() does not return what you think it returns.
Why does this error occur?
Suppose you define the data in one cell:
import pandas as pd

X_train = pd.DataFrame({'borough': ["Queens", "Brooklyn", "Queens", "Queens", "Brooklyn"],
                        'building_class_category': ["01", "02", "02", "01", "13"],
                        'commercial_units': ["O", "O", "O", "O", "A"],
                        'residential_units': [1, 2, 2, 1, 1]})
and fit the one-hot encoder in a second cell:
from sklearn.preprocessing import OneHotEncoder

encode_columns = ['borough', 'building_class_category', 'commercial_units', 'residential_units']
enc = OneHotEncoder(handle_unknown='ignore')
enc.fit(X_train[encode_columns])
X_train = enc.transform(X_train[encode_columns])  # overwrites X_train with the encoder's output
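The cell above works the first time, but if you run it a second time, X_train has already been overwritten with the output of transform (a sparse matrix, not a DataFrame). X_train[encode_columns] then tries to index that matrix with a list of column names, which fails with:
TypeError: cannot perform reduce with flexible type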
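A minimal sketch that reproduces this, assuming recent pandas, SciPy and scikit-learn: the loop simulates running the same notebook cell twice. The first pass replaces X_train with a sparse matrix, so the second pass fails while selecting columns. The exact exception type and message depend on the NumPy/SciPy versions installed, which is why the sketch catches more than just TypeError.

```python
import pandas as pd
from scipy import sparse
from sklearn.preprocessing import OneHotEncoder

X_train = pd.DataFrame({'borough': ["Queens", "Brooklyn", "Queens", "Queens", "Brooklyn"],
                        'building_class_category': ["01", "02", "02", "01", "13"],
                        'commercial_units': ["O", "O", "O", "O", "A"],
                        'residential_units': [1, 2, 2, 1, 1]})
encode_columns = ['borough', 'building_class_category', 'commercial_units', 'residential_units']

errors = []
for run in range(2):  # simulate executing the same cell twice
    try:
        enc = OneHotEncoder(handle_unknown='ignore')
        enc.fit(X_train[encode_columns])
        X_train = enc.transform(X_train[encode_columns])  # overwrites the input
    except (TypeError, IndexError, ValueError) as exc:
        # Exception type varies by version; the question saw a TypeError
        errors.append(f"run {run + 1}: {type(exc).__name__}: {exc}")

print(sparse.issparse(X_train))  # X_train is now a sparse matrix
print(errors)                    # only the second run fails
```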
So the first part of the answer is: use different names for the input and the output.
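A minimal sketch of that fix on the same toy data: the outputs go into new variables (X_train_encoded and X_test_encoded are names chosen here for illustration), so the cell can be re-run safely. The first two rows of X_train stand in for a real X_test, which is an assumption of this sketch.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

X_train = pd.DataFrame({'borough': ["Queens", "Brooklyn", "Queens", "Queens", "Brooklyn"],
                        'building_class_category': ["01", "02", "02", "01", "13"],
                        'commercial_units': ["O", "O", "O", "O", "A"],
                        'residential_units': [1, 2, 2, 1, 1]})
X_test = X_train.iloc[:2].copy()  # stand-in for a real test set
encode_columns = ['borough', 'building_class_category', 'commercial_units', 'residential_units']

for run in range(2):  # simulate executing the same cell twice
    enc = OneHotEncoder(handle_unknown='ignore')
    enc.fit(X_train[encode_columns])                          # X_train is never reassigned
    X_train_encoded = enc.transform(X_train[encode_columns])  # new name for the output
    X_test_encoded = enc.transform(X_test[encode_columns])

print(type(X_train).__name__)  # DataFrame
print(X_train_encoded.shape)   # (5, 9)
print(X_test_encoded.shape)    # (2, 9)
```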
What does OneHotEncoder's transform return?
If you print enc.transform(X_train[encode_columns]), you get:
<5x9 sparse matrix of type '<class 'numpy.float64'>'
with 20 stored elements in Compressed Sparse Row format>
By default, OneHotEncoder's transform returns neither a pandas DataFrame nor even a NumPy array, but a SciPy sparse matrix. To get a NumPy array, you have to convert it:
enc.transform(X_train[encode_columns]).toarray()
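or set sparse_output=False when creating the OneHotEncoder (before scikit-learn 1.2 this parameter was called sparse; the old name was removed in 1.4):
enc = OneHotEncoder(handle_unknown='ignore', sparse_output=False)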
Bonus: how do you get descriptive feature names?
With sparse_output=False, enc.transform(X_train[encode_columns]) returns a NumPy array. Even if you convert it to a pd.DataFrame, the column names don't tell you much:
pd.DataFrame(enc.transform(X_train[encode_columns]))
# 0 1 2 3 4 5 6 7 8
#0 0.0 1.0 1.0 0.0 0.0 0.0 1.0 1.0 0.0
#1 1.0 0.0 0.0 1.0 0.0 0.0 1.0 0.0 1.0
#2 0.0 1.0 0.0 1.0 0.0 0.0 1.0 0.0 1.0
#3 0.0 1.0 1.0 0.0 0.0 0.0 1.0 1.0 0.0
#4 1.0 0.0 0.0 0.0 1.0 1.0 0.0 1.0 0.0
To get meaningful column names, use the get_feature_names_out() method (available since scikit-learn 1.0):
pd.DataFrame(enc.transform(X_train[encode_columns]), columns = enc.get_feature_names_out())
# borough_Brooklyn borough_Queens ... residential_units_2
#0 0.0 1.0 ... 0.0
#1 1.0 0.0 ... 1.0
#2 0.0 1.0 ... 1.0
#3 0.0 1.0 ... 0.0
#4 1.0 0.0 ... 0.0
Full code:
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

X_train = pd.DataFrame({'borough': ["Queens", "Brooklyn", "Queens", "Queens", "Brooklyn"],
                        'building_class_category': ["01", "02", "02", "01", "13"],
                        'commercial_units': ["O", "O", "O", "O", "A"],
                        'residential_units': [1, 2, 2, 1, 1]})
encode_columns = ['borough', 'building_class_category', 'commercial_units', 'residential_units']
enc = OneHotEncoder(handle_unknown='ignore', sparse_output=False)
enc.fit(X_train[encode_columns])
# Store the result under a new name so X_train is never overwritten
X_train_encoded = pd.DataFrame(enc.transform(X_train[encode_columns]), columns=enc.get_feature_names_out())