TypeError: __init__() got an unexpected keyword argument 'categorical_features' One Hot Encoder
I'm trying to run the code below, which I got from Kaggle, but when I run it, it throws this error:
return f(**kwargs)
TypeError: __init__() got an unexpected keyword argument 'categorical_features'
Here is the full code:
data = pd.read_csv('auto-mpg.csv',sep = ',')
print(data.columns);
print(data.isnull().sum())
data['horsepower'] = data['horsepower'].replace('?','100')
print(data['horsepower'].value_counts())
print('O maior MPG é ',data.mpg.max(),'milhoes por galao')
print('O menor MPG é',data.mpg.min(),'milhoes por galao')
f,ax = plt.subplots(1,2,figsize=(12,6))
sns.boxplot(data.mpg,ax=ax[0])
sns.distplot(data.mpg,ax=ax[1])
print("Skewness: ",data['mpg'].skew())
print("Kurtosis: ",data['mpg'].kurtosis())
corr = data.corr()
print(corr)
x = data.iloc[:,1:].values
y = data.iloc[:,0].values
lb = LabelEncoder()
x[:,7] = lb.fit_transform(x[:,7])
onehot = OneHotEncoder(categorical_features = x)
x = onehot.fit_transform(x).toarray()
xtrain,xtest,ytrain,ytest = train_test_split(x,y,test_size = 0.2,random_state = 0)
sc = StandardScaler()
x = sc.fit_transform(x)
rfr = RandomForestRegressor(n_estimators = 200,random_state = 0)
rfr.fit(xtrain,ytrain)
ypred_rfr = rfr.predict(xtest)
print('Accuracy of the random forest model:',round(r2_score(ytest,ypred_rfr)*100,2),'%')
So how do I deal with this error?
Deprecated since version 0.20: The categorical_features keyword was deprecated in version 0.20 and will be removed in 0.22. You can use the ColumnTransformer instead.
For details, see the scikit-learn 0.20 documentation for sklearn.preprocessing.OneHotEncoder.
The following shows how to rewrite it with ColumnTransformer:
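from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
# The third element selects the columns to encode, not the data itself;
# index 7 is the 'car name' column of x in the question.
ct = ColumnTransformer([
('<Name>', OneHotEncoder(), [7])], remainder="passthrough")
x = ct.fit_transform(x)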
Looking at this code, I'm not sure it makes sense to one-hot encode all of the columns, including the numeric ones.
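To illustrate why encoding every column is questionable, here is a minimal sketch on made-up data (the array `X` and its values are hypothetical, not taken from auto-mpg.csv). It compares encoding all columns with encoding only the categorical one.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: column 0 is numeric (a weight),
# column 1 is a label-encoded category.
X = np.array([
    [3504.0, 0],
    [3693.0, 1],
    [3436.0, 0],
    [3433.0, 2],
])

# Encoding every column treats each distinct weight as its own category,
# so the numeric column alone produces one output column per distinct value.
encoded_all = OneHotEncoder().fit_transform(X).toarray()

# Encoding only the categorical column keeps the feature count small.
encoded_cat = OneHotEncoder().fit_transform(X[:, [1]]).toarray()

print(encoded_all.shape)  # 4 distinct weights + 3 categories = 7 columns
print(encoded_cat.shape)  # 3 columns
```

On a real numeric column with hundreds of distinct values, the first approach would produce hundreds of sparse indicator columns and discard the ordering of the numbers.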
I'll assume the goal is to treat the column car name as categorical and one-hot encode it.
import pandas as pd
from scipy.sparse import csr_matrix
from sklearn.preprocessing import LabelEncoder,OneHotEncoder
from sklearn.ensemble import RandomForestRegressor
data = pd.read_csv('auto-mpg.csv',sep = ',')
data.columns
Index(['mpg', 'cylinders', 'displacement', 'horsepower', 'weight',
'acceleration', 'model year', 'origin', 'car name'],
dtype='object')
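As @Jacky1205 pointed out in another answer, this feature is deprecated. If you want to use ColumnTransformer, it is better to work with DataFrames rather than converting them to arrays. For example: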
from sklearn.compose import ColumnTransformer
ct = ColumnTransformer([
('one hot', OneHotEncoder(), ["car name"])], remainder="passthrough")
x = ct.fit_transform(data.iloc[:,1:])
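As a rough end-to-end sketch of the DataFrame approach, the example below wraps the ColumnTransformer and the regressor in a Pipeline so the encoder is fitted on the training split only. The small DataFrame is a hypothetical stand-in for auto-mpg.csv with made-up values, and `handle_unknown="ignore"` is an added assumption so that car names unseen during training do not raise an error at prediction time.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical stand-in for auto-mpg.csv (values are made up for illustration).
data = pd.DataFrame({
    "mpg": [18.0, 15.0, 26.0, 24.0, 31.0, 14.0, 27.0, 22.0],
    "cylinders": [8, 8, 4, 4, 4, 8, 4, 6],
    "weight": [3504, 3693, 2372, 2430, 1773, 4312, 2130, 2833],
    "car name": ["ford", "buick", "toyota", "toyota",
                 "datsun", "ford", "datsun", "buick"],
})

X = data.drop(columns="mpg")
y = data["mpg"]

# One-hot encode only 'car name'; the numeric columns pass through unchanged.
ct = ColumnTransformer(
    [("one hot", OneHotEncoder(handle_unknown="ignore"), ["car name"])],
    remainder="passthrough",
)

# The pipeline fits the encoder and the forest together on the training data.
model = Pipeline([
    ("prep", ct),
    ("rfr", RandomForestRegressor(n_estimators=20, random_state=0)),
])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model.fit(X_train, y_train)
pred = model.predict(X_test)
print(pred.shape)
```

Selecting the column by name only works when a DataFrame is passed in; with a plain array you would use integer positions instead.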
You can also work at the array level, although it can get messy. In this case, since your data is not large, you can keep it as a dense matrix:
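import numpy as np
x = data.iloc[:,1:].values
y = data.iloc[:,0].values
lb = LabelEncoder()
x[:,7] = lb.fit_transform(x[:,7])
# sparse=False returns a dense array; on scikit-learn 1.2+ the argument is named sparse_output
onehot = OneHotEncoder(sparse=False)
# Keep the first 7 columns and append the one-hot columns for 'car name'
x = np.concatenate([x[:,:7],onehot.fit_transform(x[:,7].reshape(-1,1))],axis=1)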
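As a side note on the array approach: since scikit-learn 0.20, OneHotEncoder accepts string categories directly, so the LabelEncoder step is optional. A minimal sketch, using a hypothetical column of car names:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical column of car names, shaped as a single 2-D feature column.
names = np.array(["ford", "buick", "toyota", "ford"], dtype=object).reshape(-1, 1)

enc = OneHotEncoder()
# .toarray() converts the default sparse result to a dense array.
onehot = enc.fit_transform(names).toarray()

print(enc.categories_[0])  # categories are stored in sorted order
print(onehot)
```

Each output column corresponds to one entry of `enc.categories_[0]`, which makes it easier to map the encoded columns back to the original names.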