OneHotEncoder categorical_features deprecated, how to transform a specific column
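I need to convert the independent fields from strings to a numeric representation. I am using OneHotEncoder for the conversion. My dataset has many independent columns, some of which look like this: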
Country | Age
--------------------------
Germany | 23
Spain | 25
Germany | 24
Italy | 30
I have to encode the Country column like this:
0 | 1 | 2 | 3
--------------------------------------
1 | 0 | 0 | 23
0 | 1 | 0 | 25
1 | 0 | 0 | 24
0 | 0 | 1 | 30
I was able to achieve the desired transformation by using OneHotEncoder as follows:
#Encoding the categorical data
from sklearn.preprocessing import LabelEncoder
labelencoder_X = LabelEncoder()
X[:,0] = labelencoder_X.fit_transform(X[:,0])
#we are dummy encoding as the machine learning algorithms will be
#confused with the values like Spain > Germany > France
from sklearn.preprocessing import OneHotEncoder
onehotencoder = OneHotEncoder(categorical_features=[0])
X = onehotencoder.fit_transform(X).toarray()
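Now I am getting a deprecation message telling me to use categories='auto'. If I do that, the transformation is applied to all the independent columns (Country, Age, Salary, etc.).
How can I apply the transformation to column 0 of the dataset only?
There are actually two warnings: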
FutureWarning: The handling of integer data will change in version
0.22. Currently, the categories are determined based on the range [0, max(values)], while in the future they will be determined based on the
unique values. If you want the future behaviour and silence this
warning, you can specify "categories='auto'". In case you used a
LabelEncoder before this OneHotEncoder to convert the categories to
integers, then you can now use the OneHotEncoder directly.
And the second:
The 'categorical_features' keyword is deprecated in version 0.20 and
will be removed in 0.22. You can use the ColumnTransformer instead.
"use the ColumnTransformer instead.", DeprecationWarning)
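In the future, you should not define the columns directly in the OneHotEncoder, unless you want to use "categories='auto'". The first message also tells you to use OneHotEncoder directly, without running LabelEncoder first.
Finally, the second message tells you to use ColumnTransformer, which works like a pipeline for column transformations.
Here is the equivalent code for your case:
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
# The last element of the tuple ([0]) is the list of columns you want to transform in this step
ct = ColumnTransformer([("Name_Of_Your_Step", OneHotEncoder(), [0])], remainder="passthrough")
ct.fit_transform(X)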
See also: ColumnTransformer documentation
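To make the answer above concrete, here is a minimal end-to-end sketch. The toy array mirrors the Country | Age table from the question; the step name "country" and the variable names are illustrative assumptions, not part of the original post.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Toy data mirroring the Country | Age table from the question.
X = np.array(
    [["Germany", 23], ["Spain", 25], ["Germany", 24], ["Italy", 30]],
    dtype=object,
)

ct = ColumnTransformer(
    [("country", OneHotEncoder(), [0])],  # one-hot encode column 0 only
    remainder="passthrough",              # keep the other columns (Age) as they are
)
X_encoded = ct.fit_transform(X)

# Depending on the data, the result may come back as a SciPy sparse matrix.
if hasattr(X_encoded, "toarray"):
    X_encoded = X_encoded.toarray()

# Cast to a plain float array for downstream estimators.
X_encoded = np.asarray(X_encoded, dtype=float)
print(X_encoded)
```

Note that the encoder orders the categories alphabetically (Germany, Italy, Spain), so the position of the Spain and Italy indicator columns differs from the hand-written table in the question. The encoded columns come first, followed by the passed-through Age column.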
For the above example:
Encoding Categorical data (Basically Changing Text to Numerical data i.e, Country Name)
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.compose import ColumnTransformer
#Encode Country Column
labelencoder_X = LabelEncoder()
X[:,0] = labelencoder_X.fit_transform(X[:,0])
ct = ColumnTransformer([("Country", OneHotEncoder(), [0])], remainder = 'passthrough')
X = ct.fit_transform(X)
There is also a way to do one-hot encoding with pandas.
Python:
import pandas as pd
ohe=pd.get_dummies(dataframe_name['column_name'])
Give the newly formed columns names and add them to your dataframe. See the pandas documentation for get_dummies.
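A short sketch of the pandas route, including the step of attaching the new columns back to the dataframe. The dataframe contents, the "Country" prefix and the variable names are assumptions chosen to match the question's example.

```python
import pandas as pd

# Toy dataframe matching the question's example.
df = pd.DataFrame(
    {"Country": ["Germany", "Spain", "Germany", "Italy"], "Age": [23, 25, 24, 30]}
)

# One indicator column per country; prefix= names the new columns,
# dtype=int gives 0/1 integers rather than booleans.
dummies = pd.get_dummies(df["Country"], prefix="Country", dtype=int)

# Replace the original Country column with its indicator columns.
result = pd.concat([dummies, df.drop(columns="Country")], axis=1)
print(result)
```

Unlike a fitted OneHotEncoder, get_dummies does not remember the categories it saw, so this approach is most convenient when the whole dataset is encoded in one go.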
transformer = ColumnTransformer(
transformers=[
("Country", # Just a name
OneHotEncoder(), # The transformer class
[0] # The column(s) to be applied on.
)
], remainder='passthrough'
)
X = transformer.fit_transform(X)
remainder='passthrough' keeps the rest of the data as it is, while column [0] is replaced by its encoded columns.
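I had the same issue and the following worked for me:
# In scikit-learn 1.2 and later this parameter is named sparse_output
OneHotEncoder(categories='auto', sparse=False)
Hope this helps.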
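Because the name of the dense-output parameter depends on the scikit-learn version, a version-independent sketch is to leave the encoder at its default (sparse) output and convert the result afterwards. The single-column array below is an assumed example based on the question's data.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# A single categorical column, shaped (n_samples, 1).
countries = np.array([["Germany"], ["Spain"], ["Germany"], ["Italy"]])

encoder = OneHotEncoder(categories="auto")
encoded = encoder.fit_transform(countries)  # SciPy sparse matrix by default

dense = encoded.toarray()  # convert to a regular NumPy array
print(encoder.categories_[0])  # categories learned from the strings directly
print(dense)
```

This also shows that the encoder accepts string values directly, with no LabelEncoder step beforehand.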
As of version 0.22, you can write the same code as follows:
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
ct = ColumnTransformer([("Country", OneHotEncoder(), [0])], remainder = 'passthrough')
X = ct.fit_transform(X)
As you can see, you no longer need to use LabelEncoder.
Don't use the LabelEncoder; use OneHotEncoder directly.
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import make_column_transformer
A = make_column_transformer(
(OneHotEncoder(categories='auto'), [0]),
remainder="passthrough")
x=A.fit_transform(x)
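When the data is still in a pandas DataFrame, the same helper can select the column by name rather than by position. The following is a sketch under that assumption; the dataframe and its column names are illustrative and not from the original answer.

```python
import numpy as np
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder

df = pd.DataFrame(
    {"Country": ["Germany", "Spain", "Germany", "Italy"], "Age": [23, 25, 24, 30]}
)

# Select the column to encode by its name; everything else is passed through.
transformer = make_column_transformer(
    (OneHotEncoder(categories="auto"), ["Country"]),
    remainder="passthrough",
)
encoded = transformer.fit_transform(df)

# Guard against a sparse result, then cast to a float array.
if hasattr(encoded, "toarray"):
    encoded = encoded.toarray()
encoded = np.asarray(encoded, dtype=float)
print(encoded)
```

Selecting by name avoids having to track column positions when the dataframe layout changes.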
Use the following code:
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
columnTransformer = ColumnTransformer([('encoder', OneHotEncoder(), [0])], remainder='passthrough')
# The np.str alias was removed in NumPy 1.24; the built-in str is equivalent
X = np.array(columnTransformer.fit_transform(X), dtype=str)
print(X)
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
...
onehotencoder = ColumnTransformer(
    [('one_hot_encoder', OneHotEncoder(), [0])],
    remainder='passthrough'
)
X = onehotencoder.fit_transform(X)
# Data Preprocessing Template
# Importing the libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Importing the dataset
dataset = pd.read_csv('Data.csv')
X = dataset.iloc[:,:-1].values
y = dataset.iloc[:,3].values
# Taking care of missing data
#from sklearn.preprocessing import Imputer
from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer = imputer.fit(X[:,1:3])
X[:,1:3] = imputer.transform(X[:,1:3])
#encoding Categorical Data
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
labelencoder_X = LabelEncoder()
X[:,0] = labelencoder_X.fit_transform(X[:,0])
onehotencoder = ColumnTransformer([("Country", OneHotEncoder(), [0])], remainder = "passthrough")
X = onehotencoder.fit_transform(X)
labelencoder_y = LabelEncoder()
y = labelencoder_y.fit_transform(y)
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
# Replaces the deprecated call: onehotencoder = OneHotEncoder(categorical_features=[0])
transformer = ColumnTransformer([('one_hot_encoder', OneHotEncoder(), [0])], remainder='passthrough')
x = np.array(transformer.fit_transform(x), dtype=float)
This code should solve the error.
When updating the code from this:
one_hot_encoder = OneHotEncoder(categorical_features = [0, 1, 4, 5, 6])
X_train = one_hot_encoder.fit_transform(X_train).toarray()
to this:
ct = ColumnTransformer([('one_hot_encoder', OneHotEncoder(), [
    0, 1, 4, 5, 6])], remainder='passthrough')
X_train = np.array(ct.fit_transform(X_train), dtype=float)
Note that I had to add dtype=float (written as dtype=np.float before NumPy removed that alias in 1.24) to fix the error message TypeError: can't convert np.ndarray of type numpy.object_.
My columns are [0, 1, 4, 5, 6], and 'one_hot_encoder' is just an arbitrary name for the step.
My imports are:
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
import numpy as np
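The dtype issue described above can be reproduced on a small array. In this sketch the data and variable names are assumptions; the point is that passing through columns from an object-dtype array typically yields an object-dtype result, which some libraries reject until it is cast to float.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder

# Mixed string/number data has to live in an object-dtype array.
X_train = np.array([["a", 1.5], ["b", 2.5], ["a", 3.5]], dtype=object)

ct = ColumnTransformer(
    [("one_hot_encoder", OneHotEncoder(), [0])], remainder="passthrough"
)
out = ct.fit_transform(X_train)

# Guard against a sparse result before inspecting the dtype.
if hasattr(out, "toarray"):
    out = out.toarray()
print(out.dtype)  # typically object here, inherited from the input array

# Explicit cast so that downstream code receives a numeric array.
X_train = np.asarray(out, dtype=float)
print(X_train.dtype)
```

The cast only succeeds when every remaining column is numeric, so all string columns need to be listed in the encoder step.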