Python - How to reverse the encoding of data encoded with LabelEncoder after it has been split by train_test_split?
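I am trying to export an unencoded version of a dataset that was encoded with LabelEncoder (from sklearn.preprocessing, so that machine learning algorithms can be applied to it) and then split into training and test sets with train_test_split.
I want to export the test data to Excel, but with the original values. The examples I have found so far apply LabelEncoder's inverse_transform method to a single variable only. I would like to apply it automatically to all of the columns that were encoded in the first place.
Here is some sample data: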
# data
import pandas as pd
code = ('A B C D A B C D E F').split()
sp = ('animal bird animal animal animal bird animal animal bird thing').split()
res = ('yes, yes, yes, yes, no, no, yes, no, yes, no').split(", ")
data = pd.DataFrame({'code':code, 'sp':sp, 'res':res})
data
Assume 'res' is the target variable and 'code' and 'sp' are the features.
Here you go:
# data
import pandas as pd
code = ('A B C D A B C D E F').split()
sp = ('animal bird animal animal animal bird animal animal bird thing').split()
res = ('yes, yes, yes, yes, no, no, yes, no, yes, no').split(", ")
data = pd.DataFrame({'code':code, 'sp':sp, 'res':res})
data
# creating LabelEncoder object
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
# encoding
dfe = pd.DataFrame()  # empty dataframe that will hold the encoded values
for column in data.columns:
    # the single encoder is refit on every column, so afterwards it only
    # remembers the mapping of the last column it saw
    dfe[column] = le.fit_transform(data[column])
dfe
# saving features
X = dfe[['code','sp']]
# saving target
y = dfe['res']
# splitting into training & test data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=13)
X_train
# reversal of encoding
dfr_train = X_train.copy()
for column in X.columns:
    # refit the encoder on the original (unencoded) column to restore its mapping
    le.fit(data[column])
    # with that mapping restored, decode the matching encoded column
    # of the split dataset
    dfr_train[column] = le.inverse_transform(X_train[column])
dfr_train
You can do the same for the test data.
# reversal of encoding of the test data
dfr_test = X_test.copy()
for column in X.columns:
    le.fit(data[column])  # restore the mapping of this column
    dfr_test[column] = le.inverse_transform(X_test[column])
dfr_test
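Refitting works here because LabelEncoder sorts the unique labels, so fitting on the same column again should reproduce the same mapping. A possible variation, shown only as a sketch, keeps one fitted encoder per column so nothing has to be refit later. The names `encoders`, `encoded` and `decode` are made up for this illustration and are not part of the answer above.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

data = pd.DataFrame({
    'code': 'A B C D A B C D E F'.split(),
    'sp': 'animal bird animal animal animal bird animal animal bird thing'.split(),
    'res': 'yes yes yes yes no no yes no yes no'.split(),
})

# One fitted encoder per column, kept around for decoding later
# (`encoders` is a hypothetical name chosen for this sketch).
encoders = {col: LabelEncoder().fit(data[col]) for col in data.columns}

# Encode every column with its own encoder, keeping the original row index.
encoded = pd.DataFrame(
    {col: encoders[col].transform(data[col]) for col in data.columns},
    index=data.index,
)

X = encoded[['code', 'sp']]
y = encoded['res']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=13
)


def decode(features, target):
    """Return a copy of one split with all columns mapped back to labels."""
    decoded = features.copy()
    for col in decoded.columns:
        decoded[col] = encoders[col].inverse_transform(features[col])
    decoded[target.name] = encoders[target.name].inverse_transform(target)
    return decoded


train_decoded = decode(X_train, y_train)
test_decoded = decode(X_test, y_test)
```

Because each column has its own encoder, the same `decode` helper covers both splits and the target column in one call.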
Here is the complete training data (features + target), ready for export:
# reverse encoding of target variable y
le.fit(data['res'])
dfr_train['res'] = le.inverse_transform(y_train)
dfr_train # unencoded training data, ready for export
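If the original dataframe is still in memory, decoding may not be needed at all. train_test_split keeps the row labels of a pandas input, so the index of each split can be used to pick the matching unencoded rows. The sketch below assumes `data` has a unique index; `test_original` and `train_original` are names invented for this example.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

data = pd.DataFrame({
    'code': 'A B C D A B C D E F'.split(),
    'sp': 'animal bird animal animal animal bird animal animal bird thing'.split(),
    'res': 'yes yes yes yes no no yes no yes no'.split(),
})

# Encode each column with a fresh encoder, keeping the original row index.
encoded = pd.DataFrame(
    {col: LabelEncoder().fit_transform(data[col]) for col in data.columns},
    index=data.index,
)

X_train, X_test, y_train, y_test = train_test_split(
    encoded[['code', 'sp']], encoded['res'], test_size=0.2, random_state=13
)

# The splits carry the row labels of `data`, so those labels select
# the original, unencoded rows (features and target together).
test_original = data.loc[X_test.index]
train_original = data.loc[X_train.index]
```

Either frame can then be written out with something like `test_original.to_excel("test_data.xlsx", index=False)`. The file name is only a placeholder, and pandas needs an Excel writer engine such as openpyxl installed for that call.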