How to decode a LabelEncoder-encoded column in a pandas DataFrame?
I am working on a dataset, practicing feature engineering by converting categorical objects to numbers with the following lines of code:
import pandas as pd
import numpy as np
from sklearn import preprocessing
df = pd.read_csv(r'train.csv',index_col='Id')
print(df.shape)
df.head()
colsNum = df.select_dtypes(np.number).columns
colsObj = df.columns.difference(colsNum)
df[colsNum] = df[colsNum].fillna(df[colsNum].mean()//1)
df[colsObj] = df[colsObj].fillna(df[colsObj].mode().iloc[0])
label_encoder = preprocessing.LabelEncoder()
for col in colsObj:
    df[col] = label_encoder.fit_transform(df[col])
df.head()
for col in colsObj:
    df[col] = label_encoder.inverse_transform(df[col])
df.head()
But here inverse_transform() does not return the original dataset. Please help me!
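You need one encoder per column. You cannot reuse the same encoder for all columns, because each call to fit_transform() refits it and overwrites the classes learned for the previous column: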
import pandas as pd
import numpy as np
from sklearn import preprocessing
df = pd.read_csv(r'train.csv', index_col='Id')
print(df.shape)
colsNum = df.select_dtypes(np.number).columns
colsObj = df.columns.difference(colsNum)
df[colsNum] = df[colsNum].fillna(df[colsNum].mean()//1)
df[colsObj] = df[colsObj].fillna(df[colsObj].mode().iloc[0])
print(df.head())
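# Keep a separate fitted LabelEncoder for every categorical column
encoder = {}
for col in colsObj:
    encoder[col] = preprocessing.LabelEncoder()
    df[col] = encoder[col].fit_transform(df[col])
print(df.head())
# Decode each column with the encoder that was fitted on it
for col in colsObj:
    df[col] = encoder[col].inverse_transform(df[col])
print(df.head())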
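To see why a single shared encoder cannot be decoded, here is a minimal sketch on a small in-memory frame. The `toy` DataFrame and its `color` and `size` columns are made up for illustration and only stand in for the object columns of train.csv.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical toy data standing in for the object columns of train.csv.
toy = pd.DataFrame({
    "color": ["red", "blue", "red", "green"],
    "size": ["S", "M", "L", "M"],
})

# One shared encoder: every fit_transform call replaces classes_.
shared = LabelEncoder()
encoded = toy.copy()
for col in encoded.columns:
    encoded[col] = shared.fit_transform(encoded[col])

# Only the classes of the last fitted column ("size") are still stored.
shared_classes = list(shared.classes_)

# Decoding "color" with the shared encoder maps its codes onto "size" labels.
wrong_color = list(shared.inverse_transform(encoded["color"]))

# One encoder per column keeps every mapping, so the round trip is exact.
encoders = {}
encoded = toy.copy()
for col in encoded.columns:
    encoders[col] = LabelEncoder()
    encoded[col] = encoders[col].fit_transform(encoded[col])

decoded = encoded.copy()
for col in decoded.columns:
    decoded[col] = encoders[col].inverse_transform(encoded[col])

round_trip_ok = decoded.to_dict("list") == toy.to_dict("list")

print(shared_classes)   # classes of "size" only
print(wrong_color)      # "color" codes decoded as "size" labels
print(round_trip_ok)
```

In this toy case the shared encoder silently returns wrong labels. On real data, a column with more categories than the last fitted one would likely raise an error instead, since some of its codes would not exist in the stored classes.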
You can also take a look at this answer for more details.
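As a possible alternative that is not part of the answer above, scikit-learn's `OrdinalEncoder` fits all feature columns in one call and stores one category list per column, so a single object can be inverted. A rough sketch, again using made-up `toy` data in place of train.csv:

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder

# Hypothetical toy data standing in for the object columns of train.csv.
toy = pd.DataFrame({
    "color": ["red", "blue", "red", "green"],
    "size": ["S", "M", "L", "M"],
})
cols = ["color", "size"]

# fit_transform learns a separate category list for each column.
ordinal = OrdinalEncoder()
encoded = pd.DataFrame(
    ordinal.fit_transform(toy[cols]), columns=cols, index=toy.index
)

# inverse_transform uses the stored per-column categories to decode.
restored = pd.DataFrame(
    ordinal.inverse_transform(encoded[cols]), columns=cols, index=toy.index
)

round_trip_ok = restored.to_dict("list") == toy.to_dict("list")
print(encoded)
print(round_trip_ok)
```

Note that `OrdinalEncoder` returns floating-point codes by default, whereas `LabelEncoder` returns integers; `LabelEncoder` is documented as intended for target labels rather than feature columns.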