一次热编码后 svm 分类器预测错误
Error in prediction in svm classifier after one hot encoding
在训练我的 SVM 分类器之前,我对我的数据集使用了一种热编码。
将训练集中的特征数量增加到 982。但是在
具有 7 个特征的测试数据集的预测我得到错误“X 有 7
每个样本的特征;期待982”。我不明白如何增加
测试数据集中的特征数。
My code is:
df = pd.read_csv('train.csv',header=None);
features = df.iloc[:,:-1].values
labels = df.iloc[:,-1].values
encode = LabelEncoder()
features[:,2] = encode.fit_transform(features[:,2])
features[:,3] = encode.fit_transform(features[:,3])
features[:,4] = encode.fit_transform(features[:,4])
features[:,5] = encode.fit_transform(features[:,5])
df1 = pd.DataFrame(features)
#--------------------------- ONE HOT ENCODING --------------------------------#
hotencode = OneHotEncoder(categorical_features=[2])
features = hotencode.fit_transform(features).toarray()
hotencode = OneHotEncoder(categorical_features=[14])
features = hotencode.fit_transform(features).toarray()
hotencode = OneHotEncoder(categorical_features=[37])
features = hotencode.fit_transform(features).toarray()
hotencode = OneHotEncoder(categorical_features=[466])
features = hotencode.fit_transform(features).toarray()
X = np.array(features)
y = np.array(labels)
clf = svm.LinearSVC()
clf.fit(X,y)
d_test = pd.read_csv('query.csv')
Z_test =np.array(d_test)
confidence = clf.predict(Z_test)
print("The query image belongs to Class ")
print(confidence)
######################### test dataset
query.csv
1 0.076 1 3232236298 2886732679 3128 60604
简短的回答:您需要在测试集上应用相同的 OHE 变换(或在您的情况下为 LE+OHE)。
要获得好的建议,请参阅 or
在训练我的 SVM 分类器之前,我对我的数据集使用了一种热编码。 将训练集中的特征数量增加到 982。但是在 具有 7 个特征的测试数据集的预测我得到错误“X 有 7 每个样本的特征;期待982”。我不明白如何增加 测试数据集中的特征数。
My code is:
df = pd.read_csv('train.csv',header=None);
features = df.iloc[:,:-1].values
labels = df.iloc[:,-1].values
encode = LabelEncoder()
features[:,2] = encode.fit_transform(features[:,2])
features[:,3] = encode.fit_transform(features[:,3])
features[:,4] = encode.fit_transform(features[:,4])
features[:,5] = encode.fit_transform(features[:,5])
df1 = pd.DataFrame(features)
#--------------------------- ONE HOT ENCODING --------------------------------#
hotencode = OneHotEncoder(categorical_features=[2])
features = hotencode.fit_transform(features).toarray()
hotencode = OneHotEncoder(categorical_features=[14])
features = hotencode.fit_transform(features).toarray()
hotencode = OneHotEncoder(categorical_features=[37])
features = hotencode.fit_transform(features).toarray()
hotencode = OneHotEncoder(categorical_features=[466])
features = hotencode.fit_transform(features).toarray()
X = np.array(features)
y = np.array(labels)
clf = svm.LinearSVC()
clf.fit(X,y)
d_test = pd.read_csv('query.csv')
Z_test =np.array(d_test)
confidence = clf.predict(Z_test)
print("The query image belongs to Class ")
print(confidence)
######################### test dataset
query.csv
1 0.076 1 3232236298 2886732679 3128 60604
简短的回答:您需要在测试集上应用相同的 OHE 变换(或在您的情况下为 LE+OHE)。
要获得好的建议,请参阅