ValueError: Unknown label type: 'continuous' when using clustering + classification models together
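I created a clustering model using Scikit-Learn's KMeans algorithm to find distinct customer segments based on annual income and spending score. Using the cluster value it returns for each customer, I then tried to build a classification model with support vector classification from sklearn.svm. However, when I try to fit the new model to the dataset, I get an error message: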
File "/Users/user/Documents/Machine Learning A-Z Template Folder/Part 4 - Clustering/Section 24 - K-Means Clustering/cluster_and_prediction.py", line 28, in <module>
classifier.fit(x_train, y_train)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/sklearn/svm/_base.py", line 149, in fit
y = self._validate_targets(y)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/sklearn/svm/_base.py", line 525, in _validate_targets
check_classification_targets(y)
File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/sklearn/utils/multiclass.py", line 169, in check_classification_targets
raise ValueError("Unknown label type: %r" % y_type)
ValueError: Unknown label type: 'continuous'
My code is as follows:
import pandas as pd
import numpy as np
from sklearn.cluster import KMeans
# Using relevant columns from dataset
dataset = pd.read_csv('Mall_Customers.csv')
x = dataset.iloc[:, 3:5].values
# Creating model with ideal amount of clusters
kmeans = KMeans(n_clusters=5, init='k-means++', max_iter=300, n_init=10, random_state=0)
kmeans.fit(x)
predictions = kmeans.predict(x)
# Creating numpy array for feature scaling
predictions = np.array(predictions, dtype=int)
predictions = predictions[:, None]
from sklearn.preprocessing import StandardScaler
sc_x = StandardScaler()
sc_y = StandardScaler()
x = sc_x.fit_transform(x)
predictions = sc_y.fit_transform(predictions)
# Splitting dataset into training and test sets
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, predictions, test_size=.25)
# Creating Support Vector Classification model
from sklearn.svm import SVC
classifier = SVC(kernel='rbf')
classifier.fit(x_train, y_train)
Elbow Model Used for Clustering
Clustering Visualization
.zip file with the dataset (the dataset is called 'Mall_Customers.csv')
How can I resolve this?
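Since you want to solve this as a 5-class classification problem, you should not use a scaler on the labels; doing so converts them into continuous values, which a classification model cannot accept as class labels, hence the error.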
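To see why scaling the labels triggers the error, here is a minimal sketch using `type_of_target` from `sklearn.utils.multiclass`, the helper that the check in the traceback relies on. The small label array is made up for illustration; it only stands in for the cluster indices that KMeans returns.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.utils.multiclass import type_of_target

# Hypothetical cluster labels, standing in for the output of kmeans.predict(x)
labels = np.array([0, 1, 2, 3, 4, 2, 1])
original_type = type_of_target(labels)
print(original_type)  # integer labels with more than two classes

# Standardizing the labels turns them into non-integer floats
scaled_labels = StandardScaler().fit_transform(labels[:, None]).ravel()
scaled_type = type_of_target(scaled_labels)
print(scaled_labels)
print(scaled_type)  # no longer recognized as class labels
```

The integer array is reported as `multiclass`, while the standardized one is reported as `continuous`, which is exactly the label type named in the error message.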
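Also, unrelated to the error, but the correct approach is to fit your scaler on the training data only, and then use that fitted scaler to transform the test data.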
So, here are the necessary changes (after you have finished setting up your predictions variable):
# initial (unscaled) x used here:
x_train, x_test, y_train, y_test = train_test_split(x, predictions, test_size=.25)
sc = StandardScaler()
x_train_scaled = sc.fit_transform(x_train)
x_test_scaled = sc.transform(x_test)
classifier = SVC(kernel='rbf')
classifier.fit(x_train_scaled, y_train) # no scaling for predictions or y_train
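As an optional alternative, the same "fit the scaler on the training data only" rule can be enforced by wrapping the scaler and the classifier in a scikit-learn pipeline. This is only a sketch: the random features and the four-class labels below are invented stand-ins, not the mall customer data.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Hypothetical two-column feature matrix and integer class labels (0..3)
x = rng.normal(size=(200, 2))
y = (x[:, 0] > 0).astype(int) + 2 * (x[:, 1] > 0).astype(int)

x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=0
)

# fit() learns the scaling statistics from x_train only;
# predict() reuses those statistics to transform x_test
model = make_pipeline(StandardScaler(), SVC(kernel='rbf'))
model.fit(x_train, y_train)
test_predictions = model.predict(x_test)
print(test_predictions[:10])
```

The labels go through the pipeline untouched, so they remain integers and the classifier accepts them.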
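Also unrelated to the error, but you should scale your x data before using k-means, i.e. you should actually scale x first and then perform the clustering (left as an exercise, since it is unrelated to the error).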
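One possible way to approach that exercise is sketched below. It assumes synthetic data in place of the two columns read from Mall_Customers.csv (the value ranges are invented), and it fixes `random_state` in the split purely for reproducibility.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Synthetic stand-in for the annual income and spending score columns
x = np.column_stack([rng.uniform(15, 140, 200), rng.uniform(1, 100, 200)])

# Step 1: scale the features, then cluster on the scaled values
x_for_clustering = StandardScaler().fit_transform(x)
kmeans = KMeans(n_clusters=5, init='k-means++', n_init=10, random_state=0)
labels = kmeans.fit_predict(x_for_clustering)  # integer cluster indices

# Step 2: split the unscaled features together with the unscaled labels
x_train, x_test, y_train, y_test = train_test_split(
    x, labels, test_size=0.25, random_state=0
)

# Step 3: fit the classifier's scaler on the training portion only
sc = StandardScaler()
x_train_scaled = sc.fit_transform(x_train)
x_test_scaled = sc.transform(x_test)

classifier = SVC(kernel='rbf')
classifier.fit(x_train_scaled, y_train)
accuracy = classifier.score(x_test_scaled, y_test)
print(accuracy)
```

The clustering step uses its own scaler on the full data because it only generates the labels; the scaler used for the classifier is fitted separately on the training split, in line with the advice above.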