Unknown label type: 'continuous'
Hi all,
I'm having a problem with the following data:
----------------------
   Avg. Session Length  Time on App  Time on Website  Length of Membership  Yearly Amount Spent
0            34.497268    12.655651        39.577668              4.082621           587.951054
1            31.926272    11.109461        37.268959              2.664034           392.204933
2            33.000915    11.330278        37.110597              4.104543           487.547505
3            34.305557    13.717514        36.721283              3.120179           581.852344
4            33.330673    12.795189        37.536653              4.446308           599.406092
5            33.871038    12.026925        34.476878              5.493507           637.102448
6            32.021596    11.366348        36.683776              4.685017           521.572175
I want to apply KNN:
X = df[['Avg. Session Length', 'Time on App','Time on Website', 'Length of Membership']]
y = df['Yearly Amount Spent']
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42)
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train,y_train)
ValueError: Unknown label type: 'continuous'
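The values in the Yearly Amount Spent column are real numbers, so they cannot serve as labels for a classification problem. As the scikit-learn documentation puts it:
"When doing classification in scikit-learn, y is a vector of integers or strings."
That is why you get the error. If you want to build a classification model, you need to decide how to map these values onto a finite set of labels.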
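To see why the classifier rejects this target, here is a minimal sketch using scikit-learn's `type_of_target` helper, which reports how a target vector is interpreted. The real-valued numbers are taken from the question's table; the string and integer labels are made up purely for comparison.

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# A few target values copied from the question's 'Yearly Amount Spent' column.
y_real = np.array([587.951054, 392.204933, 487.547505, 581.852344])

# Hypothetical labels, invented here only to contrast with the real-valued target.
y_str = np.array(["high", "low", "low", "high"])
y_int = np.array([2, 0, 1, 2])

kind_real = type_of_target(y_real)  # non-integer floats are reported as 'continuous'
kind_str = type_of_target(y_str)    # two distinct string labels
kind_int = type_of_target(y_int)    # three distinct integer labels

print(kind_real, kind_str, kind_int)
```

The 'continuous' result for the first array matches the word in the error message: a classifier refuses a target of that kind.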
Note that if you only want to get rid of the error, you can do this:
import numpy as np
y = np.asarray(df['Yearly Amount Spent'], dtype="|S6")
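This converts the values in y into strings of the required format. However, each label will then appear in only one sample, so you cannot really build a meaningful model with such a set of labels.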
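A quick way to check the "one label per sample" point is to count the distinct labels after the cast. This sketch uses only the seven target values shown in the question; the variable names are mine.

```python
import numpy as np

# The seven 'Yearly Amount Spent' values shown in the question.
y = np.array([587.951054, 392.204933, 487.547505, 581.852344,
              599.406092, 637.102448, 521.572175])

# Same cast as above: each float becomes a byte string of at most 6 characters.
y_as_str = np.asarray(y, dtype="|S6")

# Count how many different "classes" the classifier would see.
n_labels = len(np.unique(y_as_str))
print(n_labels, "distinct labels for", len(y), "samples")
```

With as many classes as samples, a classifier has nothing to generalize from, even though fit() no longer raises.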
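I think you are actually trying to do regression rather than classification, since your code looks very much like you want to predict the yearly amount spent. In that case, use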
from sklearn.neighbors import KNeighborsRegressor
knn = KNeighborsRegressor(n_neighbors=1)
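instead. If you really do have a classification task, for example if you want to sort customers into classes like ('yearly amount spent is low', 'yearly amount spent is high', ...), you should discretize the labels and convert them to strings or integers (as explained by @Miriam Farber), using thresholds that you would have to set manually in that case.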
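Both routes can be sketched on the seven rows from the question. This is only an illustration under assumptions: the 500.0 threshold and the 'low'/'high' class names are hypothetical, and the models are fitted and queried on the same tiny sample just to show that the calls run. With real data you would keep the train_test_split step.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor

# The seven rows shown in the question.
df = pd.DataFrame({
    "Avg. Session Length": [34.497268, 31.926272, 33.000915, 34.305557,
                            33.330673, 33.871038, 32.021596],
    "Time on App": [12.655651, 11.109461, 11.330278, 13.717514,
                    12.795189, 12.026925, 11.366348],
    "Time on Website": [39.577668, 37.268959, 37.110597, 36.721283,
                        37.536653, 34.476878, 36.683776],
    "Length of Membership": [4.082621, 2.664034, 4.104543, 3.120179,
                             4.446308, 5.493507, 4.685017],
    "Yearly Amount Spent": [587.951054, 392.204933, 487.547505, 581.852344,
                            599.406092, 637.102448, 521.572175],
})

X = df[["Avg. Session Length", "Time on App",
        "Time on Website", "Length of Membership"]]
y = df["Yearly Amount Spent"]

# Route 1: regression. The real-valued target is used as-is.
reg = KNeighborsRegressor(n_neighbors=1).fit(X, y)
pred_amount = reg.predict(X.iloc[[0]])[0]

# Route 2: classification. Bin the target first.
# The threshold is an assumption made for this sketch; choose your own.
threshold = 500.0
y_class = pd.cut(y, bins=[-np.inf, threshold, np.inf],
                 labels=["low", "high"]).astype(str)

clf = KNeighborsClassifier(n_neighbors=1).fit(X, y_class)
pred_label = clf.predict(X.iloc[[0]])[0]

print(pred_amount, pred_label)
```

The regressor returns a number and the classifier returns one of the bin names, so the choice between them comes down to which kind of answer you need.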