Draw figures using k-nn with different values of k
I want to draw figures for a k-nn classifier with different values of k.
My problem is that the figures all seem to use the same value of k.
What I have tried so far is changing the value of k on each run of the loop:
clf = KNeighborsClassifier(n_neighbors=counter+1)
but all the figures appear to use k=1:
from sklearn.datasets import fetch_california_housing
data = fetch_california_housing()
import numpy as np
from sklearn.model_selection import train_test_split
c = np.array([1 if y > np.median(data['target']) else 0 for y in data['target']])
X_train, X_test, c_train, c_test = train_test_split(data['data'], c, random_state=0)
from sklearn.neighbors import KNeighborsClassifier
import mglearn
import matplotlib.pyplot as plt
fig, ax = plt.subplots(nrows=1, ncols=3, figsize=(20, 6))
for counter in range(3):
    clf = KNeighborsClassifier(n_neighbors=counter+1)
    clf.fit(X_test, c_test)
    plt.tight_layout()  # this will help create proper spacing between the plots.
    mglearn.discrete_scatter(X_test[:,0], X_test[:,1], c_test, ax=ax[counter])
    plt.legend(["Class 0", "Class 1"], loc=4)
    plt.xlabel("First feature")
    plt.ylabel("Second feature")
    #plt.figure()
The reason all the plots look the same is that each time you are simply plotting the test set itself, not the model's predictions on the test set. For each value of k, you probably meant to:

1. Fit the model to the training set, in which case you should replace clf.fit(X_test, c_test) with clf.fit(X_train, c_train).
2. Generate the model's predictions on the test set, in which case you should add c_pred = clf.predict(X_test).
3. Plot the model's predictions on the test set, in which case you should replace c_test with c_pred in the scatter plot, i.e. use mglearn.discrete_scatter(X_test[:, 0], X_test[:, 1], c_pred, ax=ax[counter]) instead of mglearn.discrete_scatter(X_test[:, 0], X_test[:, 1], c_test, ax=ax[counter]).
Updated code:
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
import numpy as np
import mglearn
import matplotlib.pyplot as plt
data = fetch_california_housing()
c = np.array([1 if y > np.median(data['target']) else 0 for y in data['target']])
X_train, X_test, c_train, c_test = train_test_split(data['data'], c, random_state=0)
fig, ax = plt.subplots(nrows=1, ncols=3, figsize=(20, 6))
for counter in range(3):
    clf = KNeighborsClassifier(n_neighbors=counter+1)
    # fit the model to the training set
    clf.fit(X_train, c_train)
    # extract the model predictions on the test set
    c_pred = clf.predict(X_test)
    # plot the model predictions
    plt.tight_layout()
    mglearn.discrete_scatter(X_test[:,0], X_test[:,1], c_pred, ax=ax[counter])
    plt.legend(["Class 0", "Class 1"], loc=4)
    plt.xlabel("First feature")
    plt.ylabel("Second feature")