How to extract the boundary values from k-nearest neighbors predict
- How can the boundary values be extracted, or returned, from only the .predict output of sklearn.neighbors.KNeighborsClassifier()?
MRE
import pandas as pd
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
# prepare data
iris = load_iris()
X = iris.data
y = iris.target
df = pd.DataFrame(X, columns=iris.feature_names)
df['label'] = y
species_map = dict(zip(range(3), iris.target_names))
df['species'] = df.label.map(species_map)
df = df.reindex(['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)', 'species', 'label'], axis=1)
# instantiate model
knn = KNeighborsClassifier(n_neighbors=6)
# fit on 'petal length (cm)' and 'petal width (cm)'; .to_numpy() avoids a feature-name warning when predicting on a plain array
knn.fit(df.iloc[:, 2:4].to_numpy(), df.label)
h = .02 # step size in the mesh
# create colormap for the contour plot
cmap_light = ListedColormap(list(sns.color_palette('pastel', n_colors=3)))
# Plot the decision boundary.
# For that, we will assign a color to each point in the mesh [x_min, x_max]x[y_min, y_max].
x_min, x_max = df['petal length (cm)'].min() - 1, df['petal length (cm)'].max() + 1
y_min, y_max = df['petal width (cm)'].min() - 1, df['petal width (cm)'].max() + 1
xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
Z = knn.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)
# create plot
fig, ax = plt.subplots()
# add data points
sns.scatterplot(data=df, x='petal length (cm)', y='petal width (cm)', hue='species', ax=ax, edgecolor='k')
# add decision boundary contour map
ax.contourf(xx, yy, Z, cmap=cmap_light, alpha=0.4)
# legend
lgd = plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
plt.show()
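Resulting plot
Desired plot
- The colors and style do not matter; the point is that it shows only the decision boundary and the data points.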
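The mesh-and-predict step in the MRE can be hard to picture at full resolution, so here is a minimal sketch of the same array handling on a tiny grid. The coarse step size and the thresholding rule that stands in for knn.predict are assumptions made only for illustration; they are not part of the original code.

```python
import numpy as np

h = 0.5  # coarse step (assumed) so the arrays stay small enough to inspect

# xx and yy have shape (2, 3): rows follow y, columns follow x
xx, yy = np.meshgrid(np.arange(0, 1.5, h), np.arange(0, 1.0, h))

# flatten the mesh into one (x, y) pair per row, the layout .predict expects
grid = np.c_[xx.ravel(), yy.ravel()]

# hypothetical stand-in for knn.predict: label 1 where x >= 1, else 0
labels = (grid[:, 0] >= 1).astype(int)

# reshape the flat labels back onto the mesh so Z lines up with xx and yy
Z = labels.reshape(xx.shape)
print(Z)
```

Because Z has the same shape as xx and yy, any boolean mask computed on Z can be used to index the coordinate arrays directly.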
Resources
- scikit-learn: Nearest Neighbors Classification
- scikit-learn: Plot the decision boundaries of a VotingClassifier
- scikit-learn: Comparing Nearest Neighbors with and without Neighborhood Components Analysis
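SO questions that did not answer this question
- Plotting a decision boundary separating 2 classes using Matplotlib's pyplot
- This solution shows how to plot the decision boundary without filling the plot, but none of the answers show how to extract the decision boundary values.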
plt.contour(xx, yy, Z, cmap=plt.cm.Paired)
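For what it is worth, the object returned by contour does carry the line vertices, so one possible route is to read them from its allsegs attribute. This is not shown in the linked answers and is only a sketch on a toy two-class grid: the grid, the 1.55 threshold, and the single level at 0.5 are assumptions for illustration. With three or more integer labels, a direct jump between non-adjacent labels would cross several levels at once, so the result would need more care.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # off-screen backend so no window is needed
import matplotlib.pyplot as plt

# toy label grid (assumed) standing in for Z = knn.predict(...).reshape(xx.shape)
xx, yy = np.meshgrid(np.linspace(0, 3, 31), np.linspace(0, 2, 21))
Z = (xx > 1.55).astype(int)  # class 0 on the left, class 1 on the right

fig, ax = plt.subplots()
# a level halfway between the two integer labels traces the class change
cs = ax.contour(xx, yy, Z, levels=[0.5])

# allsegs[i] is a list of (N, 2) vertex arrays for level i
segments = cs.allsegs[0]
boundary = np.vstack(segments)  # column 0 is x, column 1 is y
plt.close(fig)
```

The vertices are interpolated between grid points, so they sit halfway between neighbouring cells with different labels rather than on the grid itself.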
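Self-answer
- I am providing one solution, but I am not sure it is the best one, and I am certainly open to other options.
- That said, I do not want a solution that relies on the coloring in a contourf or pcolormesh plot.
- In short, the best solution would extract only the decision boundary values.
- The solution I came up with uses np.diff along both axes of Z, the .predict result. The idea is that wherever the result changes, there is a decision boundary.
  - Use np.diff to subtract Z from itself, shifted by 1.
  - Create a mask with np.diff(Z) != 0.
  - Use the mask to select the appropriate x and y from xx and yy.
- Using the existing code from the OP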
# use diff to create a mask
mask = np.diff(Z, axis=1) != 0
mask2 = np.diff(Z, axis=0) != 0
# apply mask against xx and yy
xd = np.concatenate((xx[:, 1:][mask], xx[1:, :][mask2]))
yd = np.concatenate((yy[:, 1:][mask], yy[1:, :][mask2]))
# plot just the decision boundary
fig, ax = plt.subplots()
sns.scatterplot(x=xd, y=yd, color='k', edgecolor='k', s=5, ax=ax, label='decision boundary')
plt.show()
fig, ax = plt.subplots()
sns.scatterplot(data=df, x='petal length (cm)', y='petal width (cm)', hue='species', ax=ax, edgecolor='k')
sns.scatterplot(x=xd, y=yd, color='k', edgecolor='k', s=5, ax=ax, label='decision boundary')
lgd = plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left')
xd and yd correctly overlay the plt.contourf plot
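To make the np.diff masking easier to check by hand, the same steps can be wrapped in a small function and run on a toy label grid. The helper name boundary_points and the 3x4 grid are hypothetical and used only for illustration; the masking itself follows the code above. The final np.unique call is an optional extra, added because a grid point flagged along both axes is otherwise listed twice.

```python
import numpy as np

def boundary_points(xx, yy, Z):
    """Return x and y arrays of grid points where the predicted label changes."""
    # True where the label differs from the neighbour to the left
    mask_x = np.diff(Z, axis=1) != 0
    # True where the label differs from the neighbour in the previous row
    mask_y = np.diff(Z, axis=0) != 0
    # np.diff drops one column / one row, so slice xx and yy to match
    xd = np.concatenate((xx[:, 1:][mask_x], xx[1:, :][mask_y]))
    yd = np.concatenate((yy[:, 1:][mask_x], yy[1:, :][mask_y]))
    return xd, yd

# toy 3x4 label grid (assumed) standing in for the reshaped .predict output
xx, yy = np.meshgrid(np.arange(4.0), np.arange(3.0))
Z = np.array([[0, 0, 1, 1],
              [0, 0, 1, 1],
              [0, 2, 2, 1]])

xd, yd = boundary_points(xx, yy, Z)

# optional: drop points that were flagged along both axes
unique_points = np.unique(np.column_stack((xd, yd)), axis=0)
```

Note that each returned point is the grid point just after the change, so the true boundary lies within one step h of it.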