Plotting class decision boundary: determine a "good fit" range directly

Question

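I am trying to figure out how to plot the decision boundary line using only a few intermediate values rather than the entire separator line, so that it roughly spans the y-range of the observations.

Currently, I repeatedly select different bounds by hand and assess the result visually until "a good-looking separator" emerges.

MWE: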


from sklearn.datasets import make_classification
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns  # required for sns.scatterplot below
from sklearn.svm import SVC

# sample data
X, y = make_classification(n_samples=100, n_features=2, n_redundant=0,
    n_clusters_per_class=1, weights=[0.9], flip_y=0, random_state=1)

# fit svm model 
svc_model = SVC(kernel='linear', random_state=32)
svc_model.fit(X, y)

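# Construct the separating line from the fitted coefficients:
# w[0] * x + w[1] * y + b = 0  =>  y = -(w[0] / w[1]) * x - b / w[1]
w = svc_model.coef_[0]
b = svc_model.intercept_[0]
x_points = np.linspace(-1, 1)  # 50 points by default
y_points = -(w[0] / w[1]) * x_points - b / w[1]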

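Figure 1:

The decision boundary line spans a much larger range than the data, so the observations get "squeezed" into a region that visually looks almost like a line.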

plt.figure(figsize=(10, 8))

# Plotting our two-features-space
sns.scatterplot(x=X[:, 0], y=X[:, 1], hue=y, s=50)

# Plotting a red hyperplane
plt.plot(x_points, y_points, c='r')

Figure 2:

Manually adjusting the points until the fit looks right (x_points[19:-29], y_points[19:-29]):

plt.figure(figsize=(10, 8))

# Plotting our two-features-space
sns.scatterplot(x=X[:, 0], y=X[:, 1], hue=y, s=50)
# Plotting a red hyperplane
plt.plot(x_points[19:-29], y_points[19:-29], c='r')

How can I automate choosing the "good fit" range of values? For example, the slice above works for n_samples=100 data points, but not for n_samples=1000.

Answer 1

Instead of letting x_points run from -1 to 1, you can invert the linear equation and specify the bounds you want directly on y_points:

y_points = np.linspace(X[:, 1].min(), X[:, 1].max())
x_points = -(w[1] * y_points + b) / w[0]
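
The inversion works because the boundary satisfies w[0] * x + w[1] * y + b = 0, so solving for x gives x = -(w[1] * y + b) / w[0]. As a minimal sketch, the same idea can be wrapped in a small helper so it adapts to any sample size. The name boundary_segment, its parameters, and the demo coefficients below are hypothetical and only for illustration; the helper assumes w[0] is not zero (a horizontal boundary would have to be parametrized by x instead).

```python
import numpy as np


def boundary_segment(w, b, y_min, y_max, num=50):
    """Return (x, y) points on the line w[0]*x + w[1]*y + b = 0 for y in [y_min, y_max].

    Hypothetical helper for illustration; not part of scikit-learn or matplotlib.
    """
    w0, w1 = float(w[0]), float(w[1])
    if np.isclose(w0, 0.0):
        # Horizontal boundary: y is constant, so x cannot be written as a function of y.
        raise ValueError("w[0] is (nearly) zero; parametrize the line by x instead")
    y_points = np.linspace(y_min, y_max, num)
    x_points = -(w1 * y_points + b) / w0
    return x_points, y_points


# Illustrative coefficients only, not taken from the fitted model in the question.
w_demo = np.array([2.0, -1.0])
b_demo = 0.5
x_seg, y_seg = boundary_segment(w_demo, b_demo, y_min=-3.0, y_max=3.0)
```

With the question's variables, this would presumably be called as boundary_segment(svc_model.coef_[0], svc_model.intercept_[0], X[:, 1].min(), X[:, 1].max()) and the result passed to plt.plot(x_seg, y_seg, c='r'). Because the bounds come from the data rather than from a hand-picked slice, the segment should follow the observed y-range whether n_samples is 100 or 1000.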
