How to add a boundary to a figure (data set) using matplotlib and the SVM algorithm?

Question

My code:

import matplotlib.pyplot as plt
import pandas as pd

data = pd.read_csv('data/data.csv')
X = data[['x1','x2']]
y = data['y']

from sklearn.svm import SVC
classifier = SVC()
classifier.fit(X,y)

plt.scatter(data['x1'], data['x2'], c=y, s=50)
plt.show()

My data:

x1,x2,y
0.336493583877,-0.985950993354,0.0
-0.0110425297266,-0.10552856162,1.0
0.238159509297,-0.61741666482,1.0
-0.366782883496,-0.713818716912,1.0
1.22192307438,-1.03939898614,0.0

My current output:

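A support vector machine may not be the best algorithm to use here, but I would like to see the boundary it generates for this data. How can I do that?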

After applying Paul's excellent answer, the result is as follows:

Answer 1

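Your data is not linearly separable, but you can still use the support vector machine (SVM) algorithm.
Your data is two-dimensional, and with a kernel function the algorithm can implicitly map it into a higher-dimensional space (three dimensions, for example), where a separating plane may exist.
You can find this algorithm in sklearn (sklearn.svm.SVC).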
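To make the kernel idea concrete, here is a minimal sketch that does the lift from two to three dimensions by hand. It uses synthetic concentric-circle data from make_circles as a stand-in (an assumption, not the asker's data.csv), and the extra feature x1^2 + x2^2 is just one illustrative choice of mapping. An RBF kernel does something comparable implicitly, without building the new coordinates.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Synthetic stand-in data (assumption): two concentric rings, which no
# straight line in the (x1, x2) plane can separate.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Linear SVM in the original 2-D space.
acc_2d = SVC(kernel="linear").fit(X, y).score(X, y)

# Hand-made lift to 3-D: append the squared distance from the origin.
# In this space the inner ring sits "below" the outer ring, so a plane
# can separate them.
X_3d = np.column_stack([X, (X ** 2).sum(axis=1)])
acc_3d = SVC(kernel="linear").fit(X_3d, y).score(X_3d, y)

# RBF kernel on the original 2-D data: the lift happens implicitly.
acc_rbf = SVC(kernel="rbf").fit(X, y).score(X, y)

print(f"linear, 2-D: {acc_2d:.2f}")
print(f"linear, 3-D: {acc_3d:.2f}")
print(f"rbf,    2-D: {acc_rbf:.2f}")
```

These are training accuracies only, so they show separability rather than how well the model generalizes.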

Answer 2

Building on 孙怡's answer, you can use the example code from here. Your question does not include all the points in data.csv, but we can generate a plot with the decision boundary as follows:

import pandas as pd
import numpy as np
from matplotlib.colors import ListedColormap
from sklearn.svm import SVC
import matplotlib.pyplot as plt

# load the data
data = pd.read_csv('data/data.csv')
X = data[['x1','x2']]
y = data['y']

# fit the classifier
classifier = SVC(kernel='rbf')
classifier.fit(X,y)

# first, determine the extent of the grid (the min and max of each axis,
# padded by 1), then build a grid of points at the given resolution
resolution=0.02
x1_min, x1_max = X["x1"].min() - 1, X["x1"].max() + 1
x2_min, x2_max = X["x2"].min() - 1, X["x2"].max() + 1
xx1, xx2 = np.meshgrid(np.arange(x1_min, x1_max, resolution),
   np.arange(x2_min, x2_max, resolution))

# setup marker generator and color map
markers = ('s', 'x', 'o', '^', 'v')
colors = ('red', 'blue', 'lightgreen', 'gray', 'cyan')
cmap = ListedColormap(colors[:len(np.unique(y))])

# plot the classifier decision boundaries
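# predict on a DataFrame with the same column names used for fitting,
# which avoids scikit-learn's feature-name mismatch warning
Z = classifier.predict(pd.DataFrame({"x1": xx1.ravel(), "x2": xx2.ravel()}))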
Z = Z.reshape(xx1.shape)
plt.contourf(xx1, xx2, Z, alpha=0.4, cmap=cmap)
plt.xlim(xx1.min(), xx1.max())
plt.ylim(xx2.min(), xx2.max())

# plot the data points
for idx, cl in enumerate(np.unique(y)):
    plt.scatter(x=X["x1"][y == cl].values,
                y=X["x2"][y == cl].values,
                alpha=0.6,
                color=cmap(idx),  # one color per class (a bare RGBA tuple passed as c is ambiguous)
                edgecolor='black',
                marker=markers[idx],
                label=cl)
plt.show()

This is largely taken from the example code linked above. I tried to include only what is needed to keep it simple. Here is the output image:

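You will notice that I explicitly used the rbf kernel, because the full data in your example is not linearly separable. For a good treatment along these lines that is more general than mine, this answer is worth reading.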
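The filled regions in the code above come from predict. If you would rather draw the boundary itself as a line, one option is to contour the classifier's decision_function at level 0. The sketch below assumes a two-class problem and uses synthetic make_moons data as a stand-in for data.csv; the helper name plot_svm_boundary is hypothetical, not part of any library.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons
from sklearn.svm import SVC


def plot_svm_boundary(clf, X, y, ax, resolution=0.02):
    """Draw the decision boundary (solid) and margins (dashed) of a fitted
    two-class SVC on the given axes, and return the grid of scores."""
    x1 = np.arange(X[:, 0].min() - 1, X[:, 0].max() + 1, resolution)
    x2 = np.arange(X[:, 1].min() - 1, X[:, 1].max() + 1, resolution)
    xx1, xx2 = np.meshgrid(x1, x2)

    # Signed score for every grid point; the boundary is where it is zero.
    grid = np.column_stack([xx1.ravel(), xx2.ravel()])
    scores = clf.decision_function(grid).reshape(xx1.shape)

    # Level 0 is the boundary, levels -1 and +1 are the margins.
    ax.contour(xx1, xx2, scores, levels=[-1, 0, 1],
               colors="k", linestyles=["--", "-", "--"])

    # Data points, with the support vectors circled.
    ax.scatter(X[:, 0], X[:, 1], c=y, s=30, edgecolors="k")
    ax.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1],
               s=120, facecolors="none", edgecolors="k")
    return scores


# Synthetic stand-in data (assumption): two interleaved half-moons.
X_demo, y_demo = make_moons(n_samples=200, noise=0.2, random_state=0)
clf_demo = SVC(kernel="rbf").fit(X_demo, y_demo)

fig, ax = plt.subplots()
scores_demo = plot_svm_boundary(clf_demo, X_demo, y_demo, ax)
```

Call plt.show() or fig.savefig(...) afterwards to see the figure. With your own data, pass the fitted classifier and X.values (a NumPy array) instead of the synthetic arrays.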
