sci-kit learn: KNeighborsClassifier - Population matrix vs. Class labels

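I'm trying to get the class labels corresponding to the k nearest neighbors. According to the KNeighborsClassifier docs, the predict() function returns the class label for each data sample, and the kneighbors() function returns the indices of the nearest points in the population matrix.

Here is my code: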

from sklearn.neighbors import KNeighborsClassifier
X_train = [[1.0,2.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]
y_train = ['Hello', 'this', 'is', 'test']
neigh = KNeighborsClassifier(n_neighbors=2, n_jobs=8)
neigh.fit(X_train, y_train)
X_test = [[3.0, 3.0]]


>>> neigh.predict(X_test)
array(['Hello'], dtype='<U5')
>>> neigh.kneighbors(X_test)
(array([[1. , 2.23606798]]), array([[1, 0]]))

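I want to get the class labels of the k nearest neighbors. What is the relationship between the population matrix and the class labels as specified in the docs?

Question: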

What is the relationship between the population matrix and the class labels as specified in the docs?

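The answer is that the population matrix is the training data (X_train) you passed to fit(), and there is a one-to-one correspondence between its rows and the class labels (y_train): the first label corresponds to the first row of the population matrix, the second label to the second row, and so on. In your example, the relationships are: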

[1.0, 2.0] <-> 'Hello'
[2.0, 3.0] <-> 'this'
[4.0, 5.0] <-> 'is'
[6.0, 7.0] <-> 'test'
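To make the positional pairing concrete, here is a small sketch using the same data as the question. It simply zips rows with labels, and it relies on scikit-learn's input validation raising a ValueError when X and y have different numbers of samples, which is what you would expect if the pairing is purely positional.

```python
from sklearn.neighbors import KNeighborsClassifier

X_train = [[1.0, 2.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]
y_train = ['Hello', 'this', 'is', 'test']

# Row i of the population matrix X_train is paired with label y_train[i].
pairs = list(zip(X_train, y_train))
for row, label in pairs:
    print(row, '<->', label)

# Because the pairing is positional, fit() rejects a label list of the wrong length.
neigh = KNeighborsClassifier(n_neighbors=2)
try:
    neigh.fit(X_train, y_train[:3])
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True
print(mismatch_rejected)
```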

So if you want the class labels of the k nearest neighbors, you can use the kneighbors function. According to the documentation, the function returns:

dist : array Array representing the lengths to points, only present if return_distance=True

ind : array Indices of the nearest points in the population matrix.
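If only the indices are needed, kneighbors can skip the distance array through its return_distance parameter. A minimal sketch with the question's data follows. Note that the second neighbor of [3.0, 3.0] is a tie (rows 0 and 2 are both at distance sqrt(5)), so only the first index is stable.

```python
from sklearn.neighbors import KNeighborsClassifier

X_train = [[1.0, 2.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]
y_train = ['Hello', 'this', 'is', 'test']
neigh = KNeighborsClassifier(n_neighbors=2)
neigh.fit(X_train, y_train)

# With return_distance=False only the index array (ind) is returned.
ind = neigh.kneighbors([[3.0, 3.0]], return_distance=False)
# One row per query point, one column per neighbor.
print(ind.shape)
```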

The idea is to use the ind array to look up the class labels, like this:

from sklearn.neighbors import KNeighborsClassifier

X_train = [[1.0, 2.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]
y_train = ['Hello', 'this', 'is', 'test']
neigh = KNeighborsClassifier(n_neighbors=2, n_jobs=8)
neigh.fit(X_train, y_train)
X_test = [[3.0, 3.0]]

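prediction = neigh.predict(X_test)
# indices[0] holds the row positions of the k nearest training points for X_test[0]
distances, indices = neigh.kneighbors(X_test)

# Map each row position back to its label in y_train
print([y_train[i] for i in indices[0]])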

Output

['this', 'Hello']
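For several query points at once, a sketch that stores y_train as a NumPy array (an assumption; the question uses a plain list) lets the whole index matrix be used for fancy indexing. The second query point below is hypothetical, added only for illustration.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X_train = [[1.0, 2.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]
y_train = np.array(['Hello', 'this', 'is', 'test'])  # NumPy array allows fancy indexing
neigh = KNeighborsClassifier(n_neighbors=2)
neigh.fit(X_train, y_train)

# Hypothetical extra query point [6.5, 7.5], for illustration only.
X_test = [[3.0, 3.0], [6.5, 7.5]]
distances, indices = neigh.kneighbors(X_test)

# indices has shape (n_queries, n_neighbors); indexing y_train with it
# returns the neighbor labels in the same shape.
neighbor_labels = y_train[indices]
print(neighbor_labels)
```

Each row of neighbor_labels lines up with the matching row of distances, so the nearest neighbor's label is always in column 0.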

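If you look closely, neigh.kneighbors(X_test) returns two values. The first array holds the distances to the two nearest neighbors; the second holds the indices of those neighbors in the training dataset.

neigh.kneighbors(X_test)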
(array([[1. , 2.23606798]]), array([[1, 0]]))
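As a sanity check on the first array, this sketch recomputes the distances by hand. It assumes the default metric (Minkowski with p=2, i.e. Euclidean distance), which is what KNeighborsClassifier uses when no metric is given.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X_train = np.array([[1.0, 2.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]])
y_train = ['Hello', 'this', 'is', 'test']
neigh = KNeighborsClassifier(n_neighbors=2).fit(X_train, y_train)

X_test = np.array([[3.0, 3.0]])
dist, indices = neigh.kneighbors(X_test)

# Recompute the Euclidean distance from the query to each returned training row.
manual = np.linalg.norm(X_train[indices[0]] - X_test[0], axis=1)
print(np.allclose(dist[0], manual))
```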

You can then simply look up the labels at these indices in y_train:

dist, indices = neigh.kneighbors(X_test)
for item in indices[0]:
    print(y_train[item])
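The same neighbor labels also drive predict_proba. A sketch, assuming the default uniform weights: each column of predict_proba is the fraction of the k neighbors carrying that label, with columns ordered as in classes_ (the sorted unique labels). Here 'this' is always one of the two neighbors, so its probability is 0.5 regardless of how the distance tie for the second neighbor is broken.

```python
from sklearn.neighbors import KNeighborsClassifier

X_train = [[1.0, 2.0], [2.0, 3.0], [4.0, 5.0], [6.0, 7.0]]
y_train = ['Hello', 'this', 'is', 'test']
neigh = KNeighborsClassifier(n_neighbors=2).fit(X_train, y_train)

X_test = [[3.0, 3.0]]
# classes_ holds the sorted unique labels; predict_proba columns follow this order.
print(neigh.classes_)
# With uniform weights each entry is (neighbors with that label) / k.
proba = neigh.predict_proba(X_test)
print(dict(zip(neigh.classes_, proba[0])))
```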