Sklearn: Nearest Neighbour with String Values and Custom Metric
My data looks like this (all values are strings):
>>> all_states[0:3]
[['A','B','Empty'],
['A', 'B', 'Empty'],
['C', 'D', 'Empty']]
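I want to use a custom distance metric:
import numpy as np
from sklearn.neighbors import NearestNeighbors

def mydist(x, y):
    return 1  # placeholder metric: every pair is at distance 1

neigh = NearestNeighbors(n_neighbors=5, metric=mydist)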
However, when I call
neigh.fit(np.array(all_states))
I get the error
ValueError: Unable to convert array of bytes/strings into decimal numbers with dtype='numeric'
I know I could use OneHotEncoder or LabelEncoder, but can I also skip encoding the data, since I have my own distance metric?
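As far as I know, ML models need to be trained on numeric data. If the strings are converted to numbers in a form your distance metric can work with, then it will work.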
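One way to read that suggestion is sketched below: map each distinct string to an integer code, then pass a callable metric that only compares codes for equality, so the arbitrary ordering of the codes does not matter. The `vocab` mapping and the `mismatch_count` metric are assumptions made for illustration; they are not part of the question.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

all_states = [['A', 'B', 'Empty'],
              ['A', 'B', 'Empty'],
              ['C', 'D', 'Empty']]

# Hypothetical encoding: every distinct string gets an integer code.
vocab = {s: i for i, s in enumerate(sorted({s for row in all_states for s in row}))}
coded = np.array([[vocab[s] for s in row] for row in all_states], dtype=float)

def mismatch_count(x, y):
    # Assumed metric: number of positions where the two rows differ.
    return float(np.sum(x != y))

# Brute force is chosen explicitly so the callable is simply evaluated pairwise.
neigh = NearestNeighbors(n_neighbors=1, metric=mismatch_count, algorithm="brute")
neigh.fit(coded)

# With no argument, kneighbors() queries the training rows and skips each row itself.
distances, indices = neigh.kneighbors()
```

Because the metric only tests equality, this keeps the categorical meaning of the data even though the array handed to scikit-learn is numeric.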
metric : str or callable, default='minkowski'
The distance metric to use for the tree. The default metric is minkowski, and with p=2 is
equivalent to the standard Euclidean metric. See the documentation of
DistanceMetric for a list of available metrics. If metric is
"precomputed", X is assumed to be a distance matrix and must be square
during fit. X may be a sparse graph, in which case only "nonzero"
elements may be considered neighbors.
You can use pdist (see its documentation) and convert the result to square form, as the input requires:
all_states = [['A','B','Empty'],
['A', 'B', 'Empty'],
['C', 'D', 'Empty']]
from scipy.spatial.distance import pdist,squareform
from sklearn.neighbors import NearestNeighbors
dm = squareform(pdist(all_states, mydist))
>>> dm
array([[0., 1., 1.],
       [1., 0., 1.],
       [1., 1., 0.]])
neigh = NearestNeighbors(n_neighbors=5, metric="precomputed")
neigh.fit(dm)
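Once the model is fitted on a precomputed matrix, queries also have to be given as distances, with one row per query point and one column per fitted sample. The following is a minimal sketch of that, not the answer's exact method: it builds the matrices with a plain Python loop instead of pdist, and the `mismatch_count` metric, the `distance_matrix` helper and the query row are all hypothetical. `n_neighbors` is set to 2 because a query cannot ask for more neighbours than there are fitted samples (3 here).

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

all_states = [['A', 'B', 'Empty'],
              ['A', 'B', 'Empty'],
              ['C', 'D', 'Empty']]

def mismatch_count(x, y):
    # Assumed metric: works directly on lists of strings.
    return sum(a != b for a, b in zip(x, y))

def distance_matrix(rows_a, rows_b, metric):
    # Hypothetical helper: result has shape (len(rows_a), len(rows_b)).
    return np.array([[metric(a, b) for b in rows_b] for a in rows_a], dtype=float)

# Square matrix for fitting.
dm = distance_matrix(all_states, all_states, mismatch_count)
neigh = NearestNeighbors(n_neighbors=2, metric="precomputed")
neigh.fit(dm)

# Distances from one new row to every fitted row, shape (1, 3).
query = [['C', 'D', 'Full']]
dq = distance_matrix(query, all_states, mismatch_count)
distances, indices = neigh.kneighbors(dq)
```

The strings never reach scikit-learn in this setup; it only ever sees the numeric distance matrices.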