How to efficiently pick points that lower the mean distance to known points?

Question

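So, you have a set of "explored" points in a space and a set of "unexplored" points. You want to pick K unexplored points to explore so that the mean distance from the unexplored points to their nearest explored point is minimized.

Can this be done more efficiently than by brute force, picking unexplored points one at a time and measuring the mean distance?

The Python function below does the job, but it is infeasible for large sets because it becomes very slow. I want to use it on a set of at least hundreds of thousands of unexplored points, so it needs to be much more efficient. I don't need the optimal solution; a good approximation is fine!

Can this be done somehow without the nested for loops?

Or can I somehow select only the most promising points for evaluation?

All ideas would be much appreciated!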

import numpy as np

explored = np.random.rand(100,3)
unexplored = np.random.rand(100000,3)

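def k_anchors(explored, unexplored, K):
    # Greedy brute force: each of the K picks is the unexplored point that,
    # once treated as explored, minimizes the summed Manhattan (L1) distance
    # from every unexplored point to its nearest explored point.
    anchors = np.empty((K, unexplored.shape[1]))

    for j in range(K):
        proximity_sum = np.zeros((len(unexplored),))

        for k in range(len(unexplored)):
            # Tentatively treat candidate k as explored.
            temp_results = np.concatenate((explored, unexplored[k].reshape((1, -1))))
            proximity = np.zeros((len(unexplored),))

            for i in range(len(unexplored)):
                # L1 distance from point i to every explored point and the candidate.
                i_prox = (abs((unexplored[i, :] - temp_results))).sum(axis=1)
                proximity[i] = i_prox.min()

            proximity_sum[k] = proximity.sum()

        idx = np.argmin(proximity_sum)
        anchors[j, :] = unexplored[idx]
        # Append the chosen point to explored (as a 2-D row) before deleting it,
        # otherwise idx would refer to a different point.
        explored = np.concatenate((explored, unexplored[idx].reshape((1, -1))))
        unexplored = np.delete(unexplored, idx, 0)

    return anchors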

print( k_anchors(explored, unexplored, 5) )

Solution

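The problem was solved with a variant of the K-means algorithm, as suggested by Barış Can Tayiz, and it works very well.

In short, I initialize the centroids with the explored points plus K random points. Then only the K random points are updated while fitting the data. In my case the number K does not need to be optimized, since I know how many points I can explore each time I call the function.

Thanks to everyone for taking the time to discuss and answer this question!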
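The idea described above could be sketched roughly as follows. This is a minimal illustration, not the original poster's code: the function names `sq_dists` and `constrained_kmeans`, the iteration count, and the final step of snapping each free centroid to its nearest unexplored point are all assumptions. It also uses squared Euclidean distance, as standard K-means does, whereas the brute-force function in the question uses Manhattan distance.

```python
import numpy as np

def sq_dists(a, b):
    # Pairwise squared Euclidean distances, shape (len(a), len(b)).
    return (a ** 2).sum(axis=1)[:, None] - 2.0 * a @ b.T + (b ** 2).sum(axis=1)[None, :]

def constrained_kmeans(explored, unexplored, K, n_iter=20, seed=0):
    # Hypothetical helper: K-means in which the explored points act as
    # fixed centroids and only K extra "free" centroids are updated.
    rng = np.random.default_rng(seed)
    # Free centroids start at K distinct random unexplored points.
    start = rng.choice(len(unexplored), size=K, replace=False)
    free = unexplored[start].copy()
    n_fixed = len(explored)

    for _ in range(n_iter):
        centroids = np.concatenate((explored, free))
        # Assign every unexplored point to its nearest centroid.
        labels = sq_dists(unexplored, centroids).argmin(axis=1)
        for c in range(K):
            members = unexplored[labels == n_fixed + c]
            if len(members):
                # Only the free centroids move; explored points stay put.
                free[c] = members.mean(axis=0)

    # Assumption: snap each free centroid to the closest real unexplored
    # point, since only existing points can be explored.
    idx = sq_dists(unexplored, free).argmin(axis=0)
    return unexplored[idx], idx

explored = np.random.rand(100, 3)
unexplored = np.random.rand(10000, 3)
anchors, anchor_idx = constrained_kmeans(explored, unexplored, 5)
print(anchors)
```

Each iteration costs on the order of N × (M + K) distance evaluations for N unexplored and M explored points, instead of the N² per pick of the brute-force version. Two free centroids can in principle snap to the same unexplored point, so a real implementation may want to check `anchor_idx` for duplicates.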

Answer 1

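You can use an unsupervised learning algorithm for this. For example, if you select k = 3 for k-means, you should explore the points closest to the cluster centers. Choosing k is a separate problem; this article can help with that: https://medium.com/analytics-vidhya/how-to-determine-the-optimal-k-for-k-means-708505d204eb . You can take the ratio of successive differences in the within-cluster sum of squared errors (WSS), that is, (WSS(n+1) - WSS(n)) / (WSS(n) - WSS(n-1)). Comparing this ratio across values of n indicates the best k as measured by WSS.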
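One way to read the ratio above is sketched below. The function name `elbow_k`, the rule of picking the k with the smallest ratio, and the WSS numbers are illustrative assumptions, not taken from the answer or the linked article. The sketch also assumes WSS strictly decreases as k grows, so no difference is zero.

```python
import numpy as np

def elbow_k(ks, wss):
    # Hypothetical helper: choose k at the sharpest bend of the WSS curve.
    ks = np.asarray(ks)
    wss = np.asarray(wss, dtype=float)
    diffs = np.diff(wss)  # WSS(n+1) - WSS(n) for consecutive k
    # (WSS(n+1) - WSS(n)) / (WSS(n) - WSS(n-1)) for each interior k.
    ratios = diffs[1:] / diffs[:-1]
    # A small ratio means adding another cluster helps much less than
    # the previous one did, which is the usual "elbow" criterion.
    return ks[1:-1][np.argmin(ratios)], ratios

# Made-up WSS values for k = 1..5, purely for illustration.
ks = [1, 2, 3, 4, 5]
wss = [100.0, 60.0, 30.0, 27.0, 25.0]
best_k, ratios = elbow_k(ks, wss)
print(best_k, ratios)
```

In practice the WSS values would come from fitting k-means once per candidate k and recording the within-cluster sum of squares for each fit.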


