2D orthogonal projection of vector onto line with numpy yields wrong result

Question

I have 350 document scores that, when plotted, have the following shape:

docScores = [(0, 68.62998962), (1, 60.21374512), (2, 54.72480392), 
             (3, 50.71389389), (4, 49.39723969), ...,  
             (345, 28.3756237), (346, 28.37126923), 
             (347, 28.36397934), (348, 28.35762787), (349, 28.34219933)]

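I posted the complete array on pastebin here (it corresponds to the dataPoints list in the code below).

I originally needed to find the elbow point of this L-shaped curve, which I found thanks to this post.

In the following plot, the red vector p represents the elbow point. I would like to find the point x=(?,?) (the yellow star) on the vector b that corresponds to the orthogonal projection of p onto b.

The red point on the plot is the one I obtain (which is obviously wrong). I obtain it as follows: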

b_hat = b / np.linalg.norm(b)    #unit vector of b
proj_p_onto_b = p.dot(b_hat)*b_hat
red_point = proj_p_onto_b + s

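Now, if the projection of p onto b is defined by its start and end points, namely s and x (the yellow star), it follows that proj_p_onto_b = x - s, and therefore x = proj_p_onto_b + s?

Did I make a mistake here?

EDIT: In answer to @cxw, here is the code for computing the elbow point: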

import numpy as np
from operator import itemgetter

def findElbowPoint(self, rawDocScores):
    # list() is needed on Python 3, where zip() returns an iterator
    dataPoints = list(zip(range(len(rawDocScores)), rawDocScores))
    s = np.array(dataPoints[0])     # first point of the curve
    l = np.array(dataPoints[-1])    # last point of the curve
    b_vect = l - s
    b_hat = b_vect / np.linalg.norm(b_vect)
    distances = []
    for scoreVec in dataPoints[1:]:
        p = np.array(scoreVec) - s
        proj = p.dot(b_hat) * b_hat
        d = np.linalg.norm(p - proj)  # orthogonal distance between b and the L-curve
        distances.append((scoreVec[0], scoreVec[1], proj, d))

    # the elbow is the point farthest from the line b
    elbow_x, elbow_y, proj, max_distance = max(distances, key=itemgetter(3))

    red_point = proj + s

EDIT: Here is the code for the plot:

>>> l_curve_x_values = [x[0] for x in docScores]
>>> l_curve_y_values = [x[1] for x in docScores]
>>> b_line_x_values = [x[0] for x in docScores]
>>> b_line_y_values = np.linspace(s[1], l[1], len(docScores))
>>> p_line_x_values = l_curve_x_values[:elbow_x]
>>> p_line_y_values = np.linspace(s[1], elbow_y, elbow_x)
>>> plt.plot(l_curve_x_values, l_curve_y_values, b_line_x_values, b_line_y_values, p_line_x_values, p_line_y_values)
>>> red_point = proj + s
>>> plt.plot(red_point[0], red_point[1], 'ro')
>>> plt.show()

Answer 1

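First, is the point at ~(50, 37) p or s+p? If it is p, that might be your problem! If your p variable has a positive Y component, you won't get the result you expect when you take the dot product.

Assuming that point is s+p, and if some Post-it doodling is correct,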

p_len = np.linalg.norm(p)
p_hat = p / p_len
red_len = p_hat.dot(b_hat) * p_len   # red_len = |x-s|
    # because p_hat . b_hat = 1 * 1 * cos(angle) = |x-s| / |p|
red_point = s + red_len * b_hat

Not tested! YMMV. Hope this helps.
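A quick numerical check of the construction above can be sketched as follows. The end points s and l are the first and last docScores entries from the question; the elbow position (50, 37) is only an assumption taken from the rough estimate above, not the asker's actual value. The sketch compares this answer's formula with the one in the question and tests whether the vector from x to the elbow is perpendicular to b.

```python
import numpy as np

# End points of the line b: first and last docScores entries from the question.
s = np.array([0.0, 68.62998962])
l = np.array([349.0, 28.34219933])

# Hypothetical elbow point in absolute coordinates (an assumption for illustration).
elbow = np.array([50.0, 37.0])

b = l - s
b_hat = b / np.linalg.norm(b)
p = elbow - s  # p is measured from s, not from the origin

# Formula from the question.
x_question = s + p.dot(b_hat) * b_hat

# Formula from this answer.
p_len = np.linalg.norm(p)
p_hat = p / p_len
x_answer = s + p_hat.dot(b_hat) * p_len * b_hat

# The vector from the projected point to the elbow should be perpendicular to b.
residual = elbow - x_question

print(np.allclose(x_question, x_answer))
print(np.isclose(residual.dot(b), 0.0))
```

Both checks should print True, since p_hat.dot(b_hat) * p_len is algebraically the same as p.dot(b_hat). If they do, the projection arithmetic is sound as long as p is taken relative to s.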

Answer 2

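If you are using the plot to judge visually whether the solution looks correct, you must plot the data with the same scale on each axis, i.e. use plt.axis('equal'). If the axes do not have equal scales, the angles between lines in the plot are distorted.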
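The size of that distortion can be estimated without plotting. The sketch below is a rough illustration under stated assumptions: the elbow position (50, 37) is hypothetical, and the 640 x 480 pixel plot area with the data filling both axes is an assumed layout, not something read from the asker's figure. It measures the angle between b and the projection residual once in data coordinates and once after each axis is stretched to its on-screen scale.

```python
import numpy as np

# End points of b from the question's data; the elbow is a hypothetical value.
s = np.array([0.0, 68.62998962])
l = np.array([349.0, 28.34219933])
elbow = np.array([50.0, 37.0])

b = l - s
b_hat = b / np.linalg.norm(b)
x = s + (elbow - s).dot(b_hat) * b_hat  # orthogonal projection of the elbow onto b
residual = elbow - x


def angle_deg(u, v):
    """Angle between two 2D vectors, in degrees."""
    cos_angle = u.dot(v) / (np.linalg.norm(u) * np.linalg.norm(v))
    return np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))


# Assumed plot area: 640 x 480 pixels, with the data range filling each axis.
x_range = l[0] - s[0]
y_range = s[1] - l[1]
scale = np.array([640.0 / x_range, 480.0 / y_range])  # pixels per data unit

data_angle = angle_deg(b, residual)
screen_angle = angle_deg(b * scale, residual * scale)

print(f"angle in data coordinates: {data_angle:.1f} degrees")
print(f"apparent angle on screen:  {screen_angle:.1f} degrees")
```

With these assumed numbers the angle is a right angle in data coordinates but appears clearly different from 90 degrees on screen, so a correct projection can look wrong. Calling plt.axis('equal') before plt.show() gives both axes the same scale and removes the distortion.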

Tags: python, plot, numpy, linear-algebra, orthogonal