Why does principal component analysis give me drastically different results?

Question

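The problem I am trying to solve is this: given a blob in an image, I want to find its orientation so that I can draw lines to fill the region. I want the lines to run along the region's long axis, so that I need as few lines as possible.

I searched around and found that PCA (principal component analysis) is a good way to get the orientation of a blob, by feeding the coordinates of all its points to the PCA algorithm: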

https://alyssaq.github.io/alyssaq.github.io/2015/computing-the-axes-or-orientation-of-a-blob/index.html

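But after copying the exact algorithm, I got very surprising results. Given regions of similar shape, the PCA algorithm returns completely different eigenvectors. They look perpendicular to each other:

The lines above are rendered with the slope given by the PCA algorithm, with a slight random variation.

I don't know much about data processing. What is happening here, and how can I fix it?

Code: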

import numpy as np

# I tried passing different set of points to pca:
# 1. Only points at the perimeter of the area
# 2. Random sample of points within the area
# 3. All points in the area

# points are like [(x1, y1), (x2, y2), ... ]
def pca(points):
    xs, ys = zip(*points)
    x = np.array(xs)
    y = np.array(ys)

    # Subtract mean from each dimension. We now have our 2xm matrix.
    x = x - np.mean(x)
    y = y - np.mean(y)
    coords = np.vstack([x, y])

    # Covariance matrix and its eigenvectors and eigenvalues
    cov = np.cov(coords)
    vals, evecs = np.linalg.eig(cov)

    # Sort eigenvalues in decreasing order (we only have 2 values)
    sort_indices = np.argsort(vals)[::-1]

    evec1, evec2 = evecs[:, sort_indices]  # Eigenvector with largest eigenvalue
    eval1, eval2 = vals[sort_indices[0]], vals[sort_indices[1]]
    print("PCA:", evec1, evec2, eval1, eval2)
    return evec1, evec2, eval1, eval2

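I tried passing different sets of points to pca:

Only the points on the perimeter of the region
A random sample of points within the region
All points in the region

It makes no difference, though: for every choice of points, my algorithm can produce either of the two patterns above, although the one on the right (the wrong one) seems to come up more often.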

Answer 1

The error is in this line:

evec1, evec2 = evecs[:, sort_indices]

The eigenvectors are in the columns, but this assignment puts the first row of evecs[:, sort_indices] into evec1 and the second row into evec2. A quick fix is

evec1, evec2 = evecs[:, sort_indices].T
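
To see the difference between the two unpackings in isolation, here is a minimal sketch. The eigenvector matrix is built by hand for a hypothetical blob whose long axis points along (2, 1), rather than computed from real image data. The names major, minor, wrong1 and wrong2 are illustrative only and do not appear in the question's code.

```python
import numpy as np

# Hypothetical eigenvector matrix for a blob whose long axis points along (2, 1).
# As with np.linalg.eig, each COLUMN holds one unit eigenvector.
major = np.array([2.0, 1.0]) / np.sqrt(5.0)
minor = np.array([-1.0, 2.0]) / np.sqrt(5.0)
evecs = np.column_stack([major, minor])
sort_indices = np.array([0, 1])  # assume the eigenvalues are already in decreasing order

# Unpacking a 2-D array iterates over its ROWS, so wrong1 is the first row,
# i.e. (2, -1)/sqrt(5): the long axis mirrored about the x-axis.
wrong1, wrong2 = evecs[:, sort_indices]

# Transposing first turns the columns into rows, so unpacking yields the eigenvectors.
evec1, evec2 = evecs[:, sort_indices].T

print("row unpacking:   ", wrong1)
print("column unpacking:", evec1)
```

In this sketch the row unpacking returns a direction with the sign of the slope flipped, which for a blob near 45 degrees looks perpendicular to the true axis. This may also explain why the original code sometimes looks right. An eigenvector's sign is arbitrary, and when the returned 2x2 matrix happens to be symmetric, its rows equal its columns and the bug is hidden.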

algorithm

numpy

data-analysis

pca