"fast ward" clustering in Python

In JMP, there is an option to use a "fast Ward" method when there are more than 2,000 rows. From the documentation [Fast Ward]:

"Applies an algorithm that computes Ward's method more quickly for large numbers of rows. The computation time is shorter because this algorithm does not require the calculation of a distance matrix. It is used automatically whenever there are more than 2,000 rows."

Matlab does the same thing: "Find a maximum of four clusters in a hierarchical cluster tree created using the ward linkage method. Specify 'SaveMemory' as 'on' to construct clusters without computing the distance matrix. Otherwise, you can receive an out-of-memory error if your machine does not have enough memory to hold the distance matrix."
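Neither quote says which algorithm is used, so this is only an assumption: one standard way to compute Ward's method without a distance matrix is the nearest-neighbour chain algorithm, which keeps just the cluster centroids and sizes and recomputes Ward distances on the fly as sqrt(2|u||v|/(|u|+|v|)) * ||c_u - c_v||. Below is a minimal NumPy sketch of that idea; the helper name `ward_nn_chain` is hypothetical, and the output is meant to follow SciPy's linkage-matrix layout.

```python
import numpy as np


def ward_nn_chain(X):
    """Ward linkage via the nearest-neighbour chain algorithm (hypothetical helper).

    Only centroids and cluster sizes are stored (O(n * d) memory); Ward distances
    are recomputed on the fly instead of being read from a distance matrix.
    Returns an (n - 1) x 4 linkage matrix in SciPy's layout.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    centroids = X.copy()             # slot i holds the centroid of the cluster living there
    sizes = np.ones(n)
    active = np.ones(n, dtype=bool)
    merges = []                      # (slot_a, slot_b, distance, merged_size)
    chain = []

    while len(merges) < n - 1:
        if not chain:
            chain.append(int(np.flatnonzero(active)[0]))
        a = chain[-1]
        others = np.flatnonzero(active)
        others = others[others != a]
        # Ward distance from cluster a to every other active cluster
        diff = centroids[others] - centroids[a]
        weight = 2.0 * sizes[others] * sizes[a] / (sizes[others] + sizes[a])
        dist = np.sqrt(weight * np.einsum("ij,ij->i", diff, diff))
        j = int(np.argmin(dist))
        b, d_ab = int(others[j]), float(dist[j])
        if len(chain) > 1:
            prev = chain[-2]
            p = int(np.flatnonzero(others == prev)[0])
            if dist[p] <= d_ab:      # prefer the previous element on ties so the chain ends
                b, d_ab = prev, float(dist[p])
        if len(chain) > 1 and b == chain[-2]:
            # a and b are reciprocal nearest neighbours: merge a into b's slot
            chain.pop()
            chain.pop()
            merged = sizes[a] + sizes[b]
            centroids[b] = (sizes[a] * centroids[a] + sizes[b] * centroids[b]) / merged
            sizes[b] = merged
            active[a] = False
            merges.append((a, b, d_ab, merged))
        else:
            chain.append(b)

    # NN-chain finds merges out of order: sort by distance (stable) and relabel
    # with union-find so new clusters get ids n, n + 1, ... as in SciPy.
    merges.sort(key=lambda m: m[2])
    parent = list(range(2 * n - 1))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    Z = np.empty((n - 1, 4))
    for k, (a, b, d_ab, size) in enumerate(merges):
        ra, rb = find(a), find(b)
        Z[k] = (min(ra, rb), max(ra, rb), d_ab, size)
        parent[ra] = parent[rb] = n + k
    return Z
```

Memory stays O(n * d), but time is still O(n² * d), and this pure-Python loop would be far too slow for 275k rows; it only illustrates why no distance matrix is needed. A compiled implementation is what you would use in practice.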

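I'm looking for something similar in Python, but every option I've found seems to require computing the distance matrix up front (for my problem, with 275k rows and 10 columns, that takes an absurd amount of memory). In JMP and Matlab, the same clustering works fine even on a machine with only half the memory of the one I want to run the Python script on. Does anyone know of something?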
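To put "absurd" in numbers, assuming 8-byte float64 values and a condensed (upper-triangle only) distance matrix of the kind SciPy's `pdist` returns, here is a quick back-of-the-envelope calculation; the helper names are hypothetical.

```python
def condensed_distance_bytes(n_rows, bytes_per_value=8):
    """Bytes for n*(n-1)/2 pairwise distances (condensed form, float64 assumed)."""
    return n_rows * (n_rows - 1) // 2 * bytes_per_value


def data_bytes(n_rows, n_cols, bytes_per_value=8):
    """Bytes for the raw observation matrix itself (float64 assumed)."""
    return n_rows * n_cols * bytes_per_value


n, d = 275_000, 10
print(f"distance matrix: {condensed_distance_bytes(n) / 1e9:.1f} GB")  # 302.5 GB
print(f"raw data:        {data_bytes(n, d) / 1e6:.1f} MB")             # 22.0 MB
```

The distance matrix is roughly four orders of magnitude larger than the data, which is why methods that work directly on the observation vectors fit in memory when matrix-based ones do not.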

From a now-rolled-back edit to the question by the OP:

I found that using the "linkage_vector" function seems to be what I was looking for. I was thrown off because "vector" to me meant 1-D, but I guess it can be N-D.

Have you worked with fastcluster? It has options for "hierarchical clusters from distance matrices or from vector data".