Sklearn Spectral Clustering error for inf or NaNs in matrix

I am using the Spectral Clustering library, whose main parameter is the similarity (affinity) matrix. My matrix looks like this:

[[  1.00000000e+00   8.47085137e-01   8.49644498e-01   8.49746438e-01
2.96473454e-01   8.50540412e-01   8.49462072e-01   8.50839475e-01
8.45951343e-01   5.76448265e-01   8.48265736e-01   8.43378943e-01
3.75348067e-01   1.17626480e-01   2.50357519e-01   8.50495202e-01
9.97541755e-01   8.49835674e-01   8.48770171e-01   8.45869271e-01
-5.97205241e-02]
[  8.47085137e-01   1.00000000e+00   9.98547894e-01   9.98803332e-01
2.22305018e-01   9.98755219e-01   9.98502380e-01   9.98402601e-01
9.98778885e-01   5.66416311e-01   9.98639207e-01   9.98452172e-01
-6.10479042e-02   2.46741344e-02  -4.14116930e-03   9.98357419e-01
8.48955204e-01   9.98525354e-01   9.98900440e-01   9.98426618e-01
-6.51839614e-02]
[  8.49644498e-01   9.98547894e-01   1.00000000e+00   9.98764222e-01
1.59017501e-01   9.98777492e-01   9.98797005e-01   9.98756310e-01
9.98785822e-01   5.71955127e-01   9.98834038e-01   9.98652820e-01
-5.95467715e-02   1.98107829e-02  -3.88527970e-03   9.98810942e-01
8.51337460e-01   9.98882675e-01   9.98815975e-01   9.98789494e-01
-6.69662309e-02]
[  8.49746438e-01   9.98803332e-01   9.98764222e-01   1.00000000e+00
4.73518047e-01   9.98684853e-01   9.98839959e-01   9.99029920e-01
9.98804479e-01   5.67855583e-01   9.98759386e-01   9.98796277e-01
-6.07517782e-02   1.71388383e-02  -3.20996100e-03   9.98669121e-01
8.51600753e-01   9.98681806e-01   9.99072484e-01   9.98702177e-01
-6.29855810e-02]
[  3.52784328e-01   2.41076867e-01   2.01621082e-01   4.11538647e-01
9.92999574e-01   2.09351787e-01   2.12464918e-01   1.84566399e-01
2.82162287e-01   8.88835155e-01   1.90613041e-01   2.12150578e-01
2.92104260e-01   6.25221827e-02   8.70607365e-01   2.88645877e-01
3.09283827e-01   2.81253950e-01   1.80307149e-01   2.49082955e-01
5.46192492e-02]
...
[ -5.97205241e-02  -6.51839614e-02  -6.69662309e-02  -6.29855810e-02
7.86918277e-02  -6.49002943e-02  -6.12003747e-02  -6.34500592e-02
-6.75593439e-02   7.23869691e-02  -6.20686862e-02  -5.94039824e-02
-1.00101778e-01  -1.14667128e-01   5.57606897e-02  -6.32884559e-02
-5.33734526e-02  -5.90822523e-02  -6.17068052e-02  -5.76615359e-02
1.00000000e+00]]

My code is similar to the documentation example:

from sklearn.cluster import SpectralClustering
cl = SpectralClustering(n_clusters=4, affinity='precomputed')
y = cl.fit_predict(matrix)

But I get the following error:

/usr/local/lib/python2.7/dist-packages/scikit_learn-0.17.1-py2.7-linux-x86_64.egg/sklearn/utils/validation.py:629: UserWarning: Array is not symmetric, and will be converted to symmetric by average with its transpose.
  warnings.warn("Array is not symmetric, and will be converted "

/usr/local/lib/python2.7/dist-packages/scikit_learn-0.17.1-py2.7-linux-x86_64.egg/sklearn/utils/graph.py:172: RuntimeWarning: invalid value encountered in sqrt
  w = np.sqrt(w)

Traceback (most recent call last):
  File "/home/mahmood/PycharmProjects/sentence2vec/graphClustering.py", line 23, in <module>
    y = cl.fit_predict(matrix)
  File "/usr/local/lib/python2.7/dist-packages/scikit_learn-0.17.1-py2.7-linux-x86_64.egg/sklearn/base.py", line 371, in fit_predict
    self.fit(X)
  File "/usr/local/lib/python2.7/dist-packages/scikit_learn-0.17.1-py2.7-linux-x86_64.egg/sklearn/cluster/spectral.py", line 454, in fit
    assign_labels=self.assign_labels)
  File "/usr/local/lib/python2.7/dist-packages/scikit_learn-0.17.1-py2.7-linux-x86_64.egg/sklearn/cluster/spectral.py", line 258, in spectral_clustering
    eigen_tol=eigen_tol, drop_first=False)
  File "/usr/local/lib/python2.7/dist-packages/scikit_learn-0.17.1-py2.7-linux-x86_64.egg/sklearn/manifold/spectral_embedding_.py", line 254, in spectral_embedding
    tol=eigen_tol)
  File "/usr/lib/python2.7/dist-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", line 1545, in eigsh
    symmetric=True, tol=tol)
  File "/usr/lib/python2.7/dist-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", line 1033, in get_OPinv_matvec
    return LuInv(A).matvec
  File "/usr/lib/python2.7/dist-packages/scipy/sparse/linalg/interface.py", line 142, in __new__
    obj.__init__(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/scipy/sparse/linalg/eigen/arpack/arpack.py", line 922, in __init__
    self.M_lu = lu_factor(M)
  File "/usr/lib/python2.7/dist-packages/scipy/linalg/decomp_lu.py", line 58, in lu_factor
    a1 = asarray_chkfinite(a)
  File "/usr/lib/python2.7/dist-packages/numpy/lib/function_base.py", line 1022, in asarray_chkfinite
    "array must not contain infs or NaNs")
ValueError: array must not contain infs or NaNs

I can accept the first warning, since the matrix is not symmetric, but there are no infs or NaNs in my matrix.

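The NaN values appear because your matrix is not a valid similarity matrix: your data contains negative similarities (for example, most off-diagonal entries in your last row are negative)! Before the eigen-decomposition, scikit-learn normalizes the graph Laplacian by the square root of each node's degree, i.e. the sum of its similarities to all other nodes; that is the `w = np.sqrt(w)` line in your second warning. When a node's similarities sum to a negative number, the square root is NaN, and that NaN is what the eigensolver later rejects, hence the error.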
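A simplified re-creation of that normalization step shows how a single node with a negative degree turns into NaN. This is not scikit-learn's actual code: `degree_sqrt` and the 3x3 toy matrix are hypothetical, chosen only to illustrate the mechanism.

```python
import numpy as np

def degree_sqrt(affinity):
    """Simplified re-creation of the degree normalization: zero the
    diagonal, sum each row (the node degree), then take the square root."""
    m = np.array(affinity, dtype=float)   # copy so the caller's matrix is untouched
    np.fill_diagonal(m, 0.0)              # self-similarity does not count toward the degree
    degree = m.sum(axis=1)
    with np.errstate(invalid="ignore"):   # silence the "invalid value encountered in sqrt" warning
        return np.sqrt(degree)

# Hypothetical symmetric matrix: node 2 is "negatively similar" to the others.
A = np.array([[ 1.0,  0.8, -0.3],
              [ 0.8,  1.0, -0.4],
              [-0.3, -0.4,  1.0]])
d = degree_sqrt(A)
print(np.isnan(d).tolist())  # [False, False, True]
```

Node 2's degree is -0.3 + -0.4 = -0.7, so its square root is NaN. That NaN ends up in the matrix passed to the ARPACK eigensolver, where `lu_factor` calls `asarray_chkfinite` and raises the `ValueError` from your traceback.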

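These warnings are not there just for fun: matrix decomposition techniques have requirements that must be met for them to work and return meaningful results (here, a symmetric affinity matrix with non-negative entries).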
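A quick pre-flight check can catch these problems before calling `fit_predict`. This is a sketch: `check_affinity` is a hypothetical helper, and its list of requirements (square, finite, symmetric, non-negative) is a summary of the points above rather than an exhaustive list taken from scikit-learn.

```python
import numpy as np

def check_affinity(affinity, atol=1e-8):
    """Return a list of problems that make a precomputed affinity matrix
    unsuitable for spectral clustering (an empty list means it looks OK)."""
    a = np.asarray(affinity, dtype=float)
    if a.ndim != 2 or a.shape[0] != a.shape[1]:
        return ["matrix is not square"]
    problems = []
    if not np.isfinite(a).all():
        problems.append("contains inf or NaN")
    if not np.allclose(a, a.T, atol=atol):
        problems.append("not symmetric")
    n_negative = int((a < 0).sum())
    if n_negative:
        problems.append("has %d negative entries" % n_negative)
    return problems

# Hypothetical example: asymmetric (0.5 vs 0.4) and with two negative entries.
A = np.array([[ 1.0, 0.5, -0.1],
              [ 0.4, 1.0,  0.2],
              [-0.1, 0.2,  1.0]])
print(check_affinity(A))  # ['not symmetric', 'has 2 negative entries']
```

Note that a matrix can pass the "no inf or NaN" check, as yours does, and still produce NaN later because of the negative entries.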

Fix your negative similarities first, then try again.
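Two common ways to do that, assuming your values are cosine-like similarities in [-1, 1] (the diagonal of 1s suggests so, but that is an assumption): shift everything into [0, 1], or clip negative values to 0. `make_nonnegative` is a hypothetical helper; it also symmetrizes explicitly, the same averaging with the transpose that the first warning describes.

```python
import numpy as np

def make_nonnegative(similarity, method="shift"):
    """Turn a similarity matrix with values in [-1, 1] into a symmetric,
    non-negative affinity matrix.

    method="shift": map s -> (s + 1) / 2, preserving the ordering of all values.
    method="clip":  set negative similarities to 0 (treat them as unrelated).
    """
    s = np.asarray(similarity, dtype=float)
    s = (s + s.T) / 2.0  # average with the transpose instead of relying on the warning
    if method == "shift":
        return (s + 1.0) / 2.0
    if method == "clip":
        return np.clip(s, 0.0, None)
    raise ValueError("method must be 'shift' or 'clip'")

# Hypothetical matrix with negative similarities.
S = np.array([[ 1.0,  0.8, -0.3],
              [ 0.8,  1.0, -0.4],
              [-0.3, -0.4,  1.0]])
print(make_nonnegative(S, "clip").min() >= 0)   # True
print(make_nonnegative(S, "shift").min() >= 0)  # True
```

The result can be passed to `SpectralClustering(n_clusters=4, affinity='precomputed').fit_predict(...)` as before. Shifting keeps the relative ordering of all similarities; clipping discards whatever information the negative values carried and can leave a node with a very small or zero degree, in which case scikit-learn may warn that the graph is not fully connected.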