MiniBatchKMeans OverflowError: cannot convert float infinity to integer?
I am trying to find the right number of clusters, k, according to silhouette scores using sklearn.cluster.MiniBatchKMeans.
from sklearn.cluster import MiniBatchKMeans
from sklearn.feature_extraction.text import HashingVectorizer
docs = ['hello monkey goodbye thank you', 'goodbye thank you hello', 'i am going home goodbye thanks', 'thank you very much sir', 'good golly i am going home finally']
vectorizer = HashingVectorizer()
X = vectorizer.fit_transform(docs)
for k in range(5):
    model = MiniBatchKMeans(n_clusters = k)
    model.fit(X)
I get this error:
Warning (from warnings module):
File "C:\Python34\lib\site-packages\sklearn\cluster\k_means_.py", line 1279
0, n_samples - 1, init_size)
DeprecationWarning: This function is deprecated. Please call randint(0, 4 + 1) instead
Traceback (most recent call last):
File "<pyshell#85>", line 3, in <module>
model.fit(X)
File "C:\Python34\lib\site-packages\sklearn\cluster\k_means_.py", line 1300, in fit
init_size=init_size)
File "C:\Python34\lib\site-packages\sklearn\cluster\k_means_.py", line 640, in _init_centroids
x_squared_norms=x_squared_norms)
File "C:\Python34\lib\site-packages\sklearn\cluster\k_means_.py", line 88, in _k_init
n_local_trials = 2 + int(np.log(n_clusters))
OverflowError: cannot convert float infinity to integer
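I know type(k) is int, so I don't know where this problem is coming from. I can run the following just fine, but I can't seem to iterate over the integers in a list, even though type(2) is the same as type(k) after k = 2: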
model = MiniBatchKMeans(n_clusters = 2)
model.fit(X)
Even running a different model works:
>>> model = KMeans(n_clusters = 2)
>>> model.fit(X)
KMeans(copy_x=True, init='k-means++', max_iter=300, n_clusters=2, n_init=10,
n_jobs=1, precompute_distances='auto', random_state=None, tol=0.0001,
verbose=0)
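Let's analyze your code:
- for k in range(5) yields the sequence 0, 1, 2, 3, 4
- model = MiniBatchKMeans(n_clusters = k) initializes the model with n_clusters=k
- Let's look at the first iteration:
  - n_clusters=0 is used
  - Inside the initialization code (see the last frame of your traceback):
    int(np.log(n_clusters))
    = int(np.log(0))
    = int(-inf)
  - Error: integers have no representation of infinity!
  - -> The floating-point value -inf cannot be converted to int!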
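The chain above can be reproduced without scikit-learn. The snippet below is a minimal sketch that mirrors the expression from the traceback using NumPy only; the variable names `log_value` and `error_message` are introduced here for illustration and are not part of the library.

```python
import numpy as np

# The value that the first iteration of range(5) passes as n_clusters.
n_clusters = 0

# np.log(0) evaluates to -inf; NumPy normally emits a RuntimeWarning,
# which is silenced here to keep the output clean.
with np.errstate(divide="ignore"):
    log_value = np.log(n_clusters)

try:
    # Same expression as in the last frame of the traceback.
    n_local_trials = 2 + int(log_value)
    error_message = None
except OverflowError as exc:
    error_message = str(exc)

print(log_value)
print(error_message)
```

The type of k is not the issue: it is an int in every iteration, and only the value 0 makes the logarithm diverge.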
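Setting n_clusters=0 does not make sense! A clustering needs at least one cluster, and a silhouette score needs at least two, so start the loop at 2, e.g. range(2, 5).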
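As a rough sketch of the corrected loop, the code below starts at k = 2 and records a silhouette score for each k. It assumes that `silhouette_score` from `sklearn.metrics` is the score meant in the question; the names `scores` and `best_k`, and the choices `n_init=3` and `random_state=0`, are illustrative assumptions rather than anything from the original post.

```python
from sklearn.cluster import MiniBatchKMeans
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.metrics import silhouette_score

docs = ['hello monkey goodbye thank you', 'goodbye thank you hello',
        'i am going home goodbye thanks', 'thank you very much sir',
        'good golly i am going home finally']

X = HashingVectorizer().fit_transform(docs)
n_samples = X.shape[0]

scores = {}
# The silhouette score is only defined for 2 <= number of labels <= n_samples - 1,
# so the loop runs over k = 2, ..., n_samples - 1 instead of starting at 0.
for k in range(2, n_samples):
    model = MiniBatchKMeans(n_clusters=k, n_init=3, random_state=0)
    labels = model.fit_predict(X)
    # Skip this k if the fit ended up with fewer than two non-empty clusters.
    if len(set(labels)) < 2:
        continue
    scores[k] = silhouette_score(X, labels)

# Pick the k with the highest silhouette score, if any k was scored.
best_k = max(scores, key=scores.get) if scores else None
print(scores, best_k)
```

With only five documents the scores are not very meaningful; the point is only that every k the loop passes to MiniBatchKMeans is valid.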