ValueError: setting an array element with a sequence. on DBSCAN, no missing dimensionality
I am calling DBSCAN.fit() on a dataset that is a single pandas column of word vectors, all with the same dimensionality of 30. It looks like this:
df['column']
2 [-0.003417029886667123, -0.0016105849274073794...
3 [-0.24330333298729837, 0.48110865717035506, 0....
4 [-0.0017016271879120766, 0.01266130386650884, ...
5 [0.002174357210089775, 0.004633570752676618, 0...
6 [0.008567001972125537, 0.0012244984475515731, ...
matrix = df['column'].as_matrix()
# DBSCAN implementation
db = DBSCAN(eps=0.06, min_samples=1)
db.fit(matrix)
clusters = db.labels_.tolist()
However, when I fit the data, I get the following traceback:
----> 4 db.fit(matrix)
5 clusters = db.labels_.tolist()
/opt/conda/lib/python3.6/site-packages/sklearn/cluster/dbscan_.py in fit(self, X, y, sample_weight)
280
281 """
--> 282 X = check_array(X, accept_sparse='csr')
283 clust = dbscan(X, sample_weight=sample_weight,
284 **self.get_params())
/opt/conda/lib/python3.6/site-packages/sklearn/utils/validation.py in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, warn_on_dtype, estimator)
431 force_all_finite)
432 else:
--> 433 array = np.array(array, dtype=dtype, order=order, copy=copy)
434
435 if ensure_2d:
ValueError: setting an array element with a sequence.
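I understand that this error is related to one or more arrays having a different length from the others. In my case, however, that does not seem to be the problem, as shown below: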
set(np.array([m]).shape[0] for m in matrix)
>> {1}
set(np.array([m]).shape[1] for m in matrix)
>> {30}
As you can see, all the arrays have the same length. So what could the problem be?
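The way you convert your feature column does not produce a numeric array; it produces a one-dimensional array whose elements are lists, which is why you see this error.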
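A minimal sketch of how to confirm this, assuming a hypothetical stand-in DataFrame with 5 rows of 30 random floats (not the asker's real data) and using `Series.to_numpy()` in place of `as_matrix()`, which newer pandas versions no longer provide. Note that the check in the question wraps each row with `np.array([m])`, so it only verifies the row lengths, not the shape of the outer array.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: 5 rows, each a Python list of 30 floats.
rng = np.random.default_rng(0)
df = pd.DataFrame({"column": [rng.normal(size=30).tolist() for _ in range(5)]})

# Converting the column directly keeps each list as a single element.
matrix = df["column"].to_numpy()

print(matrix.dtype)      # object
print(matrix.shape)      # (5,)
print(type(matrix[0]))   # <class 'list'>
```

A shape of `(5,)` with dtype `object`, rather than `(5, 30)` with a float dtype, means the outer array holds list objects instead of numbers.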
You can convert the inner lists to arrays as well.
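One possible way to do that, sketched under the same assumption of a hypothetical stand-in DataFrame (5 rows of 30 random floats), is to turn the column into a list of lists first and let NumPy build a two-dimensional float array from it:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: 5 rows, each a Python list of 30 floats.
rng = np.random.default_rng(0)
df = pd.DataFrame({"column": [rng.normal(size=30).tolist() for _ in range(5)]})

# Build a list of lists, then one 2-D float array (rows x features).
X = np.array(df["column"].tolist(), dtype=float)

print(X.shape)  # (5, 30)
print(X.dtype)  # float64
```

The resulting `X` can then be passed to `db.fit(X)` in place of `matrix`.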