Data structure for KMeans clustering using Pandas DataFrames

Question

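I am currently working with some scientific data and trying to run a clustering task on it, but I get a ValueError because of the data format. The data is in two Pandas DataFrames, each [170 rows x 7 columns].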

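I have tried transposing the data, converting it to a list, and converting it to a numpy array. The format shown in my code comes from a solution I found here: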

#x is the y distance
x = np.empty(7, dtype = object)
x[:] = [distance_lC, distance_fC]

#y is the speed.
y = np.empty(7, dtype = object)
y[:] = [speed_lC, speed_fC]

cell_kmeans = KMeans(n_clusters = 4).fit_predict(y)

fig = plt.figure()
ax = fig.add_subplot(1,1,1)
ax.scatterplot(cell_kmeans)
plt.show()

The output should give the clusters, but instead I get the following error: "ValueError: setting an array element with a sequence."

Answer 1

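Instead of packing the DataFrames into an object array, concatenate them with pandas.concat. By default it stacks them row-wise into a single 2-D DataFrame (here 340 rows x 7 columns), which is the (n_samples, n_features) shape KMeans expects: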

y = pandas.concat([speed_lC, speed_fC])
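A minimal end-to-end sketch of this fix, assuming each row of the DataFrames is one observation and the 7 columns are numeric features (if it is the other way round, transpose each frame with `.T` first). The random data standing in for `speed_lC`/`speed_fC`, the `keys=["lC", "fC"]` labels, and the `n_init`/`random_state` values are illustrative assumptions, not taken from the question.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical stand-ins for the question's speed_lC and speed_fC:
# two numeric DataFrames, each 170 rows x 7 columns.
rng = np.random.default_rng(0)
speed_lC = pd.DataFrame(rng.normal(0.0, 1.0, size=(170, 7)))
speed_fC = pd.DataFrame(rng.normal(5.0, 1.0, size=(170, 7)))

# Stack the rows of both frames into one 2-D table (340 x 7).
# keys= adds an outer index level recording which frame each row came from.
y = pd.concat([speed_lC, speed_fC], keys=["lC", "fC"])
print(y.shape)

# KMeans expects (n_samples, n_features); each row is one sample here.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(y)

# Attach the cluster label to each row and count labels per source frame.
result = y.assign(cluster=labels)
print(result.groupby(level=0)["cluster"].value_counts())
```

For the plot, note that matplotlib's `Axes` has no `scatterplot` method (that name belongs to `seaborn.scatterplot`); `ax.scatter` is the matplotlib call. With 7 features you also need to choose two columns to draw, for example `ax.scatter(y.iloc[:, 0], y.iloc[:, 1], c=labels)`.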

python

numpy

sklearn-pandas