Why doesn't kmeans find the 3 clusters?
I ran kmeans on a 3-dimensional dataset and got the following result:
The code is as follows:
library(tidyr)
setwd('C:/temp/rwd')
getwd()
df <- read.table('data-1581352459203.csv',
                 header = TRUE,
                 sep = ",")
dff <- df %>% pivot_wider(names_from = SensorId, values_from = last)
data = data.frame(dff$`3`, dff$`4`, dff$`5`)
cf.kmeans <- kmeans(data, centers = 3, nstart = 20)
cf.kmeans
library(plot3D)
x <- dff$`3`
y <- dff$`4`
z <- dff$`5`
scatter3D(x, y, z,
          bty = "g", pch = cf.kmeans$cluster, colvar = as.numeric(cf.kmeans$cluster),
          xlab = "Temperature", ylab = "Humidity", zlab = "Speed",
          ticktype = "detailed")
library("plot3Drgl")
plotrgl()
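The dataset looks like this (90 observations):
I would appreciate an explanation of why kmeans does not find the obvious clusters.
Your variables are on different scales. kmeans minimizes squared Euclidean distances, so without scaling the variables with the largest ranges dominate the result. You need to scale the data first. See the reproducible example below: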
library(plot3D)
set.seed(100)
mat = cbind(rnorm(60, rep(c(0, 30, 30), each = 20), 5),
            rnorm(60, rep(c(0, 30, 30), each = 20), 5),
            rnorm(60, rep(c(0, 0, 1), each = 20), 0.1)
)
clus = kmeans(mat, 3, nstart = 20)
scatter3D(mat[,1], mat[,2], mat[,3],
          ticktype = "detailed", colvar = clus$cluster)
The result above looks similar to yours. Now cluster on the scaled data:
clus=kmeans(scale(mat),3,nstart=20)
scatter3D(mat[,1],mat[,2],mat[,3],ticktype = "detailed",colvar=clus$cluster)
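To check the effect without relying on the plot, the simulated data above can be compared against the groups it was generated from. This is only a sketch using base R: the `truth` vector is a name introduced here for illustration (it is not part of the original example), and it simply records which of the three blocks of 20 rows each point came from.

```r
set.seed(100)

# Hypothetical helper: the group each row was simulated from (3 blocks of 20)
truth <- rep(1:3, each = 20)

# Same simulated data as above: two wide-range columns, one narrow-range column
mat <- cbind(rnorm(60, rep(c(0, 30, 30), each = 20), 5),
             rnorm(60, rep(c(0, 30, 30), each = 20), 5),
             rnorm(60, rep(c(0, 0, 1), each = 20), 0.1))

# Per-column variance: the third column barely contributes to Euclidean distance
col_var <- apply(mat, 2, var)
print(col_var)

# Cluster on the raw data and on the standardized data
raw    <- kmeans(mat, centers = 3, nstart = 20)
scaled <- kmeans(scale(mat), centers = 3, nstart = 20)

# Cross-tabulate the simulated groups against the cluster assignments
print(table(truth, raw$cluster))
print(table(truth, scaled$cluster))
```

If scaling is the issue, the second table should place each simulated group in a single cluster, while the first should mix the two groups that differ only in the third column. On the data from the question, the analogous change would be `kmeans(scale(data), centers = 3, nstart = 20)`.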