Problem running c-means clustering on my data in R
How can I solve this problem for this data?
> x=data.frame(v1=c("a" ,"b" ,"c" ,"d" ,"e"),
+ v2=c(97 ,90 ,93 ,97 ,90),
+ v3=c( 85 ,91 ,87 ,91 ,93))
> library(e1071)
> f <- cmeans(x, 2)
Error in cmeans(x, 2) : NA/NaN/Inf in foreign function call (arg 1)
In addition: Warning messages:
1: In cmeans(x, 2) : NAs introduced by coercion
2: In cmeans(x, 2) : NAs introduced by coercion
> f
I want to apply c-means to my data. As the code above shows, it contains three vectors: v1, v2 and v3. I want the c-means result to be labelled by the vector v1.
If we look at the documentation in ?cmeans, it says:
x - The data matrix where columns correspond to variables and rows to observations.
So we can convert the data.frame to a matrix after removing the character column (column 1):
x1 <- as.matrix(x[-1])
row.names(x1) <- x[,1]
cmeans(x1, 2)
#Fuzzy c-means clustering with 2 clusters
#Cluster centers:
# v2 v3
#1 90.30090 91.85191
#2 95.75436 87.22535
#Memberships:
# 1 2
#a 0.06614213 0.93385787
#b 0.98305641 0.01694359
#c 0.19855988 0.80144012
#d 0.25730888 0.74269112
#e 0.97924422 0.02075578
#Closest hard clustering:
#a b c d e
#2 1 2 2 1
#Available components:
#[1] "centers" "size" "cluster" "membership" "iter" "withinerror" "call"
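Since the question asks for the result to be labelled by v1, here is a minimal sketch of how the fitted object could be turned into a small lookup table. It assumes the label column is named v1, and the names `labelled` and `max_membership` are made up for illustration. Cluster numbers can differ from run to run.

```r
library(e1071)

x <- data.frame(v1 = c("a", "b", "c", "d", "e"),
                v2 = c(97, 90, 93, 97, 90),
                v3 = c(85, 91, 87, 91, 93))

# Keep only the numeric columns and carry v1 along as row names
x1 <- as.matrix(x[-1])
rownames(x1) <- x$v1

f <- cmeans(x1, 2)

# One row per observation: its label, its hard cluster,
# and its largest membership degree
labelled <- data.frame(v1 = rownames(x1),
                       cluster = f$cluster,
                       max_membership = apply(f$membership, 1, max),
                       row.names = NULL)
labelled
```

`f$membership` holds the full membership matrix if the degree for every cluster is needed rather than only the largest one.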
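The k-means family of partitioning clustering algorithms is built on the mean, so it inherently accepts only numeric values. You get the error because the data frame contains both numeric and categorical values, which cmeans() cannot handle. Also, there is no need to convert the data frame to a matrix, since that is not the actual problem.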
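To make the "NAs introduced by coercion" warning concrete, the following base R sketch shows what happens to a mixed data frame when it is turned into a matrix and then into numbers. It assumes the label column is named v1. The variable names `num_cols`, `m` and `v1_as_num` are only for illustration.

```r
x <- data.frame(v1 = c("a", "b", "c", "d", "e"),
                v2 = c(97, 90, 93, 97, 90),
                v3 = c(85, 91, 87, 91, 93))

# Check which columns are numeric; v1 is not
num_cols <- sapply(x, is.numeric)
num_cols

# A data frame with a non-numeric column becomes a character matrix
m <- as.matrix(x)
typeof(m)

# Letters cannot be read as numbers, so they turn into NA
# (the warning is suppressed here only to keep the output short)
v1_as_num <- suppressWarnings(as.numeric(m[, "v1"]))
v1_as_num
```

Those NA values are what the underlying compiled routine rejects with "NA/NaN/Inf in foreign function call".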
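An alternative approach is to encode the character variable as numeric codes and then apply the clustering. That way no variable needs to be dropped.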
# create empty data frame
df<- setNames(data.frame(matrix(ncol = 5, nrow = 5)), c("a" ,"b" ,"c" ,"d" ,"e"))
# fill values
df$a<- c("aaaa" ,"bbbb" ,"cccc" ,"dddd" ,"eeee")
df$b<- c(97 ,90 ,93 ,97 ,90)
df$c<- c(97 ,90 ,93 ,97 ,90)
df$d<- c( 85 ,91 ,87 ,91 ,93)
df$e<- c( 85 ,91 ,87 ,91 ,93)
# show the dataframe
df
a b c d e
1 aaaa 97 97 85 85
2 bbbb 90 90 91 91
3 cccc 93 93 87 87
4 dddd 97 97 91 91
5 eeee 90 90 93 93
# Discretize the character variable
df$a <- as.numeric( factor(df$a) ) -1
df
a b c d e
1 0 97 97 85 85
2 1 90 90 91 91
3 2 93 93 87 87
4 3 97 97 91 91
5 4 90 90 93 93
# Apply clustering
library(e1071)
cmeans(df, 2)
Fuzzy c-means clustering with 2 clusters
Cluster centers:
a b c d e
1 1.406 95.72 95.72 87.18 87.18
2 2.510 90.36 90.36 91.85 91.85
Memberships:
1 2
[1,] 0.92728 0.07272
[2,] 0.04014 0.95986
[3,] 0.80061 0.19939
[4,] 0.72009 0.27991
[5,] 0.03544 0.96456
Closest hard clustering:
[1] 1 2 1 1 2
Available components:
[1] "centers" "size" "cluster" "membership" "iter"
[6] "withinerror" "call"
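The two outputs in this thread number the clusters differently (a falls in cluster 2 in one and in cluster 1 in the other). As far as I can tell this is because cmeans() picks its starting centers at random when `centers` is given as a number. A minimal sketch of making a run repeatable, where the seed value 42 and the names `fit1` and `fit2` are arbitrary choices:

```r
library(e1071)

df <- data.frame(a = c("aaaa", "bbbb", "cccc", "dddd", "eeee"),
                 b = c(97, 90, 93, 97, 90),
                 c = c(97, 90, 93, 97, 90),
                 d = c(85, 91, 87, 91, 93),
                 e = c(85, 91, 87, 91, 93))

# Encode the character column as integer codes 0..4
df$a <- as.numeric(factor(df$a)) - 1

# Fix the random seed before each call so the starting centers match
set.seed(42)
fit1 <- cmeans(df, 2)

set.seed(42)
fit2 <- cmeans(df, 2)

# Both runs should assign every row to the same cluster
identical(fit1$cluster, fit2$cluster)
```

Without the seed the grouping is usually the same, but which group is called 1 and which is called 2 may swap.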