Clustering by groups
How can I cluster by groups? As an example, take this Pokemon dataset on Kaggle.
A sample of the dataset looks like this (some fields changed to mimic my data):
Name                       Type I  Type II
Bulbasaur                  Grass   Poison
Bulbasaur 2                Grass   Poison
Venusaur                   Grass   Not Null
VenusaurMega Venusaur      Grass   Not Null
...
Charizard                  Fire    Flying
CharizardMega Charizard X  Fire    Dragon
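Assuming there are no null values in my dataset, how can I group by the Type I and Type II columns and then, within each group, cluster by the similarity between names?
The output should look like this: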
Name                       Type I  Type II   Cluster
Bulbasaur                  Grass   Poison    1
Bulbasaur 2                Grass   Poison    1
Venusaur                   Grass   Not Null  2
VenusaurMega Venusaur      Grass   Not Null  2
...
Charizard                  Fire    Flying    3
CharizardMega Charizard X  Fire    Dragon    4
I tried a similar approach, but it does not work with the NbClust function I am using:
clust <- NbClust(data, diss= string_dist, distance=NULL, min.nc = 2, max.nc = 125, method="ward.D2", index="ch")
You can use rleid from library(data.table).
df <- fread("
#,Name,Type 1,Type 2,Total,HP,Attack,Defense,Sp. Atk,Sp. Def,Speed,Generation,Legendary
1,Bulbasaur,Grass,Poison,318,45,49,49,65,65,45,1,False
2,Ivysaur,Grass,Poison,405,60,62,63,80,80,60,1,False
3,Venusaur,Grass,Poison,525,80,82,83,100,100,80,1,False
3,VenusaurMega Venusaur,Grass,Poison,625,80,100,123,122,120,80,1,False
4,Charmander,Fire,,309,39,52,43,60,50,65,1,False
5,Charmeleon,Fire,,405,58,64,58,80,65,80,1,False
")
Edit (see comments):
setDT(df, key=c("Type 1","Type 2"))[, Cluster:=.GRP, by = key(df)][]
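The answer names rleid while the edited code uses .GRP, and the two are not interchangeable on unsorted data. A minimal sketch of the difference, assuming a toy table with a made-up `type` column (not the Kaggle data): rleid numbers consecutive runs, whereas .GRP numbers distinct groups.

```r
library(data.table)

# Toy table; the column name `type` and its values are illustrative only.
dt <- data.table(type = c("Grass", "Grass", "Fire", "Grass"))

# rleid starts a new id whenever the value changes from the previous row,
# so the second run of "Grass" gets its own id.
dt[, run_id := rleid(type)]        # 1 1 2 3

# .GRP is the group counter; with `by`, groups are numbered in order of
# first appearance, so every "Grass" row shares one id.
dt[, group_id := .GRP, by = type]  # 1 1 2 1

print(dt)
```

On data that is already sorted by the grouping columns (as it is after setting a key), the two give the same ids.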
We can use base R:
df$cluster <- with(df, match(`Type II`, unique(`Type II`)))
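The snippets above assign group ids only; they do not cluster names by similarity inside each group. One possible way to do that in base R is sketched below, using adist (Levenshtein distance) with hclust and cutree. The helper `cluster_names`, the length normalisation, the average linkage, and the cut-off `max_dist = 0.7` are all assumptions chosen for this toy sample, not part of the original answers, and the threshold would need tuning on real data.

```r
# Toy data mirroring the sample in the question.
df <- data.frame(
  Name = c("Bulbasaur", "Bulbasaur 2", "Venusaur", "VenusaurMega Venusaur",
           "Charizard", "CharizardMega Charizard X"),
  `Type I`  = c("Grass", "Grass", "Grass", "Grass", "Fire", "Fire"),
  `Type II` = c("Poison", "Poison", "Not Null", "Not Null", "Flying", "Dragon"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Hypothetical helper: cluster the names of ONE group by edit distance.
# `max_dist` is an assumed cut-off on the normalised distance (0 to 1).
cluster_names <- function(names, max_dist = 0.7) {
  # hclust needs at least two objects; a singleton is its own cluster.
  if (length(names) < 2) return(rep(1L, length(names)))
  d <- adist(names)                                  # Levenshtein distances
  d <- d / outer(nchar(names), nchar(names), pmax)   # scale by the longer name
  tree <- hclust(as.dist(d), method = "average")
  cutree(tree, h = max_dist)                         # cut the tree at max_dist
}

# One factor level per (Type I, Type II) combination that actually occurs.
group <- interaction(df$`Type I`, df$`Type II`, drop = TRUE)

# Within-group cluster label for every row.
within <- ave(seq_along(df$Name), group,
              FUN = function(i) cluster_names(df$Name[i]))

# Combine group and within-group label into one running cluster id,
# numbered in order of first appearance.
key <- paste(group, within)
df$Cluster <- match(key, unique(key))

print(df)
```

With this sample and threshold, the Cluster column comes out as 1, 1, 2, 2, 3, 4, matching the desired output in the question. Names are assumed to be non-empty, since the normalisation divides by name length.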