Clustering with Mclust(), extracting the clusters - R
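I am using the mclust::Mclust() function to cluster a small dataset. However, I am struggling to extract the cluster classification for each observation so that I can add it to the dataset.
Here is the data: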
df <- structure(list(latitud = c(-43.8189010620117, -34.2731018066406,
-47.0666999816895, -35.7543983459473, -47.1413993835449, -36.6260986328125,
-37.2118988037109, -33.3086013793945, -37.2792015075684, -35.4524993896484,
-36.5856018066406, -44.6591987609863, -28.6996994018555, -48.1591987609863,
-45.4000015258789, -29.94580078125, -30.4386005401611, -31.6646995544434,
-51.2000007629395, -51.3328018188477, -51.25, -45.551700592041,
-39.0144004821777, -38.6081008911133, -34.9844017028809, -32.8403015136719,
-29.9953002929688, -18.3999996185303, -35.6169013977051, -35.9085998535156,
-35.4068984985352, -32.7571983337402, -32.8502998352051, -33.5938987731934,
-38.4303016662598, -38.6866989135742, -45.4057998657227, -37.5503005981445,
-37.8997001647949, -38.0368995666504, -37.7047004699707, -37.7963981628418,
-37.7092018127441, -31.5835990905762, -30.9242000579834, -38.2008018493652,
-31.6881008148193, -31.8117008209229, -27.9747009277344, -30.7047004699707,
-36.6500015258789, -34.4921989440918, -34.6581001281738, -47.3499984741211,
-47.5, -33.7219009399414, -33.6613998413086, -35.5574989318848
), longitud = c(-72.38330078125, -71.371696472168, -72.8000030517578,
-71.0864028930664, -72.7257995605469, -72.4891967773438, -72.3242034912109,
-70.3572006225586, -71.9847030639648, -71.7332992553711, -71.5255966186523,
-71.8082962036133, -70.5500030517578, -73.0888977050781, -72.5999984741211,
-70.5327987670898, -71.002197265625, -71.2546997070312, -72.9332962036133,
-73.1091995239258, -72.5167007446289, -72.0680999755859, -73.0828018188477,
-72.8478012084961, -72.0100021362305, -71.0255966186523, -70.5867004394531,
-70.3000030517578, -71.7677993774414, -71.2981033325195, -72.2082977294922,
-70.736701965332, -70.5093994140625, -70.3792037963867, -72.0105972290039,
-72.502799987793, -72.6231002807617, -72.5903015136719, -71.6239013671875,
-71.4781036376953, -71.7683029174805, -71.6988983154297, -71.823600769043,
-71.4606018066406, -70.7731018066406, -71.2988967895508, -71.2658004760742,
-70.9302978515625, -69.997802734375, -70.9244003295898, -72.4499969482422,
-71.3731002807617, -71.3019027709961, -72.8499984741211, -72.9749984741211,
-71.5550003051758, -71.3371963500977, -71.7067031860352)), row.names = c(NA,
-58L), class = c("tbl_df", "tbl", "data.frame"))
The clustering:
library(mclust)
d_clust <- Mclust(df)
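Now, when I run plot(d_clust), it shows all the charts and everything, but it does not tell me which cluster each row belongs to. I have looked at the documentation, but neither the other documentation (1, 2, 3) nor the Stack Overflow questions related to Mclust() (1, 2) answered my question.
I am looking for something like this: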
| latitud | longitud | cluster_id |
By the way, when I run class(d_clust), it reports the class Mclust. If running d_clust on its own does not give you a table or data frame to plot, how is it possible to plot d_clust?
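When you run Mclust, it tries different models and different values of G (the number of clusters). So be sure to check the BIC plot, because Mclust simply picks the best model according to BIC and stores the choice in d_clust$modelName and d_clust$G.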
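A minimal sketch of that check, assuming the mclust package is installed. The `toy` data frame is a made-up stand-in so the snippet runs on its own; with the question's data you would pass `df` instead and inspect `d_clust`.

```r
library(mclust)

# Hypothetical stand-in for the question's `df`: two well-separated groups.
set.seed(1)
toy <- data.frame(
  latitud  = c(rnorm(30, mean = -35, sd = 1),   rnorm(30, mean = -47, sd = 1)),
  longitud = c(rnorm(30, mean = -71, sd = 0.3), rnorm(30, mean = -73, sd = 0.3))
)

fit <- Mclust(toy, verbose = FALSE)

# BIC for every covariance model and number of clusters that was tried.
plot(fit, what = "BIC")

# The combination that Mclust selected.
fit$modelName
fit$G
summary(fit)

# The fit is a list; these are the components it carries around.
names(fit)
```

This also hints at why plot(d_clust) works even though printing the object shows no table: mclust provides a plot method for objects of class Mclust, and that method reads the data and results stored inside the fitted object.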
Once you know which model was used (I think it is EVE with G=4 in your case), the classification makes sense, and you can simply use:
d_clust$classification
# or
results = data.frame(df,cluster=d_clust$classification)
head(results)
latitud longitud cluster
1 -43.8189 -72.3833 1
2 -34.2731 -71.3717 2
3 -47.0667 -72.8000 1
4 -35.7544 -71.0864 3
5 -47.1414 -72.7258 1
6 -36.6261 -72.4892 3
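If you also want to see how confident each assignment is, the fitted object stores that as well. A small sketch, again on a made-up `toy` data frame (an assumption, not the question's data); `cluster_id` is simply the column name asked for in the question.

```r
library(mclust)

# Hypothetical stand-in for the question's `df`.
set.seed(1)
toy <- data.frame(
  latitud  = c(rnorm(30, mean = -35, sd = 1),   rnorm(30, mean = -47, sd = 1)),
  longitud = c(rnorm(30, mean = -71, sd = 0.3), rnorm(30, mean = -73, sd = 0.3))
)

fit <- Mclust(toy, verbose = FALSE)

results <- data.frame(
  toy,
  cluster_id  = fit$classification,  # hard assignment, one label per row
  uncertainty = fit$uncertainty      # uncertainty of that assignment
)
head(results)

# Soft assignment: one row per observation, one column per cluster.
head(fit$z)

# How many rows ended up in each cluster.
table(results$cluster_id)
```

Rows with a high uncertainty value sit between clusters, so they are the ones worth looking at before trusting the labels.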
You can also plot the result:
with(results,plot(latitud,longitud,col=factor(cluster)))
Then you can consider whether the clustering makes sense, for example whether G=4 is really the right choice.
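One way to explore that question is to refit with the number of clusters fixed and compare the two partitions. This is only a sketch on a made-up `toy` data frame; the value G = 2 is an arbitrary choice for illustration, not a recommendation for the question's data.

```r
library(mclust)

# Hypothetical stand-in for the question's `df`.
set.seed(1)
toy <- data.frame(
  latitud  = c(rnorm(30, mean = -35, sd = 1),   rnorm(30, mean = -47, sd = 1)),
  longitud = c(rnorm(30, mean = -71, sd = 0.3), rnorm(30, mean = -73, sd = 0.3))
)

auto  <- Mclust(toy, verbose = FALSE)         # G chosen by BIC
fixed <- Mclust(toy, G = 2, verbose = FALSE)  # G forced to 2

# Number of clusters in each fit.
c(auto = auto$G, fixed = fixed$G)

# Cross-tabulate the labels to see how the two partitions relate.
table(auto = auto$classification, fixed = fixed$classification)

# Scatter plot of the data coloured by the forced-G classification.
plot(fixed, what = "classification")
```

If the cross-table shows clusters from one fit being split or merged in the other, that is a hint to look at the map and decide which grouping is meaningful for your problem.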