Clustering with Mclust(): extracting cluster assignments in R

Chris

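I am using the mclust::Mclust() function to cluster a small dataset. However, I am struggling to extract the cluster classification for each observation so that I can add it back to the dataset.

The data are as follows: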

df <- structure(list(latitud = c(-43.8189010620117, -34.2731018066406, 
-47.0666999816895, -35.7543983459473, -47.1413993835449, -36.6260986328125, 
-37.2118988037109, -33.3086013793945, -37.2792015075684, -35.4524993896484, 
-36.5856018066406, -44.6591987609863, -28.6996994018555, -48.1591987609863, 
-45.4000015258789, -29.94580078125, -30.4386005401611, -31.6646995544434, 
-51.2000007629395, -51.3328018188477, -51.25, -45.551700592041, 
-39.0144004821777, -38.6081008911133, -34.9844017028809, -32.8403015136719, 
-29.9953002929688, -18.3999996185303, -35.6169013977051, -35.9085998535156, 
-35.4068984985352, -32.7571983337402, -32.8502998352051, -33.5938987731934, 
-38.4303016662598, -38.6866989135742, -45.4057998657227, -37.5503005981445, 
-37.8997001647949, -38.0368995666504, -37.7047004699707, -37.7963981628418, 
-37.7092018127441, -31.5835990905762, -30.9242000579834, -38.2008018493652, 
-31.6881008148193, -31.8117008209229, -27.9747009277344, -30.7047004699707, 
-36.6500015258789, -34.4921989440918, -34.6581001281738, -47.3499984741211, 
-47.5, -33.7219009399414, -33.6613998413086, -35.5574989318848
), longitud = c(-72.38330078125, -71.371696472168, -72.8000030517578, 
-71.0864028930664, -72.7257995605469, -72.4891967773438, -72.3242034912109, 
-70.3572006225586, -71.9847030639648, -71.7332992553711, -71.5255966186523, 
-71.8082962036133, -70.5500030517578, -73.0888977050781, -72.5999984741211, 
-70.5327987670898, -71.002197265625, -71.2546997070312, -72.9332962036133, 
-73.1091995239258, -72.5167007446289, -72.0680999755859, -73.0828018188477, 
-72.8478012084961, -72.0100021362305, -71.0255966186523, -70.5867004394531, 
-70.3000030517578, -71.7677993774414, -71.2981033325195, -72.2082977294922, 
-70.736701965332, -70.5093994140625, -70.3792037963867, -72.0105972290039, 
-72.502799987793, -72.6231002807617, -72.5903015136719, -71.6239013671875, 
-71.4781036376953, -71.7683029174805, -71.6988983154297, -71.823600769043, 
-71.4606018066406, -70.7731018066406, -71.2988967895508, -71.2658004760742, 
-70.9302978515625, -69.997802734375, -70.9244003295898, -72.4499969482422, 
-71.3731002807617, -71.3019027709961, -72.8499984741211, -72.9749984741211, 
-71.5550003051758, -71.3371963500977, -71.7067031860352)), row.names = c(NA, 
-58L), class = c("tbl_df", "tbl", "data.frame"))

Clustering:

library(mclust)
d_clust <- Mclust(df)

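Now, when I run plot(d_clust), it shows all the plots. But that does not tell me which cluster corresponds to each row. I have looked at the documentation and other resources (1, 2, 3), and also at Stack Overflow questions about Mclust() (1, 2), but they do not answer my question.

I am looking for something like this: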

| latitud | longitud | cluster_id |

By the way, class(d_clust) returns "Mclust". How is it possible to plot d_clust when running d_clust on its own does not give you a table or data frame to plot?

StupidWolf

When you run Mclust, it tries different models and different values of G (the number of clusters). So take a look at the BIC plot:

[Image: BIC plot]

Mclust selects the best model based on BIC alone and stores the result in d_clust$modelName and d_clust$G.
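As a minimal sketch of how to check what was selected, the snippet below fits a model and reads those fields back. It assumes the mclust package is installed, and it uses the built-in faithful dataset in place of df only so that it runs on its own.

```r
library(mclust)  # assumed to be installed

# Built-in example data, used here only so the sketch is self-contained
fit <- Mclust(faithful)

# Covariance model and number of components chosen by BIC
fit$modelName
fit$G

# BIC values for every model / G combination that was tried
fit$BIC

# The BIC plot; plot() dispatches to the method for class "Mclust"
plot(fit, what = "BIC")
```

The last line also touches on the side question: d_clust is a list with class "Mclust", and plot() picks the plotting method registered for that class, so no data frame is needed.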

Once you know which model is being used (I think it is EVE with G = 4 in your case), the classification makes sense, and you can simply pull it out with:

d_clust$classification
# or
results = data.frame(df,cluster=d_clust$classification)
head(results)
   latitud longitud cluster
1 -43.8189 -72.3833       1
2 -34.2731 -71.3717       2
3 -47.0667 -72.8000       1
4 -35.7544 -71.0864       3
5 -47.1414 -72.7258       1
6 -36.6261 -72.4892       3
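To get closer to the | latitud | longitud | cluster_id | layout asked for, the same idea can be sketched with an explicit column name and a per-row uncertainty. This is only an illustration: it assumes mclust is installed and uses the built-in faithful data rather than df, and the names dat and out are placeholders.

```r
library(mclust)  # assumed to be installed

dat <- faithful          # stand-in for df, so the sketch is self-contained
fit <- Mclust(dat)

# classification has one entry per input row, in the original row order
stopifnot(length(fit$classification) == nrow(dat))

out <- data.frame(
  dat,
  cluster_id  = fit$classification,
  uncertainty = fit$uncertainty   # uncertainty of each row's assignment
)
head(out)

# Number of rows assigned to each cluster
table(out$cluster_id)
```

Rows with a high uncertainty value sit between clusters, which can help when deciding how far to trust a given cluster_id.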

You can also plot it:

with(results,plot(latitud,longitud,col=factor(cluster)))

[Image: scatter plot of latitud against longitud, colored by cluster]

You may want to consider whether this clustering makes sense, for example, whether G = 4 is really the right choice.
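One way to explore that, sketched here under the assumption that mclust is installed and again using the built-in faithful data instead of df, is to fix G at a few candidate values and compare the fits. The object names fit3 and fit4 are illustrative only.

```r
library(mclust)  # assumed to be installed

# Fix the number of components; BIC then only chooses the covariance model
fit3 <- Mclust(faithful, G = 3)
fit4 <- Mclust(faithful, G = 4)

# mclust reports BIC so that larger values indicate a better model
c(G3 = fit3$bic, G4 = fit4$bic)

# Cross-tabulate the two solutions to see which rows move between clusters
table(G3 = fit3$classification, G4 = fit4$classification)
```

If the BIC values are close, the choice of G is not strongly determined by the data, and it may be worth weighing it against what the clusters are supposed to mean.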
