How to dynamically create variables and combine them into a data frame in R?

I am running kmeans for several numbers of clusters and then trying to combine the cluster results with the original data frame.

Following the post https://stats.stackexchange.com/questions/10838/produce-a-list-of-variable-name-in-a-for-loop-then-assign-values-to-the, I am using the code mentioned there to dynamically create variables, modified for my needs.

Original code from the post above:

x <- as.list(rnorm(10000))
names(x) <- paste("a", 1:length(x), sep = "")
list2env(x , envir = .GlobalEnv)

Now applying it to the iris data:

library(tidyverse)
library(ggthemes)
library(factoextra)

This works fine for creating the lists for 1 to 3 clusters:

# running for 1 to 3 clusters
lapply(1:3,

function(cluster_num){
  cluster_res_list <- as.list(kmeans(iris %>% select(-Species), cluster_num, nstart = 25)) 
  names(cluster_res_list) <- paste("iris_clus", 1:length(cluster_res_list), sep="_")
  list2env(cluster_res_list, envir = .GlobalEnv)
 
 # iris_df <- cbind(iris, cluster_res_list)
} )

Issue: when I try to combine them with the original dataset, I get an error: Error in as.data.frame.default(x[[i]], optional = TRUE, stringsAsFactors = stringsAsFactors) : cannot coerce class ‘"kmeans"’ to a data.frame

lapply(1:3,

function(cluster_num){
  cluster_res_list <- as.list(kmeans(iris %>% select(-Species), cluster_num, nstart = 25)) 
  names(cluster_res_list) <- paste("iris_clus", 1:length(cluster_res_list), sep="_")
  list2env(cluster_res_list, envir = .GlobalEnv)
 
  # to combine each cluster result to original df
  iris_df <- cbind(iris, cluster_res_list)
} )

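The output of kmeans can be turned into a matrix with the fitted function. The row names of that matrix identify the cluster each observation was assigned to. If you want to add a column to the original data frame that identifies the cluster assignment, something like this will do.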

Taking 3 clusters as an example:

cluster_num <- 3

iris %>% 
    select(-Species) %>% 
    kmeans(centers = cluster_num, nstart = 25) %>% 
    fitted() %>% 
    row.names() %>%
    tibble(iris_clus = .) %>%
    cbind(iris) %>% 
    tail()

    iris_clus Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
145         2          6.7         3.3          5.7         2.5 virginica
146         2          6.7         3.0          5.2         2.3 virginica
147         1          6.3         2.5          5.0         1.9 virginica
148         2          6.5         3.0          5.2         2.0 virginica
149         2          6.2         3.4          5.4         2.3 virginica
150         1          5.9         3.0          5.1         1.8 virginica
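
As a quick sanity check of the row-name approach, the sketch below compares the row names of fitted() with the cluster component that kmeans also returns. It uses base R only; the seed and the names km, labels_from_fitted and labels_direct are my own choices for illustration, not part of the original answer.

```r
# Assumption: set.seed() is added only to make the run repeatable.
set.seed(1)
km <- kmeans(iris[, 1:4], centers = 3, nstart = 25)

# fitted() returns one row of cluster centers per observation;
# its row names are the cluster labels, stored as character strings.
labels_from_fitted <- rownames(fitted(km))

# The same labels are also available directly as an integer vector.
labels_direct <- km$cluster

# Should be TRUE if both routes give the same assignment for every row.
all(labels_from_fitted == as.character(labels_direct))
```

Because row names are character strings, the iris_clus column built from them is character as well; as.integer() or factor() can convert it if a numeric or categorical column is needed.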

Plugging this into the lapply from your example:

lapply(1:3, function(cluster_num) {
    iris %>% 
        select(-Species) %>% 
        kmeans(centers = cluster_num, nstart = 25) %>% 
        fitted() %>% 
        row.names() %>%
        tibble(iris_clus = .) %>%
        cbind(iris) 
})

Here is one way to combine everything into a single dataset, with one column per model:

clusters <- Reduce(cbind, lapply(1:3, function(cluster_num) {

   result <- iris %>% 
        select(-Species) %>% 
        kmeans(centers = cluster_num, nstart = 25) %>% 
        fitted() %>% 
        row.names() %>% 
        tibble(iris_clus = .)

   names(result) <- paste("iris_clus", cluster_num, sep = "_")
   return(result)

}))

cbind(iris, clusters)
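
If the row names are not needed, a possible alternative is to read the assignments straight from the cluster component of each kmeans result. This is only a sketch under my own assumptions: it uses base R instead of the pipe, and the names ks, clusters_direct and iris_df are illustrative rather than taken from the answer above.

```r
# Assumption: set.seed() is added only to make the run repeatable.
set.seed(1)
ks <- 1:3

# sapply() collects one integer vector of cluster assignments per k
# and binds them into a matrix with one column per model.
clusters_direct <- sapply(ks, function(k) {
  kmeans(iris[, 1:4], centers = k, nstart = 25)$cluster
})
colnames(clusters_direct) <- paste("iris_clus", ks, sep = "_")

# Attach the assignment columns to the original data.
iris_df <- cbind(iris, clusters_direct)
str(iris_df)
```

This keeps the cluster columns as integers and avoids creating variables in the global environment, at the cost of departing from the fitted() approach shown above.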