Merge variables in a row with the same correlation value
I have a df with correlation values between pairs of variables:
input <- data.frame(Var1 = c("A", "B", "A", "D", "G", "H", "I"),
                    Var2 = c("B", "C", "E", "F", "F", "J", "K"),
                    Corr_Value = c(1, 1, 1, 0.7, 0.7, 1, 1),
                    stringsAsFactors = FALSE)
As shown, A = B, A = E and B = C (all with a correlation of 1).
I want to get a df like 'output', where all related variables are in one row (A = B = C = E):
output <- data.frame(Var1 = c("A", "D", "H", "I"),
                     Var2 = c("B", "F", "J", "K"),
                     Var3 = c("C", "G", NA, NA),
                     Var4 = c("E", NA, NA, NA),
                     Corr_Value = c(1, 0.7, 1, 1))
How can I do this?
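Because the relation is transitive (A = B and B = C imply A = C), each row of 'output' is one group of the transitive closure of the pairs. As a dependency-free illustration, here is a minimal base R union-find sketch that recovers those groups. The helper `find_root` and the objects `parent`, `roots` and `groups` are hypothetical names introduced here, and the sketch skips path compression, which is fine for small inputs.

```r
input <- data.frame(Var1 = c("A", "B", "A", "D", "G", "H", "I"),
                    Var2 = c("B", "C", "E", "F", "F", "J", "K"),
                    Corr_Value = c(1, 1, 1, 0.7, 0.7, 1, 1),
                    stringsAsFactors = FALSE)

vars <- unique(c(input$Var1, input$Var2))
# parent[[v]] points towards the representative ("root") of v's group;
# initially every variable is its own root
parent <- setNames(vars, vars)

# Follow parent links until reaching a variable that is its own parent
find_root <- function(v) {
  while (parent[[v]] != v) v <- parent[[v]]
  v
}

# Union step: for each row, hang the root of Var2 under the root of Var1
for (i in seq_len(nrow(input))) {
  r1 <- find_root(input$Var1[i])
  r2 <- find_root(input$Var2[i])
  if (r1 != r2) parent[[r2]] <- r1
}

# Variables that share a root belong to the same output row
roots <- vapply(vars, find_root, character(1))
groups <- unname(split(vars, factor(roots, levels = unique(roots))))
groups
```

For the example data, `groups` is a list holding c("A", "B", "C", "E"), c("D", "G", "F"), c("H", "J") and c("I", "K"), which can then be padded with NA into the wide layout.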
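We can use igraph to get the expected output. Treat each row as an undirected edge between Var1 and Var2; every set of linked variables is then a connected component of the graph, and each component becomes one output row. This assumes that all pairs within a group share the same Corr_Value, as in the example.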
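library(igraph)
# Each row is an undirected edge between Var1 and Var2
g1 <- graph_from_data_frame(input[-3], directed = FALSE)
# Connected components: every component is one group of linked variables
memb <- components(g1)$membership
groups <- split(V(g1)$name, memb)
# Pad each group with NA up to the size of the largest group
n <- max(lengths(groups))
out <- as.data.frame(do.call(rbind, lapply(groups, `length<-`, n)),
                     stringsAsFactors = FALSE)
names(out) <- paste0("Var", seq_len(n))
# The first member of each group always comes from Var1, so look up its value there
out$Corr_Value <- input$Corr_Value[match(out$Var1, input$Var1)]
out
#   Var1 Var2 Var3 Var4 Corr_Value
# 1    A    B    C    E        1.0
# 2    D    G    F <NA>        0.7
# 3    H    J <NA> <NA>        1.0
# 4    I    K <NA> <NA>        1.0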
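The approach above takes a single Corr_Value per group, so it silently hides any group whose pairs have different values. A hedged sketch for checking that assumption before collapsing the rows; `memb`, `row_comp` and `n_values` are names introduced here for illustration.

```r
library(igraph)

input <- data.frame(Var1 = c("A", "B", "A", "D", "G", "H", "I"),
                    Var2 = c("B", "C", "E", "F", "F", "J", "K"),
                    Corr_Value = c(1, 1, 1, 0.7, 0.7, 1, 1),
                    stringsAsFactors = FALSE)

g1 <- graph_from_data_frame(input[-3], directed = FALSE)
# Component id of every variable, indexed by variable name
memb <- setNames(components(g1)$membership, V(g1)$name)

# Both ends of an edge lie in the same component, so Var1 identifies the row's group
row_comp <- memb[input$Var1]
# Number of distinct Corr_Value entries per group; each should be 1
n_values <- tapply(input$Corr_Value, row_comp, function(v) length(unique(v)))
all(n_values == 1)
```

If this returns FALSE, at least one group mixes correlation values and needs a rule (for example, keeping the minimum) instead of a single lookup.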