在 R 中找到每个组中最常见的组合

Find the most common combinations within each group in R

我有以下数据集,显示了每个产品中包含的成分;

data <- data.frame("PRODUCT" = c("Creme","Creme","Creme","Creme","Medoc","Medoc","Medoc","Medoc","Medoc","Hububu","Hububu","Hububu","Hububu","Troll","Troll","Troll","Troll","Suzuki","Suzuki","Gluglu","Gluglu","Gluglu"), 
            "INGREDIENT" = c("zeze","zaza","zozo","zuzu","zaza","sasa","haha","zuzu","zemzem","zaza","zuzu","zizi","haha","zozo","zaza","zemzem","zuzu","sasa","zuzu","ozam","zaza","hayda"))

我想知道每种产品中最常见的成分组合;哪种成分与哪种其他成分有关?我应用了在此线程中找到的代码 :

combinaisons_par_PRODUCT = data %>% 
  full_join(data, by="PRODUCT") %>% 
  group_by(INGREDIENT.x, INGREDIENT.y) %>% 
  summarise(n = length(unique(PRODUCT))) %>% 
  filter(INGREDIENT.x!=INGREDIENT.y) %>%
  mutate(item = paste(INGREDIENT.x, INGREDIENT.y, sep=", "))

它有效,但还有最后一个缺陷;我希望忽略该命令。例如,这段代码会给我 1 个 HAHA 和 SASA 关联,以及 1 个 SASA 和 HAHA 关联。但对我来说,这些都是一样的东西。所以我希望代码忽略 INGREDIENTS 的顺序,并给我一个 2 HAHA & SASA 的唯一关联。

我尝试在应用代码之前对成分进行排序,但它也没有用。有人可以帮我吗?我怎样才能让这些组合不考虑顺序?

非常感谢!

这是否符合您的要求?我仅限于组合按字母顺序排列的情况,避免重复计算。

data %>% 
  full_join(data, by="PRODUCT") %>%
  filter(INGREDIENT.x < INGREDIENT.y) %>%
  count(combo = paste(INGREDIENT.x, INGREDIENT.y, sep = ", "))

我们可以使用 base R

m1 <- crossprod(table(data))
subset(as.data.frame.table(m1 * lower.tri(m1, diag = TRUE)), Freq != 0)

编辑:@ThomasIsCoding 的评论

igraph 选项使用 graph_from_adjacency_matrix

library(igraph)

get.data.frame(
    graph_from_adjacency_matrix(
        crossprod(table(data)),
        mode = "undirected",
        weighted = TRUE
    )
)

给予

     from     to weight
1    haha   haha      2
2    haha   sasa      1
3    haha   zaza      2
4    haha zemzem      1
5    haha   zizi      1
6    haha   zuzu      2
7   hayda  hayda      1
8   hayda   ozam      1
9   hayda   zaza      1
10   ozam   ozam      1
11   ozam   zaza      1
12   sasa   sasa      2
13   sasa   zaza      1
14   sasa zemzem      1
15   sasa   zuzu      2
16   zaza   zaza      5
17   zaza zemzem      2
18   zaza   zeze      1
19   zaza   zizi      1
20   zaza   zozo      2
21   zaza   zuzu      4
22 zemzem zemzem      2
23 zemzem   zozo      1
24 zemzem   zuzu      2
25   zeze   zeze      1
26   zeze   zozo      1
27   zeze   zuzu      1
28   zizi   zizi      1
29   zizi   zuzu      1
30   zozo   zozo      2
31   zozo   zuzu      2
32   zuzu   zuzu      5