How to find similarity between different variables according to consequent string values in R?

Question

I have a DF with the following structure:

 X  Y  Z 
 D  E  1
 D  F  2
 D  G  3
 L  E  1
 L  F  2
 L  G  3
 M  N  4
 M  O  5
 S  N  4
 S  O  5

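I want to obtain two distinct clusters ("L - D", "M - S") based on the second-column values they have in common. The output would therefore be structured like this: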

 Clust.1   Clust.2
    L         M
    D         S

How can I do this?

Thanks for any suggestions!

Answer 1

Here is an idea using the tidyverse:

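library(dplyr)
library(tidyr)

df %>% 
 group_by(X) %>% 
 summarise(Z = toString(Z)) %>%  # one signature string per X, e.g. "1, 2, 3"
 group_by(Z) %>% 
 mutate(new = seq(n())) %>%      # row index within each signature group
 spread(Z, X)                    # one column per signature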

which gives,

# A tibble: 2 x 3
    new `1, 2, 3` `4, 5`
* <int>    <fctr> <fctr>
1     1         D      M
2     2         L      S
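
For comparison, the same grouping idea can be sketched in base R, keyed on the second column (Y) as the question describes rather than on Z. This is only a minimal sketch: the data frame is rebuilt from the table in the question, the names `signature` and `clusters` are made up for illustration, and it assumes two X values belong together only when their sets of Y values match exactly.

```r
# Sample data rebuilt from the table in the question
df <- data.frame(
  X = c("D", "D", "D", "L", "L", "L", "M", "M", "S", "S"),
  Y = c("E", "F", "G", "E", "F", "G", "N", "O", "N", "O"),
  Z = c(1, 2, 3, 1, 2, 3, 4, 5, 4, 5),
  stringsAsFactors = FALSE
)

# One signature per X value: its sorted, unique Y values pasted together
# (assumption: clusters are defined by an exact match of the Y sets)
signature <- tapply(df$Y, df$X, function(y) toString(sort(unique(y))))

# X values that share a signature fall into the same cluster
clusters <- split(names(signature), as.vector(signature))
names(clusters) <- paste0("Clust.", seq_along(clusters))

clusters
```

The result is kept as a named list because clusters may differ in size; when they all have the same length, as here, `as.data.frame(clusters)` should give the column layout asked for in the question.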
