Create ID variable per chain of values

I have a dataset that looks like this:

data <- data.frame(Name1 = c("A", "B", "D", "E", "H"),
                   Name2 = c("B", "C", "E", "G", "I"))

I would like to add an ID column that helps me track groups of names, i.e. who refers to whom. For the example data, the groups would be:

  Name1 Name2 GroupID
      A     B       1
      B     C       1
      D     E       2
      E     G       2
      H     I       3

Note that my original data is not sorted as in this example. Thanks in advance for your help!

You can use the igraph package to build a network from your dataset and identify its clusters (connected components):

data <- data.frame(Name1 = c("A", "B", "D", "E", "H"),
                   Name2 = c("B", "C", "E", "G", "I"))


library(igraph)
graph <- graph_from_data_frame(data, directed = FALSE)
clusters <- components(graph)

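# Verbose version:
# data$GroupId <- sapply(data$Name1, function(x) clusters$membership[which(names(clusters$membership) == x)])
# Simpler version: index the named membership vector by vertex name.
# as.character() guards against Name1 being a factor, which would
# index by integer code instead of by name.
data$GroupId <- unname(clusters$membership[as.character(data$Name1)])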

This gives:

> data
  Name1 Name2 GroupId
1     A     B       1
2     B     C       1
3     D     E       2
4     E     G       2
5     H     I       3
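
If adding igraph as a dependency is not an option, the same grouping can be sketched in base R with a small union-find. This is only an illustrative alternative, not part of igraph: `group_ids` and `find_root` are hypothetical names made up for this example. Groups are numbered in order of first appearance, so the row order of the input does not matter for correctness.

```r
# Hypothetical helper (not from igraph): label the connected components
# of an edge list with a simple union-find, using base R only.
group_ids <- function(from, to) {
  from <- as.character(from)
  to <- as.character(to)
  nodes <- unique(c(from, to))
  parent <- seq_along(nodes)            # each node starts as its own root

  # Follow parent links until reaching a node that is its own parent.
  find_root <- function(i) {
    while (parent[i] != i) i <- parent[i]
    i
  }

  a <- match(from, nodes)
  b <- match(to, nodes)
  for (k in seq_along(a)) {
    ra <- find_root(a[k])
    rb <- find_root(b[k])
    # Merge the two components by pointing the larger root at the smaller.
    if (ra != rb) parent[max(ra, rb)] <- min(ra, rb)
  }

  roots <- vapply(a, find_root, integer(1))
  match(roots, unique(roots))           # renumber groups as 1, 2, 3, ...
}

data <- data.frame(Name1 = c("A", "B", "D", "E", "H"),
                   Name2 = c("B", "C", "E", "G", "I"))
data$GroupId <- group_ids(data$Name1, data$Name2)
print(data$GroupId)  # 1 1 2 2 3
```

For large edge lists the igraph version is likely the better choice, since this sketch does no path compression and loops over rows in plain R.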