Create ID variable per chain of values
I have a dataset that looks like this:
data <- data.frame(Name1 = c("A", "B", "D", "E", "H"),
Name2 = c("B", "C", "E", "G", "I"))
I want to add an ID column that helps me keep track of groups of names, i.e. who refers to whom. For the example data, the groups would be:
Name1 Name2 GroupID
A B 1
B C 1
D E 2
E G 2
H I 3
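Note that my original data is not sorted as in this example. Thanks in advance for your help!
You can use the igraph package to build a network from your dataset and identify the clusters: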
data <- data.frame(Name1 = c("A", "B", "D", "E", "H"),
Name2 = c("B", "C", "E", "G", "I"))
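library(igraph)
# Treat each row as an undirected edge between Name1 and Name2
graph <- graph_from_data_frame(data, directed = FALSE)
# Connected components: each component is one chain of names
clusters <- components(graph)
# membership is a vector named by vertex, so look up each Name1 by name.
# as.character() guards against factor columns (the data.frame default before
# R 4.0), which would index by integer code rather than by name.
data$GroupId <- clusters$membership[as.character(data$Name1)]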
This gives:
> data
Name1 Name2 GroupId
1 A B 1
2 B C 1
3 D E 2
4 E G 2
5 H I 3
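Since the question mentions that the real data is not sorted, here is a minimal sketch of the same approach on a shuffled copy of the example. The row order and the names `shuffled`, `memb` and `ids` are made up for illustration. Connected components do not depend on row order, but the numeric labels follow the order in which vertices appear, so the sketch renumbers the groups by first appearance and checks that both ends of every row land in the same group.

```r
library(igraph)

# Hypothetical shuffled version of the example data (row order is an assumption)
shuffled <- data.frame(Name1 = c("E", "H", "B", "D", "A"),
                       Name2 = c("G", "I", "C", "E", "B"),
                       stringsAsFactors = FALSE)

# Each row becomes an undirected edge; components() labels each connected chain
g <- graph_from_data_frame(shuffled, directed = FALSE)
memb <- components(g)$membership

# Look up the component of each Name1 by vertex name
ids <- unname(memb[as.character(shuffled$Name1)])

# Renumber so that groups are numbered in order of first appearance in the rows
shuffled$GroupID <- match(ids, unique(ids))

# Sanity check: Name1 and Name2 of every row belong to the same component
stopifnot(all(memb[shuffled$Name1] == memb[shuffled$Name2]))

print(shuffled)
```

The `match(ids, unique(ids))` step is optional; it only makes the numbering predictable when the input order changes.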