Count identical columns of a matrix containing lists in R?
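The data I'm working with is a matrix in which each column is a list containing 2 elements. I want to count how many of the columns are identical.
I am extracting the list matrix from tidygraph objects. The example below should explain my problem better. First, I create some data, turn it into tidygraph objects, and put them all into a list, like so: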
library(ggraph)
library(tidygraph)
# create some nodes and edges data
nodes <- data.frame(name = c("x4", NA, NA))
nodes1 <- data.frame(name = c("x4", "x2", NA, NA, "x1", NA, NA))
nodes2 <- data.frame(name = c("x1", NA, NA))
nodes3 <- data.frame(name = c("x6", NA, NA))
nodes4 <- data.frame(name = c("x10", "x3", NA, NA, NA))
nodes5 <- data.frame(name = c("x1", "x2", NA, NA, "x7", NA, NA))
edges <- data.frame(from = c(1,1), to = c(2,3))
edges1 <- data.frame(from = c(1, 2, 2, 1, 5, 5), to = c(2, 3, 4, 5, 6, 7))
edges2 <- data.frame(from = c(1,1), to = c(2,3))
edges3 <- data.frame(from = c(1,1), to = c(2,3))
edges4 <- data.frame(from = c(1,2,2,1), to = c(2,3,4,5))
edges5 <- data.frame(from = c(1, 2, 2, 1, 5, 5), to = c(2, 3, 4, 5, 6, 7))
# create the tbl_graphs
tg <- tbl_graph(nodes = nodes, edges = edges)
tg_1 <- tbl_graph(nodes = nodes1, edges = edges1)
tg_2 <- tbl_graph(nodes = nodes2, edges = edges2)
tg_3 <- tbl_graph(nodes = nodes3, edges = edges3)
tg_4 <- tbl_graph(nodes = nodes4, edges = edges4)
tg_5 <- tbl_graph(nodes = nodes5, edges = edges5)
# put into list
myList <- list(tg, tg_1, tg_2, tg_3, tg_4, tg_5)
For clarity, one of the elements of myList looks like this:
myList[1]
[[1]]
# A tbl_graph: 3 nodes and 2 edges
#
# A rooted tree
#
# Node Data: 3 × 1 (active)
name
<chr>
1 x4
2 NA
3 NA
#
# Edge Data: 2 × 2
from to
<int> <int>
1 1 2
2 1 3
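Essentially, I want to go through each element of the list, look at the edge data, and see how many are identical.
I'm sure there are multiple ways to do this, but I tried using tidygraph functions to extract the edge data, which returns a list matrix: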
# extract just the edges data
resEdges <- sapply(myList, function(x) {
  tidygraph::activate(x, edges) %>%
    tibble::as_tibble()
})
Again, for clarity, the first column of resEdges looks like this:
> resEdges[,1]
$from
[1] 1 1
$to
[1] 2 3
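For anyone unfamiliar with this structure, here is a minimal sketch of why sapply() returns a list matrix in this situation. It uses two small data frames (e1 and e2, hypothetical stand-ins for the tibbles extracted from the graphs) so that it runs in base R without tidygraph:

```r
# Two small edge tables standing in for the tibbles returned by as_tibble()
e1 <- data.frame(from = c(1, 1), to = c(2, 3))
e2 <- data.frame(from = c(1, 2, 2, 1), to = c(2, 3, 4, 5))

# Each data frame is a list of length 2, so sapply() simplifies the result
# to a 2 x 2 matrix of mode "list" rather than to a numeric matrix
m <- sapply(list(e1, e2), identity)

dim(m)         # 2 2
typeof(m)      # "list"
rownames(m)    # "from" "to"
m[["to", 2]]   # 2 3 4 5
```

Each cell holds a whole vector, which is why the columns cannot be compared with the usual matrix tools and need to be reduced to a comparable key first.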
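So, what I want to do is go through the columns of resEdges and count the frequency of identical columns.
In my example there are only 3 unique columns, so my desired output would look something like this: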
> edgeFreq
# A tibble: 3 × 3
  from         to           frequency
  1 1          2 3                  3
  1 2 2 1 5 5  2 3 4 5 6 7          2
  1 2 2 1      2 3 4 5              1
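library(tidyverse)  # purrr::map_chr, stringr::str_c, tidyr::separate, tibble::as_tibble

myList %>%
  # for each graph, collapse each edge column ("from", "to") into one string,
  # then join the two strings with ", " to form a single key per graph
  map_chr(~ as_tibble(activate(.x, edges)) %>%
            map_chr(str_c, collapse = " ") %>%
            toString()) %>%
  table() %>%                              # count identical keys
  as_tibble() %>%
  setNames(c("data", "frequency")) %>%
  separate(data, c("From", "to"), ", ")    # split each key back into two columns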
# A tibble: 3 x 3
  From        to          frequency
  <chr>       <chr>           <int>
1 1 1         2 3                 3
2 1 2 2 1 5 5 2 3 4 5 6 7         2
3 1 2 2 1     2 3 4 5             1
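If the list matrix resEdges is already at hand, the same count can probably be done in base R as well. The following is only a minimal sketch under some assumptions: edgeList is a hypothetical stand-in that rebuilds the question's edge data as plain data frames so the example runs without tidygraph, and fromKey and toKey are illustrative names that do not appear in the question.

```r
# Hypothetical stand-in for the edge data of the six graphs in the question
edgeList <- list(
  data.frame(from = c(1, 1), to = c(2, 3)),
  data.frame(from = c(1, 2, 2, 1, 5, 5), to = c(2, 3, 4, 5, 6, 7)),
  data.frame(from = c(1, 1), to = c(2, 3)),
  data.frame(from = c(1, 1), to = c(2, 3)),
  data.frame(from = c(1, 2, 2, 1), to = c(2, 3, 4, 5)),
  data.frame(from = c(1, 2, 2, 1, 5, 5), to = c(2, 3, 4, 5, 6, 7))
)

# Same shape as resEdges: a 2 x 6 list matrix with rows "from" and "to"
resEdges <- sapply(edgeList, identity)

# Collapse the vector in each cell into a single space-separated string
fromKey <- vapply(resEdges["from", ], paste, character(1), collapse = " ")
toKey   <- vapply(resEdges["to", ],   paste, character(1), collapse = " ")

# Count how often each (from, to) pair occurs
keys <- data.frame(from = fromKey, to = toKey, frequency = 1L)
edgeFreq <- aggregate(frequency ~ from + to, data = keys, FUN = sum)

# Most frequent pattern first
edgeFreq <- edgeFreq[order(-edgeFreq$frequency), ]
edgeFreq
```

Pasting the values into strings is a convenience for grouping; it treats two columns as identical only when both the from and the to vectors match element for element and in the same order.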