Remove duplicated elements from a list of tidygraph objects in R?
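I have a list of tidygraph objects. In the node data I have two columns, name and frequency. What I want to do is remove any list element (i.e. any tidygraph object) that is repeated more than once. Hopefully my example explains this better:
First, I create some node/edge data, convert it to tidygraph objects and put them in a list: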
library(tidygraph)
library(dplyr)
library(tidyr)
library(purrr)
library(stringr)
# create some node and edge data for the tbl_graph
nodes <- data.frame(name = c("x4", NA, NA),
                    val = c(1, 5, 2))
nodes2 <- data.frame(name = c("x4", NA, NA),
                     val = c(3, 2, 2))
nodes3 <- data.frame(name = c("x4", NA, NA),
                     val = c(5, 6, 7))
nodes4 <- data.frame(name = c("x4", "x2", NA, NA, "x1", NA, NA),
                     val = c(3, 2, 2, 1, 1, 2, 7))
nodes5 <- data.frame(name = c("x1", "x2", NA),
                     val = c(7, 4, 2))
nodes6 <- data.frame(name = c("x1", "x2", NA),
                     val = c(2, 1, 3))
edges <- data.frame(from = c(1, 1), to = c(2, 3))
edges1 <- data.frame(from = c(1, 2, 2, 1, 5, 5),
                     to = c(2, 3, 4, 5, 6, 7))
# create the tbl_graphs
tg <- tbl_graph(nodes = nodes, edges = edges)
tg_1 <- tbl_graph(nodes = nodes2, edges = edges)
tg_2 <- tbl_graph(nodes = nodes2, edges = edges)
tg_3 <- tbl_graph(nodes = nodes4, edges = edges1)
tg_4 <- tbl_graph(nodes = nodes5, edges = edges)
tg_5 <- tbl_graph(nodes = nodes6, edges = edges)
# put into list
myList <- list(tg, tg_1, tg_2, tg_3, tg_4, tg_5)
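Next, I have this small function that tells me the frequency of each list element based on the name column. That is, if the name column is repeated/identical across multiple list elements, the frequency increases. So, in my example above, the name column in tg appears 3 times in my list (it is identical in tg, tg_1 and tg_2), so it has a frequency of 3.
I then add a frequency column to each list element, altering my original myList object. For example: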
freqs <- lapply(myList, function(x){
  x %>%
    pull(name) %>%
    replace_na("..") %>%
    paste0(collapse = "")
}) %>%
  unlist(use.names = FALSE) %>%
  as_tibble() %>%
  group_by(value) %>%
  mutate(val = n():1) %>%
  pull(val)
newList <- purrr::imap(myList, ~.x %>%
                         mutate(frequency = freqs[.y]) %>%
                         select(name, frequency))
Looking at newList returns:
> newList
[[1]]
# A tbl_graph: 3 nodes and 2 edges
#
# A rooted tree
#
# Node Data: 3 × 2 (active)
name frequency
<chr> <int>
1 x4 3
2 NA 3
3 NA 3
#
# Edge Data: 2 × 2
from to
<int> <int>
1 1 2
2 1 3
[[2]]
# A tbl_graph: 3 nodes and 2 edges
#
# A rooted tree
#
# Node Data: 3 × 2 (active)
name frequency
<chr> <int>
1 x4 2
2 NA 2
3 NA 2
#
# Edge Data: 2 × 2
from to
<int> <int>
1 1 2
2 1 3
[[3]]
# A tbl_graph: 3 nodes and 2 edges
#
# A rooted tree
#
# Node Data: 3 × 2 (active)
name frequency
<chr> <int>
1 x4 1
2 NA 1
3 NA 1
#
# Edge Data: 2 × 2
from to
<int> <int>
1 1 2
2 1 3
[[4]]
# A tbl_graph: 7 nodes and 6 edges
#
# A rooted tree
#
# Node Data: 7 × 2 (active)
name frequency
<chr> <int>
1 x4 1
2 x2 1
3 NA 1
4 NA 1
5 x1 1
6 NA 1
# … with 1 more row
#
# Edge Data: 6 × 2
from to
<int> <int>
1 1 2
2 2 3
3 2 4
# … with 3 more rows
[[5]]
# A tbl_graph: 3 nodes and 2 edges
#
# A rooted tree
#
# Node Data: 3 × 2 (active)
name frequency
<chr> <int>
1 x1 2
2 x2 2
3 NA 2
#
# Edge Data: 2 × 2
from to
<int> <int>
1 1 2
2 1 3
[[6]]
# A tbl_graph: 3 nodes and 2 edges
#
# A rooted tree
#
# Node Data: 3 × 2 (active)
name frequency
<chr> <int>
1 x1 1
2 x2 1
3 NA 1
#
# Edge Data: 2 × 2
from to
<int> <int>
1 1 2
2 1 3
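So we can see that the name column containing x4, NA, NA appears 3 times, but instead of that frequency being added to each element, I seem to be counting the frequency down (unintentionally). So x4, NA, NA reports a frequency of 3, then 2, then 1.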
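The countdown appears to come from n():1 inside the grouped mutate(): in a group of three rows it expands to the vector 3, 2, 1, one value per row, whereas n() on its own is a single number recycled to every row of the group. A minimal sketch on the pasted name keys only; the keys tibble and the column names countdown and total are written by hand for illustration and are not computed from myList:

```r
library(dplyr)

# Hand-written keys mirroring the example: the name column of each list
# element pasted into one string, with NA replaced by "..".
keys <- tibble(value = c("x4....", "x4....", "x4....",
                         "x4x2....x1....", "x1x2..", "x1x2.."))

counted <- keys %>%
  group_by(value) %>%
  mutate(countdown = n():1,  # a sequence from the group size down to 1
         total = n()) %>%    # the group size, recycled to every row
  ungroup()

counted$countdown  # 3 2 1 1 2 1
counted$total      # 3 3 3 1 2 2
```

Because the first row of each group carries the full group size in the countdown, keeping only that first row gives the same frequency as total would.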
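I am trying to remove any duplicated list elements and keep only the one with the highest frequency. For example, my desired output would look like this: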
> newList
[[1]]
# A tbl_graph: 3 nodes and 2 edges
#
# A rooted tree
#
# Node Data: 3 × 2 (active)
name frequency
<chr> <int>
1 x4 3
2 NA 3
3 NA 3
#
# Edge Data: 2 × 2
from to
<int> <int>
1 1 2
2 1 3
[[2]]
# A tbl_graph: 7 nodes and 6 edges
#
# A rooted tree
#
# Node Data: 7 × 2 (active)
name frequency
<chr> <int>
1 x4 1
2 x2 1
3 NA 1
4 NA 1
5 x1 1
6 NA 1
# … with 1 more row
#
# Edge Data: 6 × 2
from to
<int> <int>
1 1 2
2 2 3
3 2 4
# … with 3 more rows
[[3]]
# A tbl_graph: 3 nodes and 2 edges
#
# A rooted tree
#
# Node Data: 3 × 2 (active)
name frequency
<chr> <int>
1 x1 2
2 x2 2
3 NA 2
#
# Edge Data: 2 × 2
from to
<int> <int>
1 1 2
2 1 3
Here we can see that the repeated elements have been removed. Any suggestions as to how I could do this?
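The comment is enough to alter the answer. That said, the code can be updated slightly by slice-ing the first row of each group in the grouped tibble, maybe like this: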
library(tidygraph) ; library(tidyverse)
freqs <- map(myList, function(x){
  x %>%
    pull(name) %>%
    replace_na("..") %>%
    paste0(collapse = "")
}) %>%
  unlist(use.names = FALSE) %>%
  as_tibble() %>%
  mutate(ids = 1:n()) %>%
  group_by(value) %>%
  mutate(val = n():1)
ids <- freqs %>% slice(1) %>% pull(ids)
freqs <- freqs %>% pull(val)
newList <- purrr::imap(myList, ~.x %>%
                         mutate(frequency = freqs[.y]) %>%
                         select(name, frequency))
newList[sort(ids)]
[[1]]
# A tbl_graph: 3 nodes and 2 edges
#
# A rooted tree
#
# Node Data: 3 x 2 (active)
name frequency
<chr> <int>
1 x4 3
2 NA 3
3 NA 3
#
# Edge Data: 2 x 2
from to
<int> <int>
1 1 2
2 1 3
[[2]]
# A tbl_graph: 7 nodes and 6 edges
#
# A rooted tree
#
# Node Data: 7 x 2 (active)
name frequency
<chr> <int>
1 x4 1
2 x2 1
3 NA 1
4 NA 1
5 x1 1
6 NA 1
# ... with 1 more row
#
# Edge Data: 6 x 2
from to
<int> <int>
1 1 2
2 2 3
3 2 4
# ... with 3 more rows
[[3]]
# A tbl_graph: 3 nodes and 2 edges
#
# A rooted tree
#
# Node Data: 3 x 2 (active)
name frequency
<chr> <int>
1 x1 2
2 x2 2
3 NA 2
#
# Edge Data: 2 x 2
from to
<int> <int>
1 1 2
2 1 3
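As an alternative sketch (not the approach used in the answer above), the same result might be reached without the countdown: build one key per graph, count the keys with table(), and keep the first graph of each key with duplicated(). The names demoList, keys, counts, keep and dedupList are illustrative assumptions, and demoList is a shortened stand-in for myList:

```r
library(tidygraph)
library(dplyr)
library(tidyr)
library(purrr)

# Shortened stand-in for myList: three graphs, the first two share a name column.
edges <- data.frame(from = c(1, 1), to = c(2, 3))
demoList <- list(
  tbl_graph(nodes = data.frame(name = c("x4", NA, NA), val = c(1, 5, 2)),
            edges = edges),
  tbl_graph(nodes = data.frame(name = c("x4", NA, NA), val = c(3, 2, 2)),
            edges = edges),
  tbl_graph(nodes = data.frame(name = c("x1", "x2", NA), val = c(7, 4, 2)),
            edges = edges)
)

# One key per graph: the name column pasted together, NA written as "..".
keys <- map_chr(demoList, ~ .x %>%
                  pull(name) %>%
                  replace_na("..") %>%
                  paste0(collapse = ""))

# Number of occurrences of each key, looked up once per list element.
counts <- as.integer(table(keys)[keys])

# Keep the first graph of each key and attach the full group count to it.
keep <- !duplicated(keys)
dedupList <- map2(demoList[keep], counts[keep], ~ .x %>%
                    mutate(frequency = .y) %>%
                    select(name, frequency))
```

Since duplicated() flags every repeat after the first occurrence, the kept elements stay in their original list order and no sort() on the indices is needed.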