Aggregate and collapse a vector while maintaining order
My data frame is as follows:
+------+-----+----------+
| from | to | priority |
+------+-----+----------+
| 1 | 8 | 1 |
| 2 | 6 | 1 |
| 3 | 4 | 1 |
| 4 | 5 | 3 |
| 5 | 6 | 4 |
| 6 | 2 | 5 |
| 7 | 8 | 2 |
| 4 | 3 | 5 |
| 2 | 1 | 1 |
| 6 | 6 | 4 |
| 1 | 7 | 5 |
| 8 | 4 | 6 |
| 9 | 5 | 3 |
+------+-----+----------+
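My goal is to group the "to" column by the "from" column, but once a value has already appeared in either column, I don't want to consider it any further.
In addition, the total priority should be the sum of the priorities of all rows in the group.
So the resulting data frame would be: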
+------+------+----------------+
| from | to | Total Priority |
+------+------+----------------+
| 1 | 8, 7 | 6 |
| 2 | 6 | 1 |
| 3 | 4 | 1 |
| 9 | 5 | 3 |
+------+------+----------------+
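I also want the grouped result to keep the same order as the original table.
I was able to collapse the "to" column by "from" using the "splitstackshape" package, as shown below: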
library(splitstackshape)
cSplit(df, 'to', sep = ',', direction = 'long')[
  , .(to = toString(unique(to))), by = from]
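However, this introduces duplicate values.
I'd like to know whether there is a way to get the desired result using any other package.
It's not clear how you are trying to form the groups, but this should at least get you in the right ballpark: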
library(tidyverse)
df <- tribble(~from, ~to, ~priority,
1,8,1,
2,6,1,
3,4,1,
4,5,3,
5,6,4,
6,2,5,
7,8,2,
4,3,5,
2,1,1,
6,6,4,
1,7,5,
8,4,6,
9,5,3)
df %>%
group_by(from) %>%
summarise(to = toString(to),
`Total Priority` = sum(priority, na.rm=T))
Your result will be:
# A tibble: 9 x 3
from to `Total Priority`
<dbl> <chr> <dbl>
1 1 8, 7 6
2 2 6, 1 2
3 3 4 1
4 4 5, 3 8
5 5 6 4
6 6 2, 6 9
7 7 8 2
8 8 4 6
9 9 5 3
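On the ordering requirement: the grouped summary above comes back sorted by from, which happens to match the original order only because from first appears in ascending order in this data. A minimal base R sketch of the same collapse that keeps groups in order of first appearance, assuming a small made-up data frame (the values below are purely illustrative, not from the question):

```r
# Toy data, invented for illustration: `from` does not first appear in sorted order.
df <- data.frame(from = c(3, 1, 3, 2),
                 to = c(4, 8, 5, 6),
                 priority = c(1, 1, 2, 1))

# Factor levels follow the order in which each `from` value first appears,
# so tapply() returns the groups in that order instead of sorted order.
f <- factor(df$from, levels = unique(df$from))

out <- data.frame(
  from = unique(df$from),
  to = as.vector(tapply(df$to, f, toString)),
  `Total Priority` = as.vector(tapply(df$priority, f, sum)),
  check.names = FALSE
)
out
```

Here the groups come out as 3, 1, 2 rather than 1, 2, 3. The same factor trick should also work inside group_by() if you prefer to stay in dplyr.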
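Using DF, shown reproducibly in the Note at the end, sort it by from to give DF2, then iterate over its rows, removing any row whose to value already appears in the to or from column of an earlier remaining row, or whose from value already appears in an earlier to. We need a loop here because each removal depends on the removals before it. Finally, summarize the result.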
library(dplyr)
DF2 <- arrange(DF, from)
i <- 1
while(i <= nrow(DF2)) {
ix <- seq_len(i-1)
dup <- with(DF2, (to[i] %in% c(to[ix], from[ix])) | (from[i] %in% to[ix]))
if (dup) DF2 <- DF2[-i, ] else i <- i + 1
}
DF2 %>%
group_by(from) %>%
summarize(to = toString(to), priority = sum(priority)) %>%
ungroup
giving:
# A tibble: 4 x 3
from to priority
<int> <chr> <int>
1 1 8, 7 6
2 2 6 1
3 3 4 1
4 9 5 3
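If you would rather not sort first and would like to avoid shrinking the data frame inside the loop, the same rule can be sketched in base R with a logical keep vector. This is only a sketch under my reading of the rule; filter_seen is a hypothetical helper name, not something from dplyr or any other package:

```r
# Hypothetical helper: keep row i only if its `to` has not appeared in either
# column of the rows kept so far and its `from` has not appeared as a kept `to`.
filter_seen <- function(df) {
  keep <- logical(nrow(df))              # all FALSE; rows not yet visited count as not kept
  for (i in seq_len(nrow(df))) {
    kept <- df[keep, , drop = FALSE]     # rows accepted before row i
    dup <- df$to[i] %in% c(kept$to, kept$from) || df$from[i] %in% kept$to
    keep[i] <- !dup
  }
  df[keep, , drop = FALSE]
}

# Same data as in the question, in its original row order.
DF <- data.frame(
  from     = c(1, 2, 3, 4, 5, 6, 7, 4, 2, 6, 1, 8, 9),
  to       = c(8, 6, 4, 5, 6, 2, 8, 3, 1, 6, 7, 4, 5),
  priority = c(1, 1, 1, 3, 4, 5, 2, 5, 1, 4, 5, 6, 3)
)

kept <- filter_seen(DF)
groups <- unique(kept$from)              # groups in order of first appearance
result <- data.frame(
  from = groups,
  to = vapply(groups, function(g) toString(kept$to[kept$from == g]), character(1)),
  priority = vapply(groups, function(g) sum(kept$priority[kept$from == g]), numeric(1))
)
result
```

On this data the unsorted pass keeps the same five rows as the sorted loop, so result has the same four groups (1, 2, 3, 9). On other inputs, sorting first can change which rows survive, so pick whichever order matches your intent.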
Note
Lines <- "from | to | priority
1 | 8 | 1
2 | 6 | 1
3 | 4 | 1
4 | 5 | 3
5 | 6 | 4
6 | 2 | 5
7 | 8 | 2
4 | 3 | 5
2 | 1 | 1
6 | 6 | 4
1 | 7 | 5
8 | 4 | 6
9 | 5 | 3"
DF <- read.table(text = Lines, header = TRUE, sep = "|", strip.white = TRUE)