R: group columns of return-trip data
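I have data on train trips with the number of trains that were delayed or cancelled, and I would like to compute the totals.
Start     End       Delayed  Cancelled
Paris     Rome      1        0
Brussels  Berlin    4        6
Berlin    Brussels  6        2
Rome      Paris     2        1
How can I group on the Start and End columns so that Paris-Rome and Rome-Paris, and likewise Brussels-Berlin and Berlin-Brussels, are summed together, giving the total number of delayed and cancelled trains per route regardless of direction?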
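Sort each pair alphabetically so that both directions share one route key, then summarise by group:
library(dplyr)

df |>
  group_by(route = if_else(Start < End, paste(Start, End, sep = "-"), paste(End, Start, sep = "-"))) |>
  summarise(Delayed = sum(Delayed), Cancelled = sum(Cancelled))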
# route Delayed Cancelled
# <chr> <int> <int>
# 1 Berlin-Brussels 10 8
# 2 Paris-Rome 3 1
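To illustrate why ordering each pair works, here is a minimal base R sketch (not taken from the answers in this thread) that builds the same direction-independent key with pmin() and pmax(), which compare character vectors element-wise. The names route and totals are only illustrative, and the data frame is copied from the question so the block runs on its own.

```r
# Toy data copied from the question
df <- data.frame(
  Start = c("Paris", "Brussels", "Berlin", "Rome"),
  End = c("Rome", "Berlin", "Brussels", "Paris"),
  Delayed = c(1L, 4L, 6L, 2L),
  Cancelled = c(0L, 6L, 2L, 1L)
)

# Put the alphabetically smaller city first, so that
# "Rome -> Paris" and "Paris -> Rome" both map to "Paris-Rome"
route <- paste(pmin(df$Start, df$End), pmax(df$Start, df$End), sep = "-")

# Sum the counts within each route key
totals <- aggregate(df[c("Delayed", "Cancelled")], by = list(route = route), FUN = sum)
print(totals)
```

Because the key is built from whole vectors, no row-wise loop is needed, which may matter on larger tables.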
Reproducible data:
df = data.frame(
  Start = c("Paris", "Brussels", "Berlin", "Rome"),
  End = c("Rome", "Berlin", "Brussels", "Paris"),
  Delayed = c(1L, 4L, 6L, 2L),
  Cancelled = c(0L, 6L, 2L, 1L)
)
Additional solutions
tidyverse
library(tidyverse)
df <- data.frame(
  stringsAsFactors = FALSE,
  Start = c("Paris", "Brussels", "Berlin", "Rome"),
  End = c("Rome", "Berlin", "Brussels", "Paris"),
  Delayed = c(1L, 4L, 6L, 2L),
  Cancelled = c(0L, 6L, 2L, 1L)
)

df %>%
  rowwise() %>%
  # sort the two cities within each row so both directions get the same key
  mutate(route = paste0(sort(c_across(c(Start, End))), collapse = "-")) %>%
  group_by(route) %>%
  # a lambda is used because passing extra arguments through across() is deprecated in dplyr >= 1.1.0
  summarise(across(where(is.numeric), \(x) sum(x, na.rm = TRUE)))
#> # A tibble: 2 × 3
#> route Delayed Cancelled
#> <chr> <int> <int>
#> 1 Berlin-Brussels 10 8
#> 2 Paris-Rome 3 1
Created on 2022-04-26 by the reprex package (v2.0.1)
base R
df$route <- apply(df[c("Start", "End")], 1, function(x) paste0(sort(x), collapse = "-"))
aggregate(x = df[c("Delayed", "Cancelled")], by = list(df$route), FUN = sum, na.rm = TRUE)
#> Group.1 Delayed Cancelled
#> 1 Berlin-Brussels 10 8
#> 2 Paris-Rome 3 1
data.table
df$route <- apply(df[c("Start", "End")], 1, function(x) paste0(sort(x), collapse = "-"))
library(data.table)
setDT(df)[, lapply(.SD, sum, na.rm = TRUE), by = route, .SDcols = is.numeric]
#> route Delayed Cancelled
#> 1: Paris-Rome 3 1
#> 2: Berlin-Brussels 10 8