How can I apply a function across these columns without doing tedious transformations?
I have a data frame (below) that I want to summarize by column.
library(tibble)
sample <- tibble(
  Scenario = c("Aggressive","Aggressive","Conservative","Aggressive","Likely","Aggressive","Conservative","Likely","Likely","Aggressive","Conservative","Conservative"),
  `Jan 2022` = c(5.5,15,15.77,45.2,NA,NA,NA,NA,NA,NA,NA,NA),
  `Feb 2022` = c(NA,NA,NA,NA,20.5,11.1,14.4,55.5,NA,NA,NA,NA),
  `Mar 2022` = c(NA,NA,NA,NA,NA,NA,NA,NA,88.5,9.5,18.9,25.5))
This is what the output should look like:
# A tibble: 3 × 4
# Groups: Scenario [3]
Scenario `Feb 2022` `Jan 2022` `Mar 2022`
<chr> <dbl> <dbl> <dbl>
1 Aggressive 11.1 65.7 9.5
2 Conservative 14.4 15.8 44.4
3 Likely 76 0 88.5
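Below is the code I used to get this output. As you can see, I use pivot_longer, then apply group_by and summarise to get the sums, and finally use pivot_wider to return the result to the wide format I want.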
sample %>%
  pivot_longer(cols = c(`Jan 2022`:`Mar 2022`), names_to = "Date", values_to = "Hours") %>%
  group_by(Scenario, Date) %>%
  summarise(Hours = sum(Hours, na.rm = TRUE)) %>%
  pivot_wider(names_from = Date, values_from = Hours)
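I was hoping to find a more efficient way to do this without using pivot_longer. I tried running the code below on the original data frame, but it clearly does not work as expected: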
sample %>%
group_by(Scenario) %>%
summarise(Hours = lapply(X = c(`Jan 2022`:`Mar 2022`), FUN = function(x){sum(x, na.rm = T)}))
Here are the warnings and errors I received:
Error: Problem with `summarise()` column `Hours`.
ℹ `Hours = lapply(...)`.
x NA/NaN argument
ℹ The error occurred in group 1: Scenario = "Aggressive".
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning messages:
1: In `Jan 2022`:`Mar 2022` :
numerical expression has 5 elements: only the first used
2: In `Jan 2022`:`Mar 2022` :
numerical expression has 5 elements: only the first used
I imagine there is a way to do this with an apply-style function, but I am open to any suggestion. The fewer lines of code, the better.
With the tidyverse, the way to loop over columns is across, not lapply:
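library(dplyr)
sample %>%
  group_by(Scenario) %>%
  # a lambda is used because passing extra arguments such as na.rm through across() is deprecated as of dplyr 1.1.0
  summarise(across(where(is.numeric), ~ sum(.x, na.rm = TRUE)), .groups = 'drop')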
Output:
# A tibble: 3 × 4
Scenario `Jan 2022` `Feb 2022` `Mar 2022`
<chr> <dbl> <dbl> <dbl>
1 Aggressive 65.7 11.1 9.5
2 Conservative 15.8 14.4 44.4
3 Likely 0 76 88.5
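As for why the lapply attempt fails: inside summarise(), `Jan 2022`:`Mar 2022` is evaluated as base R's sequence operator on the column values of each group, which is what the "only the first used" warnings and the NA/NaN error point to. Inside across(), the same expression is read as a column range instead. A minimal sketch of that variant, assuming the same sample data as in the question and dplyr 1.0.0 or later (the name totals is only for illustration):

```r
library(dplyr)
library(tibble)

# Same data as in the question, rebuilt here so the block runs on its own.
sample <- tibble(
  Scenario = c("Aggressive", "Aggressive", "Conservative", "Aggressive",
               "Likely", "Aggressive", "Conservative", "Likely",
               "Likely", "Aggressive", "Conservative", "Conservative"),
  `Jan 2022` = c(5.5, 15, 15.77, 45.2, NA, NA, NA, NA, NA, NA, NA, NA),
  `Feb 2022` = c(NA, NA, NA, NA, 20.5, 11.1, 14.4, 55.5, NA, NA, NA, NA),
  `Mar 2022` = c(NA, NA, NA, NA, NA, NA, NA, NA, 88.5, 9.5, 18.9, 25.5)
)

# Inside across(), `Jan 2022`:`Mar 2022` selects every column from
# `Jan 2022` through `Mar 2022` by position; it is not a numeric sequence.
totals <- sample %>%
  group_by(Scenario) %>%
  summarise(
    across(`Jan 2022`:`Mar 2022`, ~ sum(.x, na.rm = TRUE)),
    .groups = "drop"
  )

print(totals)
```

Selecting by range rather than where(is.numeric) may be preferable when the data frame has other numeric columns that should not be summed.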
Another option
data.table
library(data.table)
# setDT() converts `sample` to a data.table by reference
setDT(sample)[, lapply(.SD, sum, na.rm = TRUE), by = Scenario, .SDcols = is.numeric]
Scenario Jan 2022 Feb 2022 Mar 2022
1: Aggressive 65.70 11.1 9.5
2: Conservative 15.77 14.4 44.4
3: Likely 0.00 76.0 88.5
Using data.table you can do this:
data.table::setDT(sample)[, lapply(.SD, sum, na.rm=T), by=Scenario]
Output:
Scenario Jan 2022 Feb 2022 Mar 2022
1: Aggressive 65.70 11.1 9.5
2: Conservative 15.77 14.4 44.4
3: Likely 0.00 76.0 88.5
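If neither dplyr nor data.table is available, much the same result can probably be had in base R with aggregate(). This is only a sketch and not part of the answers above: sample_df and totals are names made up here, and the data is rebuilt as a plain data.frame so the block runs without any packages.

```r
# Same data as in the question, as a base data.frame.
# check.names = FALSE keeps column names such as "Jan 2022" intact.
sample_df <- data.frame(
  Scenario = c("Aggressive", "Aggressive", "Conservative", "Aggressive",
               "Likely", "Aggressive", "Conservative", "Likely",
               "Likely", "Aggressive", "Conservative", "Conservative"),
  `Jan 2022` = c(5.5, 15, 15.77, 45.2, NA, NA, NA, NA, NA, NA, NA, NA),
  `Feb 2022` = c(NA, NA, NA, NA, 20.5, 11.1, 14.4, 55.5, NA, NA, NA, NA),
  `Mar 2022` = c(NA, NA, NA, NA, NA, NA, NA, NA, 88.5, 9.5, 18.9, 25.5),
  check.names = FALSE
)

# Sum every column except the first, grouped by Scenario.
# na.rm = TRUE is forwarded to sum() through aggregate()'s `...` argument.
totals <- aggregate(
  sample_df[-1],
  by = list(Scenario = sample_df$Scenario),
  FUN = sum,
  na.rm = TRUE
)

print(totals)
```

The data-frame method is used here rather than the formula interface because the formula interface drops rows containing NA by default, and every row in this data has at least one NA.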