How can I apply a function across these columns without doing tedious transformations?

I have a data frame (below) that I want to summarise by column.

sample <- tibble(Scenario = c("Aggressive","Aggressive","Conservative","Aggressive","Likely","Aggressive","Conservative","Likely","Likely","Aggressive","Conservative","Conservative"),
           `Jan 2022` = c(5.5,15,15.77,45.2,NA,NA,NA,NA,NA,NA,NA,NA),
           `Feb 2022` = c(NA,NA,NA,NA,20.5,11.1,14.4,55.5,NA,NA,NA,NA),
           `Mar 2022` = c(NA,NA,NA,NA,NA,NA,NA,NA,88.5,9.5,18.9,25.5))

This is what the output should look like:

# A tibble: 3 × 4
# Groups:   Scenario [3]
  Scenario     `Feb 2022` `Jan 2022` `Mar 2022`
  <chr>             <dbl>      <dbl>      <dbl>
1 Aggressive         11.1       65.7        9.5
2 Conservative       14.4       15.8       44.4
3 Likely             76          0         88.5

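Below is the code I used to get this output. As you can see, I use pivot_longer, then apply group_by and summarise to get the sums, and then use pivot_wider to return the result to the wide format I want.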

sample %>% 
  pivot_longer(cols = c(`Jan 2022`:`Mar 2022`), names_to = "Date", values_to = "Hours") %>% 
  group_by(Scenario, Date) %>% 
  summarise(Hours = sum(Hours, na.rm = T)) %>% 
  pivot_wider(names_from = Date, values_from = Hours)

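I was hoping to find a more efficient way to do this that does not need pivot_longer. I tried running the code below on the original data frame, but it clearly did not work as expected: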

sample %>%
  group_by(Scenario) %>%
  summarise(Hours = lapply(X = c(`Jan 2022`:`Mar 2022`), FUN = function(x){sum(x, na.rm = T)}))

Here are the error and warnings I received:

 Error: Problem with `summarise()` column `Hours`.
ℹ `Hours = lapply(...)`.
x NA/NaN argument
ℹ The error occurred in group 1: Scenario = "Aggressive".
Run `rlang::last_error()` to see where the error occurred.
In addition: Warning messages:
1: In `Jan 2022`:`Mar 2022` :
  numerical expression has 5 elements: only the first used
2: In `Jan 2022`:`Mar 2022` :
  numerical expression has 5 elements: only the first used

I imagine there is a way to do this with an apply function, but I am open to any suggestions. The fewer lines of code, the better.

In the tidyverse, looping over columns is done with across(), not lapply():

library(dplyr)
sample %>%
   group_by(Scenario) %>%
   summarise(across(where(is.numeric), sum, na.rm = TRUE), .groups = 'drop')

Output:

# A tibble: 3 × 4
  Scenario     `Jan 2022` `Feb 2022` `Mar 2022`
  <chr>             <dbl>      <dbl>      <dbl>
1 Aggressive         65.7       11.1        9.5
2 Conservative       15.8       14.4       44.4
3 Likely              0         76         88.5
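
A note for newer dplyr versions: since dplyr 1.1.0, passing extra arguments such as na.rm = TRUE through across() is deprecated in favour of an anonymous function. The sketch below rebuilds the `sample` tibble from the question and shows that form. It also selects the columns with the range `Jan 2022`:`Mar 2022`, which works inside across() because across() uses tidyselect. In the question's attempt the same range sat inside lapply(), where `:` is the ordinary sequence operator applied to two numeric vectors, which would explain the "only the first used" warnings and the NA/NaN error. The names by_type and by_range are only illustrative.

```r
library(dplyr)
library(tibble)

# Data copied from the question
sample <- tibble(
  Scenario = c("Aggressive", "Aggressive", "Conservative", "Aggressive",
               "Likely", "Aggressive", "Conservative", "Likely",
               "Likely", "Aggressive", "Conservative", "Conservative"),
  `Jan 2022` = c(5.5, 15, 15.77, 45.2, NA, NA, NA, NA, NA, NA, NA, NA),
  `Feb 2022` = c(NA, NA, NA, NA, 20.5, 11.1, 14.4, 55.5, NA, NA, NA, NA),
  `Mar 2022` = c(NA, NA, NA, NA, NA, NA, NA, NA, 88.5, 9.5, 18.9, 25.5)
)

# Select every numeric column; na.rm is set inside an anonymous function
# rather than passed through across()'s dots
by_type <- sample %>%
  group_by(Scenario) %>%
  summarise(across(where(is.numeric), function(x) sum(x, na.rm = TRUE)),
            .groups = "drop")

# Same summary, selecting the columns by position range (tidyselect)
by_range <- sample %>%
  group_by(Scenario) %>%
  summarise(across(`Jan 2022`:`Mar 2022`, function(x) sum(x, na.rm = TRUE)),
            .groups = "drop")

print(by_range)
```

Both calls should give the same three-row table as above. The range form is closer to what the question tried, while where(is.numeric) avoids naming columns at all.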

Other solution options

data.table

library(data.table)

setDT(sample)[, lapply(.SD, sum, na.rm = TRUE), by = Scenario, .SDcols = is.numeric]

       Scenario Jan 2022 Feb 2022 Mar 2022
1:   Aggressive    65.70     11.1      9.5
2: Conservative    15.77     14.4     44.4
3:       Likely     0.00     76.0     88.5

With data.table you can do this:

data.table::setDT(sample)[, lapply(.SD, sum, na.rm=T), by=Scenario]

Output:

       Scenario Jan 2022 Feb 2022 Mar 2022
1:   Aggressive    65.70     11.1      9.5
2: Conservative    15.77     14.4     44.4
3:       Likely     0.00     76.0     88.5
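
One thing to be aware of with the data.table answers: setDT() converts its argument to a data.table by reference, so `sample` itself is changed. If the original tibble should stay untouched, a minimal sketch (assuming the question's `sample`; the names dt and totals are hypothetical) is to work on a converted copy with as.data.table():

```r
library(data.table)
library(tibble)

# Data copied from the question
sample <- tibble(
  Scenario = c("Aggressive", "Aggressive", "Conservative", "Aggressive",
               "Likely", "Aggressive", "Conservative", "Likely",
               "Likely", "Aggressive", "Conservative", "Conservative"),
  `Jan 2022` = c(5.5, 15, 15.77, 45.2, NA, NA, NA, NA, NA, NA, NA, NA),
  `Feb 2022` = c(NA, NA, NA, NA, 20.5, 11.1, 14.4, 55.5, NA, NA, NA, NA),
  `Mar 2022` = c(NA, NA, NA, NA, NA, NA, NA, NA, 88.5, 9.5, 18.9, 25.5)
)

# as.data.table() returns a converted copy; `sample` itself stays a tibble
dt <- as.data.table(sample)

# Sum every non-grouping column within each Scenario, ignoring NAs
totals <- dt[, lapply(.SD, sum, na.rm = TRUE), by = Scenario]

print(totals)
```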