How to combine summarize_at and a custom function that requires input from multiple columns in R?

Question

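I have a list of employees' actual capacity (which changes every month) and their scheduled capacity (which stays the same every month). I want to use summarize_at to work out what percentage over (or under) their allocation they are. However, I can't figure out how to pass my custom function through my summarize call. I tried looking at this, but my function is different because it needs input from multiple columns.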

Here is a sample dataset:

library(dplyr)
question <- tibble(name = c("justin", "justin", "corey", "corey"),
                   allocation_1 = c(1, 2, 4, 8),
                   allocation_2 = c(2, 4, 11, 9),
                   scheduled_allocation = c(3, 3, 4, 4))

Here is what I want:

library(dplyr)
answer <- tibble(name = c("justin", "corey"),
                 allocation_1 = c(100, 300),
                 allocation_2 = c(200, 500))

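This is what I have so far. I know the custom function works; I just can't get it to work inside the pipe. x corresponds to the person's total allocation (e.g., for justin's allocation_1, 1 + 2 = 3), and y is their scheduled allocation (e.g., 3, not 6). So 3 / 3 = 1, and 1 * 100 = 100% allocated.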
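To make the target arithmetic concrete, here is a minimal base R check for justin's allocation_1. The helper is repeated so the snippet runs on its own; justin_total and justin_pct are illustrative names, not part of the original code.

```r
# Same helper as in the question, repeated so this snippet is self-contained
get_cap_percent <- function(x, y) {
  100 * (x / y)
}

# justin, allocation_1: monthly values 1 and 2, scheduled allocation 3
justin_total <- sum(c(1, 2), na.rm = TRUE)      # x: total actual allocation
justin_pct <- get_cap_percent(justin_total, 3)  # y: the scheduled value, counted once
justin_pct
```

The scheduled allocation is taken once per person (3), not summed across rows (6), which is why the result is 100 rather than 50.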

#custom function that works

get_cap_percent <- function (x, y) {
  100*(x/y)
}

#Code that doesn't work
question %>%
  dplyr::group_by(name) %>%
  summarise_at(vars(contains("allocation_")), sum, na.rm = TRUE) %>%
  summarise_at(vars(contains("allocation_")), get_cap_percent, x = ., y = scheduled_allocation)

Answer 1

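We can wrap this in a single summarise. After the first summarise step, only the summarised columns and the grouping column remain, so scheduled_allocation is no longer available to a second call.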
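The column loss can be checked by running only the summing step and inspecting what is left. This is a small sketch on the sample data from the question; step_one is an illustrative name.

```r
library(dplyr)

# Sample data from the question
question <- tibble(name = c("justin", "justin", "corey", "corey"),
                   allocation_1 = c(1, 2, 4, 8),
                   allocation_2 = c(2, 4, 11, 9),
                   scheduled_allocation = c(3, 3, 4, 4))

# Only the summing step: scheduled_allocation is not among the outputs
step_one <- question %>%
  group_by(name) %>%
  summarise(across(contains("allocation_"), ~ sum(.x, na.rm = TRUE)))

# Only the grouping column and the two summed columns remain
names(step_one)
```

Computing the sum and the ratio in one call, while scheduled_allocation is still in scope, avoids the problem: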

library(dplyr)
question %>% 
    group_by(name) %>%
    summarise(across(contains('allocation_'), ~
     get_cap_percent(sum(., na.rm = TRUE), first(scheduled_allocation))))

Output

# A tibble: 2 x 3
  name   allocation_1 allocation_2
  <chr>         <dbl>        <dbl>
1 corey           300          500
2 justin          100          200
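If keeping the two steps separate is preferred, one possible variation (a sketch, not part of the answer above) is to carry scheduled_allocation through the summary and divide afterwards. It assumes scheduled_allocation is constant within each name, as in the sample data; result is an illustrative name.

```r
library(dplyr)

# Sample data and helper from the question, repeated so the snippet runs alone
question <- tibble(name = c("justin", "justin", "corey", "corey"),
                   allocation_1 = c(1, 2, 4, 8),
                   allocation_2 = c(2, 4, 11, 9),
                   scheduled_allocation = c(3, 3, 4, 4))

get_cap_percent <- function(x, y) {
  100 * (x / y)
}

result <- question %>%
  group_by(name) %>%
  # Step 1: sum the monthly columns and keep one scheduled value per person
  summarise(across(starts_with("allocation_"), ~ sum(.x, na.rm = TRUE)),
            scheduled_allocation = first(scheduled_allocation)) %>%
  # Step 2: convert each total into a percentage of the scheduled value
  mutate(across(starts_with("allocation_"),
                ~ get_cap_percent(.x, scheduled_allocation))) %>%
  # Drop the helper column to match the desired output
  select(-scheduled_allocation)

result
```

starts_with("allocation_") selects allocation_1 and allocation_2 but not scheduled_allocation, so the scheduled column is only ever used as the denominator.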

