Creating multiple new columns using mutate() and across() in R

I want to perform the following calculation on many columns at once, grouped by ID:

df <- df %>%
  group_by(ID) %>%
  mutate("Flows.2018.04" = Assets.2018.04 - 
           (Assets.2018.03 * Returns.2018.04))

The dataset has Assets.YYYY.MM and Returns.YYYY.MM columns for every month from 2018.04 to 2022.02, and I want to create a Flows column for each month.
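Because every Flows column is defined by name (this month's Assets, the previous month's Assets, this month's Returns), the column names for the whole range can be generated instead of typed. A minimal sketch, assuming the months are labeled as zero-padded YYYY.MM exactly like the sample columns; the vector names are illustrative, not part of the question:

```r
# One YYYY.MM label per month from 2018.04 to 2022.02
months <- format(seq(as.Date("2018-04-01"), as.Date("2022-02-01"), by = "month"), "%Y.%m")

# Target column names and the columns each one is computed from
flows_cols   <- paste0("Flows.", months)
assets_cols  <- paste0("Assets.", months)
returns_cols <- paste0("Returns.", months)

# Previous month's Assets column: first day of the month minus one day,
# which also handles the year roll-over (e.g. 2019.01 -> 2018.12)
prev_assets_cols <- paste0(
  "Assets.",
  format(as.Date(paste0(months, ".01"), format = "%Y.%m.%d") - 1, "%Y.%m")
)

head(data.frame(flows_cols, assets_cols, prev_assets_cols, returns_cols), 3)
```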

I know I can do this for each column individually:

df <- df %>%
  group_by(ID) %>%
  mutate("Flows.2018.04" = Assets.2018.04 - 
           (Assets.2018.03 * Returns.2018.04)) %>%
  mutate("Flows.2018.05" = Assets.2018.05 - 
           (Assets.2018.04 * Returns.2018.05))

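But since I want to do this calculation for more than 50 columns, I'm hoping for a more elegant approach. As far as I can tell, it should be possible with dplyr's across() function, but I can't work out how.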

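I also want the new columns to be named Flows.YYYY.MM, which complicates things further. I think the easiest way to achieve this might be to simply rename the columns after they are created.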
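One way to do it with across(), sketched under a few assumptions: dplyr >= 1.0.0 (for across(), cur_column() and rename_with()); `prev_assets_col()` and the copy `src` are hypothetical names introduced here; and the data frame is left ungrouped, because the calculation is purely row-wise, so group_by(ID) does not change the result, and an ungrouped frame keeps `.x` the same length as the looked-up columns. across() runs over the Assets column of each month that has a Returns column, cur_column() tells the function which month it is on, and the names are fixed afterwards with rename_with(), as suggested above:

```r
library(dplyr)

df <- data.frame(
  ID = c("6F55", "6F55", "ANE3", "ANE3", "6F55"),
  Assets.2018.03 = c(5000, 3000, 5870, 4098, 9878),
  Assets.2018.04 = c(2345, 1926, 8563, 9373, 7432),
  Assets.2018.05 = c(3459, 6933, 1533, 4556, 9855),
  Returns.2018.04 = c(1.03, 0.77, 1.01, 0.97, 1.06),
  Returns.2018.05 = c(0.94, 1.11, 0.89, 1.02, 1.02))

# Hypothetical helper: "Assets.2018.04" -> "Assets.2018.03" (handles year roll-over)
prev_assets_col <- function(col) {
  ym <- sub("^Assets\\.", "", col)
  first_of_month <- as.Date(paste0(ym, ".01"), format = "%Y.%m.%d")
  paste0("Assets.", format(first_of_month - 1, "%Y.%m"))
}

# Every month with a Returns column gets a Flows column
flow_months <- sub("^Returns\\.", "Assets.", grep("^Returns\\.", names(df), value = TRUE))

src <- df  # unmodified, ungrouped copy used for lookups by column name

result <- df %>%
  mutate(across(
    all_of(flow_months),
    # .x is this month's Assets; the other two columns are found by name
    ~ .x - src[[prev_assets_col(cur_column())]] *
           src[[sub("^Assets", "Returns", cur_column())]],
    .names = "Flows.{.col}"            # gives e.g. "Flows.Assets.2018.04"
  )) %>%
  # Rename "Flows.Assets.YYYY.MM" to "Flows.YYYY.MM"
  rename_with(~ sub("^Flows\\.Assets\\.", "Flows.", .x), starts_with("Flows."))

result
```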

I have also considered reshaping the data frame from wide to long format to do this calculation, but that seems more complicated to me.

Any suggestions on how to get the desired result?

As requested, here is some sample data:

library(tidyverse)
df <- data.frame(
  ID = c("6F55", "6F55", "ANE3", "ANE3", "6F55"),
  Assets.2018.03 = c(5000, 3000, 5870, 4098, 9878),
  Assets.2018.04 = c(2345, 1926, 8563, 9373, 7432),
  Assets.2018.05 = c(3459, 6933, 1533, 4556, 9855),
  Returns.2018.04 = c(1.03, 0.77, 1.01, 0.97, 1.06),
  Returns.2018.05 = c(0.94, 1.11, 0.89, 1.02, 1.02))

df
    ID Assets.2018.03 Assets.2018.04 Assets.2018.05 Returns.2018.04 Returns.2018.05
1 6F55           5000           2345           3459            1.03            0.94
2 6F55           3000           1926           6933            0.77            1.11
3 ANE3           5870           8563           1533            1.01            0.89
4 ANE3           4098           9373           4556            0.97            1.02
5 6F55           9878           7432           9855            1.06            1.02

The desired result is:

  ID    Assets.2018.03 Assets.2018.04 Assets.2018.05 Returns.2018.04 Returns.2018.05 Flows.2018.04 Flows.2018.05
1 6F55            5000           2345           3459            1.03            0.94        -2805          1255
2 6F55            3000           1926           6933            0.77            1.11         -384          4795
3 ANE3            5870           8563           1533            1.01            0.89         2634         -6088
4 ANE3            4098           9373           4556            0.97            1.02         5398         -5004
5 6F55            9878           7432           9855            1.06            1.02        -3039          2274
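As a cross-check of the formula against this table, the same numbers can be produced in base R without across() or reshaping, by lining up the Assets and Returns columns as matrices and subtracting element-wise. This is a swapped-in technique, not something proposed in the thread, and it assumes every month with a Returns column also has the previous month's Assets column:

```r
# Sample data from the question
df <- data.frame(
  ID = c("6F55", "6F55", "ANE3", "ANE3", "6F55"),
  Assets.2018.03 = c(5000, 3000, 5870, 4098, 9878),
  Assets.2018.04 = c(2345, 1926, 8563, 9373, 7432),
  Assets.2018.05 = c(3459, 6933, 1533, 4556, 9855),
  Returns.2018.04 = c(1.03, 0.77, 1.01, 0.97, 1.06),
  Returns.2018.05 = c(0.94, 1.11, 0.89, 1.02, 1.02))

# Months that need a Flows column: every month with a Returns column
returns_cols <- grep("^Returns\\.", names(df), value = TRUE)
months       <- sub("^Returns\\.", "", returns_cols)

# Label of the preceding month, e.g. "2018.04" -> "2018.03"
prev_months <- format(as.Date(paste0(months, ".01"), format = "%Y.%m.%d") - 1, "%Y.%m")

# One column per month; column j of each matrix refers to the same month
cur_assets  <- as.matrix(df[paste0("Assets.", months)])
prev_assets <- as.matrix(df[paste0("Assets.", prev_months)])
rets        <- as.matrix(df[returns_cols])

# Element-wise: Flows = Assets(t) - Assets(t-1) * Returns(t)
flows <- cur_assets - prev_assets * rets
colnames(flows) <- paste0("Flows.", months)

df_out <- cbind(df, flows)
df_out
```

The values are left unrounded; rounding them reproduces the whole numbers shown in the desired result.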

How about this:

library(tidyverse)
df <- data.frame(
  ID = c("6F55", "6F55", "ANE3", "ANE3", "6F55"),
  Assets.2018.03 = c(5000, 3000, 5870, 4098, 9878),
  Assets.2018.04 = c(2345, 1926, 8563, 9373, 7432),
  Assets.2018.05 = c(3459, 6933, 1533, 4556, 9855),
  Returns.2018.04 = c(1.03, 0.77, 1.01, 0.97, 1.06),
  Returns.2018.05 = c(0.94, 1.11, 0.89, 1.02, 1.02))


df %>% 
  # Long format: one row per original row and month, with Assets and Returns columns
  pivot_longer(-ID, 
               names_to = c(".value", "date"), 
               names_pattern = "(.*)\\.(\\d{4}\\.\\d{2})") %>% 
  arrange(ID, date) %>% 
  # obs = k-th original row of this ID, so each original row can be tracked
  group_by(ID, date) %>% 
  mutate(obs = seq_along(date)) %>% 
  # Within each original row (sorted by date), lag() gives the previous month's Assets
  group_by(ID, obs) %>% 
  mutate(Flow = Assets - (lag(Assets) * Returns)) %>% 
  pivot_wider(names_from = "date", 
              values_from = c("Assets", "Returns", "Flow")) %>% 
  as.data.frame()
#>     ID obs Assets_2018.03 Assets_2018.04 Assets_2018.05 Returns_2018.03
#> 1 6F55   1           5000           2345           3459              NA
#> 2 6F55   2           3000           1926           6933              NA
#> 3 6F55   3           9878           7432           9855              NA
#> 4 ANE3   1           5870           8563           1533              NA
#> 5 ANE3   2           4098           9373           4556              NA
#>   Returns_2018.04 Returns_2018.05 Flow_2018.03 Flow_2018.04 Flow_2018.05
#> 1            1.03            0.94           NA     -2805.00      1254.70
#> 2            0.77            1.11           NA      -384.00      4795.14
#> 3            1.06            1.02           NA     -3038.68      2274.36
#> 4            1.01            0.89           NA      2634.30     -6088.07
#> 5            0.97            1.02           NA      5397.94     -5004.46

Created on 2022-04-10 by the reprex package (v2.0.1)
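A variant of the same long-format idea, sketched here as an alternative rather than as the answer's own method: tagging each original row with a hypothetical `row` id before pivoting lets lag() work per row, so the obs counter is not needed. Only the Flows values are pivoted back and joined on `row`, which keeps the original row order and the Flows.YYYY.MM names from the question. It assumes dplyr and tidyr >= 1.0.0 (for pivot_longer()/pivot_wider()):

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  ID = c("6F55", "6F55", "ANE3", "ANE3", "6F55"),
  Assets.2018.03 = c(5000, 3000, 5870, 4098, 9878),
  Assets.2018.04 = c(2345, 1926, 8563, 9373, 7432),
  Assets.2018.05 = c(3459, 6933, 1533, 4556, 9855),
  Returns.2018.04 = c(1.03, 0.77, 1.01, 0.97, 1.06),
  Returns.2018.05 = c(0.94, 1.11, 0.89, 1.02, 1.02))

flows <- df %>%
  mutate(row = row_number()) %>%                  # remember each original row
  pivot_longer(-c(ID, row),
               names_to = c(".value", "month"),
               names_pattern = "(.*)\\.(\\d{4}\\.\\d{2})") %>%
  arrange(row, month) %>%                         # zero-padded YYYY.MM sorts correctly
  group_by(row) %>%
  mutate(Flows = Assets - lag(Assets) * Returns) %>%
  ungroup() %>%
  filter(!is.na(Flows)) %>%                       # drops the first month (no previous Assets)
  select(row, month, Flows) %>%
  pivot_wider(names_from = month, values_from = Flows, names_prefix = "Flows.")

result <- df %>%
  mutate(row = row_number()) %>%
  left_join(flows, by = "row") %>%
  select(-row)

result
```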