How to programmatically apply a transformation to multiple variables and keep both the raw and the transformed variables with dplyr in R

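I have a large dataset and want to programmatically apply some transformations to a subset of its variables. To illustrate, suppose I want to apply log to the variables named in a character vector. I want to keep the input variables and create a new variable for each one, named by adding a prefix (or suffix) to the original name. Since a few lines of code are worth a thousand paragraphs: my goal is to get the result in df_aim in a less repetitive way, with something like the syntax used for df_syntax.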

Reprex

library(tidyverse)
data(mtcars)

vars_to_transf <- c("disp", "hp", "drat")

# these results 
df_aim <- mtcars %>% 
    mutate(
        ln_disp =  log(disp), 
        ln_hp   =  log(hp),
        ln_drat =  log(drat)
    )

# with something like this syntax 
df_syntax <- mtcars %>% 
    mutate(across(all_of(vars_to_transf), .fns =  log))
> head(df_aim)
                   mpg cyl disp  hp drat    wt  qsec vs am gear carb ln_disp ln_hp ln_drat
Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4   5.075 4.700   1.361
Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4   5.075 4.700   1.361
Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1   4.682 4.533   1.348
Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1   5.553 4.700   1.125
Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2   5.886 5.165   1.147
Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1   5.416 4.654   1.015
> head(df_syntax)
                   mpg cyl  disp    hp  drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6 5.075 4.700 1.361 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6 5.075 4.700 1.361 2.875 17.02  0  1    4    4
Datsun 710        22.8   4 4.682 4.533 1.348 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6 5.553 4.700 1.125 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8 5.886 5.165 1.147 3.440 17.02  0  0    3    2
Valiant           18.1   6 5.416 4.654 1.015 3.460 20.22  1  0    3    1

Thanks for your attention, and apologies if this question is a duplicate.

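You can pass the function in a named list. across() then keeps the original columns and appends the list name as a suffix (disp_log, hp_log, drat_log):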

mtcars %>% 
    # all_of() makes it explicit that vars_to_transf is an external character vector
    mutate(across(all_of(vars_to_transf), list(log = log)))

If you want to apply several functions, a named list combined with .names works:

mtcars %>% 
    mutate(across(all_of(vars_to_transf), 
                  list(log = log, sqrt = sqrt), 
                  .names = "{.col}_{.fn}"))
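To see which columns this adds, here is a minimal sketch. It assumes dplyr 1.0.0 or later (where across() is available), and new_cols is a name introduced here purely for illustration.

```r
library(dplyr)

vars_to_transf <- c("disp", "hp", "drat")

df_multi <- mtcars %>%
    mutate(across(all_of(vars_to_transf),
                  list(log = log, sqrt = sqrt),
                  .names = "{.col}_{.fn}"))

# Columns present in the result but not in the original data
new_cols <- setdiff(names(df_multi), names(mtcars))
print(new_cols)
```

The original disp, hp and drat columns are left untouched, and each new column is named from the source column followed by the name given in the list.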

The answer is in the help page, ?dplyr::across. The .names argument handles it.

.names

A glue specification that describes how to name the output columns. This can use {.col} to stand for the selected column name, and {.fn} to stand for the name of the function being applied. The default (NULL) is equivalent to "{.col}" for the single function case and "{.col}_{.fn}" for the case where a list is used for .fns.

mtcars %>% mutate(
    # {.col} is the placeholder for the selected column name, as documented above
    across(all_of(vars_to_transf), .fns = log, .names = "ln_{.col}")
)
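As a quick check, the sketch below compares the .names approach against the hand-written df_aim from the question. It assumes dplyr 1.0.0 or later, uses the documented {.col} placeholder together with all_of(), and introduces df_across as an illustrative name only.

```r
library(dplyr)

vars_to_transf <- c("disp", "hp", "drat")

# Hand-written target from the question
df_aim <- mtcars %>%
    mutate(
        ln_disp = log(disp),
        ln_hp   = log(hp),
        ln_drat = log(drat)
    )

# Programmatic version: originals are kept, logs are added with an "ln_" prefix
df_across <- mtcars %>%
    mutate(across(all_of(vars_to_transf), .fns = log, .names = "ln_{.col}"))

# Compare the two data frames
print(all.equal(df_aim, df_across))
```

For a suffix instead of a prefix, the same call should work with a specification such as "{.col}_ln".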