How to use across and mutate across an entire dataset that has multiple column types?

Question

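I'm trying to use dplyr and case_when across my entire dataset, so that whenever it sees "Strongly Agree" it changes it to the number 5, "Agree" to the number 4, and so on. I tried following a related post, but I get an error because my dataset contains logical and numeric columns, and R rightly says that "Agree" can't be in a logical column, etc.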

Here is my data:

library(dplyr)
test <- tibble(name = c("Justin", "Corey", "Sibley"),
               date = c("2021-08-09", "2021-10-29", "2021-01-01"),
               s1 = c("Agree", "Neutral", "Strongly Disagree"),
               s2rl = c("Agree", "Neutral", "Strongly Disagree"),
               f1 = c("Strongly Agree", "Disagree", "Strongly Disagree"),
               f2rl = c("Strongly Agree", "Disagree", "Strongly Disagree"),
               exam = c(90, 99, 100),
               early = c(TRUE, FALSE, FALSE))

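Ideally, I'd like a single command that lets me go across the entire dataset. If that isn't possible, I'd like an argument that allows me to use multiple across(contains()) selections (i.e., here, columns containing "s" or "f").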

Here's what I've tried, to no avail:

library(dplyr)
test %>%
  mutate(across(.), 
         ~ case_when(. == "Strongly Agree" ~ 5, 
                     . == "Agree" ~ 4,
                     . == "Neutral" ~ 3,
                     . == "Disagree" ~ 2,
                     . == "Strongly Disagree" ~ 1,
                     TRUE ~ NA))

Error: Problem with `mutate()` input `..1`.
x Must subset columns with a valid subscript vector.
x Subscript has the wrong type `tbl_df<
  name: character
  date: character
  s1  : character
  s2rl: character
  f1  : character
  f2rl: character
  exam: double
>`.
ℹ It must be numeric or character.
ℹ Input `..1` is `across(.)`.

Answer 1

We can use matches() to pass a regular expression that selects the columns whose names start with "s" or "f":

library(dplyr)
test %>% 
    mutate(across(matches('^(s|f)'), ~ case_when(. == "Strongly Agree" ~ 5, 
                     . == "Agree" ~ 4,
                     . == "Neutral" ~ 3,
                     . == "Disagree" ~ 2,
                     . == "Strongly Disagree" ~ 1,
                     TRUE ~ NA_real_)))

-output

# A tibble: 3 x 8
  name   date          s1  s2rl    f1  f2rl  exam early
  <chr>  <chr>      <dbl> <dbl> <dbl> <dbl> <dbl> <lgl>
1 Justin 2021-08-09     4     4     5     5    90 TRUE 
2 Corey  2021-10-29     3     3     2     2    99 FALSE
3 Sibley 2021-01-01     1     1     1     1   100 FALSE

According to ?across

across() makes it easy to apply the same transformation to multiple columns, allowing you to use select() semantics inside in "data-masking" functions like summarise() and mutate().

If we check ?select, it lists the various selection helpers used for selecting columns, and these can also be used in across:

Tidyverse selections implement a dialect of R where operators make it easy to select variables:

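: for selecting a range of consecutive variables.

! for taking the complement of a set of variables.

& and | for selecting the intersection or the union of two sets of variables.

c() for combining selections.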
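As a minimal sketch of the "multiple selections" case asked about in the question, c() (or equivalently |) can combine two helpers inside across(). The tibble below is a trimmed copy of the question's data, and the choice of starts_with() over contains() is my assumption, made so that a name such as "exam" could never be picked up by accident:

```r
library(dplyr)

# Trimmed copy of the question's data
test <- tibble(name = c("Justin", "Corey", "Sibley"),
               s1 = c("Agree", "Neutral", "Strongly Disagree"),
               f1 = c("Strongly Agree", "Disagree", "Strongly Disagree"),
               exam = c(90, 99, 100),
               early = c(TRUE, FALSE, FALSE))

# c() combines the two selections;
# starts_with("s") | starts_with("f") selects the same columns
res <- test %>%
  mutate(across(c(starts_with("s"), starts_with("f")),
                ~ case_when(.x == "Strongly Agree" ~ 5,
                            .x == "Agree" ~ 4,
                            .x == "Neutral" ~ 3,
                            .x == "Disagree" ~ 2,
                            .x == "Strongly Disagree" ~ 1,
                            TRUE ~ NA_real_)))
```

Only s1 and f1 are recoded here; name, exam and early are left untouched because they are never selected.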

In addition, you can use selection helpers. Some helpers select specific columns:

everything(): Matches all variables.

last_col(): Select last variable, possibly with an offset.

These helpers select variables by matching patterns in their names:

starts_with(): Starts with a prefix.

ends_with(): Ends with a suffix.

contains(): Contains a literal string.

matches(): Matches a regular expression.

num_range(): Matches a numerical range like x01, x02, x03.

These helpers select variables from a character vector:

all_of(): Matches variable names in a character vector. All names must be present, otherwise an out-of-bounds error is thrown.

any_of(): Same as all_of(), except that no error is thrown for names that don't exist.

This helper selects variables with a function:

where(): Applies a function to all variables and selects those for which the function returns TRUE.
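where() is probably the closest thing to "go across the entire dataset": it lets the data decide which columns get recoded. The sketch below is one possible approach, not the only one. The lookup vector likert and the predicate is_likert are hypothetical names introduced for illustration. A column is selected only if it is character and every value is one of the five labels, so name and date are skipped even though they are character columns:

```r
library(dplyr)

test <- tibble(name = c("Justin", "Corey", "Sibley"),
               date = c("2021-08-09", "2021-10-29", "2021-01-01"),
               s1 = c("Agree", "Neutral", "Strongly Disagree"),
               f1 = c("Strongly Agree", "Disagree", "Strongly Disagree"),
               exam = c(90, 99, 100),
               early = c(TRUE, FALSE, FALSE))

# Hypothetical lookup vector: names are the labels, values are the scores
likert <- c("Strongly Disagree" = 1, "Disagree" = 2, "Neutral" = 3,
            "Agree" = 4, "Strongly Agree" = 5)

# Hypothetical predicate: TRUE only for character columns made up
# entirely of Likert labels (a column containing NA is not selected)
is_likert <- function(x) is.character(x) && all(x %in% names(likert))

# Index the lookup vector by label, then drop the names it carries along
res <- test %>%
  mutate(across(where(is_likert), ~ unname(likert[.x])))
```

Because the predicate rejects any column with a value outside the five labels, a survey column containing missing or misspelled answers would be left as character; loosening the check is a design choice that depends on the data.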

Answer 2

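We could also do it another way. First return the character "5" instead of the number 5, and so on. In this case we have to use NA_character_, which is the NA of character type. Finally, use type.convert(as.is = TRUE) to get integers: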

library(dplyr)
test %>%
    mutate(across(s1:f2rl, 
           ~ case_when(. == "Strongly Agree" ~ "5", 
                       . == "Agree" ~ "4",
                       . == "Neutral" ~ "3",
                       . == "Disagree" ~ "2",
                       . == "Strongly Disagree" ~ "1",
                       TRUE ~ NA_character_ ))) %>% 
    type.convert(as.is = TRUE)

# A tibble: 3 x 8
  name   date          s1  s2rl    f1  f2rl  exam early
  <chr>  <chr>      <int> <int> <int> <int> <int> <lgl>
1 Justin 2021-08-09     4     4     5     5    90 TRUE 
2 Corey  2021-10-29     3     3     2     2    99 FALSE
3 Sibley 2021-01-01     1     1     1     1   100 FALSE
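Note that type.convert() is applied to the whole tibble, which is why exam also shows up as <int> in the output above. If only the survey columns should change type, one possible variant (a sketch on a trimmed copy of the data, not part of the original answer) is to return integer literals directly from case_when, with NA_integer_ as the fallback:

```r
library(dplyr)

test <- tibble(name = c("Justin", "Corey", "Sibley"),
               s1 = c("Agree", "Neutral", "Strongly Disagree"),
               f1 = c("Strongly Agree", "Disagree", "Strongly Disagree"),
               exam = c(90, 99, 100))

# Integer literals (5L, 4L, ...) keep every branch of case_when the same type
res <- test %>%
  mutate(across(s1:f1,
                ~ case_when(.x == "Strongly Agree" ~ 5L,
                            .x == "Agree" ~ 4L,
                            .x == "Neutral" ~ 3L,
                            .x == "Disagree" ~ 2L,
                            .x == "Strongly Disagree" ~ 1L,
                            TRUE ~ NA_integer_)))
```

Here s1 and f1 become integer columns while exam keeps its original double type.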
