How to use across and mutate across an entire dataset that has multiple column types?
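I'm trying to use dplyr and case_when across my entire dataset, so that whenever it sees "Strongly Agree" it changes it to the number 5, "Agree" to the number 4, and so on. I've tried looking at this related post, but I get an error because my dataset contains logical and numeric columns, and R rightly says that "Agree" can't be in a logical column, etc.
Here is my data: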
library(dplyr)
test <- tibble(name = c("Justin", "Corey", "Sibley"),
               date = c("2021-08-09", "2021-10-29", "2021-01-01"),
               s1 = c("Agree", "Neutral", "Strongly Disagree"),
               s2rl = c("Agree", "Neutral", "Strongly Disagree"),
               f1 = c("Strongly Agree", "Disagree", "Strongly Disagree"),
               f2rl = c("Strongly Agree", "Disagree", "Strongly Disagree"),
               exam = c(90, 99, 100),
               early = c(TRUE, FALSE, FALSE))
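Ideally, I'd like a single command that lets me go across the entire dataset. If that isn't possible, I'd like an argument that allows me to use multiple across(contains()) arguments (i.e., here, columns that contain "s" or "f").
Here is what I've tried, to no avail: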
library(dplyr)
test %>%
  mutate(across(.),
         ~ case_when(. == "Strongly Agree" ~ 5,
                     . == "Agree" ~ 4,
                     . == "Neutral" ~ 3,
                     . == "Disagree" ~ 2,
                     . == "Strongly Disagree" ~ 1,
                     TRUE ~ NA))
Error: Problem with `mutate()` input `..1`.
x Must subset columns with a valid subscript vector.
x Subscript has the wrong type `tbl_df<
name: character
date: character
s1 : character
s2rl: character
f1 : character
f2rl: character
exam: double
>`.
ℹ It must be numeric or character.
ℹ Input `..1` is `across(.)`.
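We can use matches() to pass a regular expression that selects the columns whose names start with "s" or "f". Note that the fallback value is NA_real_, so that every branch of case_when returns the same (double) type: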
library(dplyr)
test %>%
  mutate(across(matches('^(s|f)'), ~ case_when(. == "Strongly Agree" ~ 5,
                                               . == "Agree" ~ 4,
                                               . == "Neutral" ~ 3,
                                               . == "Disagree" ~ 2,
                                               . == "Strongly Disagree" ~ 1,
                                               TRUE ~ NA_real_)))
Output:
# A tibble: 3 x 8
  name   date          s1  s2rl    f1  f2rl  exam early
  <chr>  <chr>      <dbl> <dbl> <dbl> <dbl> <dbl> <lgl>
1 Justin 2021-08-09     4     4     5     5    90 TRUE
2 Corey  2021-10-29     3     3     2     2    99 FALSE
3 Sibley 2021-01-01     1     1     1     1   100 FALSE
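The question also asked about matching several name patterns at once. As a minimal sketch (not part of the original answer), starts_with() accepts a character vector of prefixes and takes the union of the matches, so no regular expression is needed. The helper name recode_likert is hypothetical and introduced only for illustration.

```r
library(dplyr)

test <- tibble(name = c("Justin", "Corey", "Sibley"),
               date = c("2021-08-09", "2021-10-29", "2021-01-01"),
               s1 = c("Agree", "Neutral", "Strongly Disagree"),
               s2rl = c("Agree", "Neutral", "Strongly Disagree"),
               f1 = c("Strongly Agree", "Disagree", "Strongly Disagree"),
               f2rl = c("Strongly Agree", "Disagree", "Strongly Disagree"),
               exam = c(90, 99, 100),
               early = c(TRUE, FALSE, FALSE))

# Hypothetical helper: maps Likert labels to numbers, anything else to NA.
recode_likert <- function(x) {
  case_when(x == "Strongly Agree" ~ 5,
            x == "Agree" ~ 4,
            x == "Neutral" ~ 3,
            x == "Disagree" ~ 2,
            x == "Strongly Disagree" ~ 1,
            TRUE ~ NA_real_)
}

# starts_with() with two prefixes selects s1, s2rl, f1 and f2rl here.
res <- test %>%
  mutate(across(starts_with(c("s", "f")), recode_likert))
```

This relies on no other column name starting with "s" or "f"; if one did, it would be recoded too and would end up as all NA.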
According to ?across:
across() makes it easy to apply the same transformation to multiple columns, allowing you to use select() semantics inside in "data-masking" functions like summarise() and mutate().
If we check ?select, it lists the various selection helpers used for selecting columns, all of which can also be used in across:
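Tidyverse selections implement a dialect of R where operators make it easy to select variables:
: for selecting a range of consecutive variables.
! for taking the complement of a set of variables.
& and | for selecting the intersection or the union of two sets of variables.
c() for combining selections.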
In addition, you can use selection helpers. Some helpers select specific columns:
everything(): Matches all variables.
last_col(): Select last variable, possibly with an offset.
These helpers select variables by matching patterns in their names:
starts_with(): Starts with a prefix.
ends_with(): Ends with a suffix.
contains(): Contains a literal string.
matches(): Matches a regular expression.
num_range(): Matches a numerical range like x01, x02, x03.
These helpers select variables from a character vector:
all_of(): Matches variable names in a character vector. All names must be present, otherwise an out-of-bounds error is thrown.
any_of(): Same as all_of(), except that no error is thrown for names that don't exist.
This helper selects variables with a function:
where(): Applies a function to all variables and selects those for which the function returns TRUE.
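The helpers listed above can be combined to get close to the "whole dataset" command the question asked for. The following is only a sketch, under the assumption that every character column other than name and date holds Likert responses: where(is.character) picks columns by type, and & !c(name, date) excludes the identifier columns.

```r
library(dplyr)

test <- tibble(name = c("Justin", "Corey", "Sibley"),
               date = c("2021-08-09", "2021-10-29", "2021-01-01"),
               s1 = c("Agree", "Neutral", "Strongly Disagree"),
               s2rl = c("Agree", "Neutral", "Strongly Disagree"),
               f1 = c("Strongly Agree", "Disagree", "Strongly Disagree"),
               f2rl = c("Strongly Agree", "Disagree", "Strongly Disagree"),
               exam = c(90, 99, 100),
               early = c(TRUE, FALSE, FALSE))

# Select by column type, then drop the character columns that are not
# survey answers. The numeric and logical columns are never touched,
# which avoids the type error from the question.
res <- test %>%
  mutate(across(where(is.character) & !c(name, date),
                ~ case_when(.x == "Strongly Agree" ~ 5,
                            .x == "Agree" ~ 4,
                            .x == "Neutral" ~ 3,
                            .x == "Disagree" ~ 2,
                            .x == "Strongly Disagree" ~ 1,
                            TRUE ~ NA_real_)))
```

Without the exclusion, name and date would also be run through case_when and would come back as all NA, so the exclusion list has to be kept in sync with the data.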
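We can also do it another way. First, use character values on the right-hand side, i.e. "5" instead of 5, and so on. In that case the fallback must be NA_character_, which is the NA of character type. Finally, type.convert(as.is = TRUE) turns the results into integers: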
library(dplyr)
test %>%
  mutate(across(s1:f2rl,
                ~ case_when(. == "Strongly Agree" ~ "5",
                            . == "Agree" ~ "4",
                            . == "Neutral" ~ "3",
                            . == "Disagree" ~ "2",
                            . == "Strongly Disagree" ~ "1",
                            TRUE ~ NA_character_))) %>%
  type.convert(as.is = TRUE)
# A tibble: 3 x 8
  name   date          s1  s2rl    f1  f2rl  exam early
  <chr>  <chr>      <int> <int> <int> <int> <int> <lgl>
1 Justin 2021-08-09     4     4     5     5    90 TRUE
2 Corey  2021-10-29     3     3     2     2    99 FALSE
3 Sibley 2021-01-01     1     1     1     1   100 FALSE
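A different technique from the case_when answers above, shown here only as a possible alternative, is to keep the label-to-score mapping in a named vector and index into it. The name likert is hypothetical. Indexing a named vector by a character vector returns NA for labels that are not in the mapping, which plays the role of the TRUE ~ NA fallback.

```r
library(dplyr)

test <- tibble(name = c("Justin", "Corey", "Sibley"),
               date = c("2021-08-09", "2021-10-29", "2021-01-01"),
               s1 = c("Agree", "Neutral", "Strongly Disagree"),
               s2rl = c("Agree", "Neutral", "Strongly Disagree"),
               f1 = c("Strongly Agree", "Disagree", "Strongly Disagree"),
               f2rl = c("Strongly Agree", "Disagree", "Strongly Disagree"),
               exam = c(90, 99, 100),
               early = c(TRUE, FALSE, FALSE))

# Hypothetical lookup table: names are the labels, values are the scores.
likert <- c("Strongly Disagree" = 1, "Disagree" = 2, "Neutral" = 3,
            "Agree" = 4, "Strongly Agree" = 5)

# likert[.x] looks each label up by name; unname() drops the labels
# so that only the numeric scores are stored in the column.
res <- test %>%
  mutate(across(s1:f2rl, ~ unname(likert[.x])))
```

Keeping the mapping in one vector means a changed scale only has to be edited in one place, at the cost of being a little less explicit than case_when.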