Recode a subset of variables using case_when in R
I am trying to recode some survey data in R. Here is some data similar to what I actually have.
    df <- data.frame(
      A = rep("Y", 5),
      B = seq(as.POSIXct("2014-01-13"), as.POSIXct("2014-01-17"), by = "days"),
      C = c("Neither agree nor disagree",
            "Somewhat agree",
            "Somewhat disagree",
            "Strongly agree",
            "Strongly disagree"),
      D = c("Neither agree nor disagree",
            "Somewhat agree",
            "Somewhat disagree",
            "Strongly agree",
            "Strongly disagree")
    )
I looked through some other posts and wrote the following code:
    library(dplyr)

    init2 <- df %>%
      mutate_at(vars(c(1:4)), function(x) case_when(
        x == "Neither agree nor disagree" ~ 3,
        x == "Somewhat agree" ~ 4,
        x == "Somewhat disagree" ~ 2,
        x == "Strongly agree" ~ 5,
        x == "Strongly disagree" ~ 1
      ))
But this raises an error:
    Error: Problem with `mutate()` column `B`.
    i `B = (function (x) ...`.
    x character string is not in a standard unambiguous format
    Run `rlang::last_error()` to see where the error occurred.
The dates I entered are POSIXct. Should I change their format? What is the fix for this problem? Thanks.
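Trying to recode a POSIXt column onto your Likert scale makes no sense. Trying to recode the "Y" column makes no sense to me either, but at least that one does not raise an error.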
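To see where the message comes from, here is a minimal base R sketch. It assumes a single POSIXct value standing in for column B; the `tz = "UTC"` argument and the variable names are illustrative choices, not part of the question. Comparing a POSIXct value with a character string makes R try to convert the string to a date-time first, and a Likert label cannot be parsed that way.

```r
# A POSIXct value, like one element of column B in the question
b <- as.POSIXct("2014-01-13", tz = "UTC")

# Comparing it with a Likert label forces R to parse the label as a date-time;
# tryCatch() captures the resulting error message instead of stopping
res <- tryCatch(
  b == "Somewhat agree",
  error = function(e) conditionMessage(e)
)
print(res)

# A plain character value compares without trouble, which is why
# column A produced no error (only a column of NA after recoding)
a_cmp <- "Y" == "Somewhat agree"
print(a_cmp)
```

The same comparison happens inside `case_when()` for every selected column, so any POSIXct column in the selection stops the whole `mutate_at()` call.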
I suggest you do one of the following.
Explicitly mutate only the columns you want:
    df %>%
      mutate(across(c(C, D), ~ case_when(
        . == "Neither agree nor disagree" ~ 3,
        . == "Somewhat agree" ~ 4,
        . == "Somewhat disagree" ~ 2,
        . == "Strongly agree" ~ 5,
        . == "Strongly disagree" ~ 1
      )))
    #   A          B C D
    # 1 Y 2014-01-13 3 3
    # 2 Y 2014-01-14 4 4
    # 3 Y 2014-01-15 2 2
    # 4 Y 2014-01-16 5 5
    # 5 Y 2014-01-17 1 1
Explicitly exclude the columns you don't want:
    df %>%
      mutate(across(-c(A, B), ~ case_when(
        . == "Neither agree nor disagree" ~ 3,
        . == "Somewhat agree" ~ 4,
        . == "Somewhat disagree" ~ 2,
        . == "Strongly agree" ~ 5,
        . == "Strongly disagree" ~ 1
      )))
Select them conditionally with some filter (though this is not foolproof):
    df %>%
      mutate(across(where(~ all(grepl("agree", .))), ~ case_when(
        . == "Neither agree nor disagree" ~ 3,
        . == "Somewhat agree" ~ 4,
        . == "Somewhat disagree" ~ 2,
        . == "Strongly agree" ~ 5,
        . == "Strongly disagree" ~ 1
      )))
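One reason the `grepl()` filter is not foolproof is that any column whose values all happen to contain "agree" would be selected, including free-text comments. A stricter variant is sketched below. It is only an illustration: the lookup vector `likert` and the helper `is_likert` are hypothetical names introduced here, and the sketch swaps `case_when()` for a named-vector lookup.

```r
library(dplyr)

# Hypothetical lookup table: names are the survey labels, values are the scores
likert <- c(
  "Strongly disagree"          = 1,
  "Somewhat disagree"          = 2,
  "Neither agree nor disagree" = 3,
  "Somewhat agree"             = 4,
  "Strongly agree"             = 5
)

df <- data.frame(
  A = rep("Y", 5),
  B = seq(as.POSIXct("2014-01-13"), as.POSIXct("2014-01-17"), by = "days"),
  C = names(likert)[c(3, 4, 2, 5, 1)],
  D = names(likert)[c(3, 4, 2, 5, 1)],
  stringsAsFactors = FALSE
)

# TRUE only for character columns whose non-missing values are all known labels
is_likert <- function(x) is.character(x) && all(x %in% c(names(likert), NA))

# Index the lookup vector by label; unname() drops the labels from the result
recoded <- df %>%
  mutate(across(where(is_likert), ~ unname(likert[.x])))

print(recoded)
```

With this predicate a column that contains even one unknown or misspelled label is left untouched rather than silently recoded to NA, which makes the problem easier to spot.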
FYI, according to https://dplyr.tidyverse.org/reference/mutate_all.html (as of 7 November 2021):
"Scoped verbs (_if, _at, _all) have been superseded by the use of across() in an existing verb. See vignette("colwise") for details."
across() pairs nicely with where, which the tidyselect package supplies (behind the scenes).
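As a rough illustration of that migration, the sketch below applies the same recoding once with the superseded `mutate_at()` and once with `across()`. The two-row data frame and the helper name `recode_likert` are made up for this example.

```r
library(dplyr)

# Small illustrative data frame (not the asker's data)
df <- data.frame(
  C = c("Somewhat agree", "Strongly disagree"),
  D = c("Strongly agree", "Somewhat disagree"),
  stringsAsFactors = FALSE
)

# Hypothetical helper wrapping the recoding; unmatched labels become NA
recode_likert <- function(x) {
  case_when(
    x == "Strongly disagree" ~ 1,
    x == "Somewhat disagree" ~ 2,
    x == "Neither agree nor disagree" ~ 3,
    x == "Somewhat agree" ~ 4,
    x == "Strongly agree" ~ 5
  )
}

# Superseded scoped verb
old_style <- df %>% mutate_at(vars(C, D), recode_likert)

# The same operation written with across() inside mutate()
new_style <- df %>% mutate(across(c(C, D), recode_likert))

print(all.equal(old_style, new_style))
```

Wrapping the recoding in a named function also avoids repeating the five `case_when()` branches in every variant.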