Recode subset of variables using case when in R

Question

I'm trying to recode some survey data in R. Here is some data similar to what I actually have:

df <- data.frame(
  A = rep("Y", 5),
  B = seq(as.POSIXct("2014-01-13"), as.POSIXct("2014-01-17"), by = "days"),
  C = c("Neither agree nor disagree",
        "Somewhat agree",
        "Somewhat disagree",
        "Strongly agree",
        "Strongly disagree"),
  D = c("Neither agree nor disagree",
        "Somewhat agree",
        "Somewhat disagree",
        "Strongly agree",
        "Strongly disagree")
)

I looked at some other posts and wrote the following code:

library(dplyr)

# Recode columns 1 to 4 from survey labels to 1-5 scores
init2 <- df %>%
  mutate_at(vars(c(1:4)), function(x) case_when(
    x == "Neither agree nor disagree" ~ 3,
    x == "Somewhat agree"             ~ 4,
    x == "Somewhat disagree"          ~ 2,
    x == "Strongly agree"             ~ 5,
    x == "Strongly disagree"          ~ 1
  ))

But this throws an error:

Error: Problem with `mutate()` column `B`.
i `B = (function (x) ...`.
x character string is not in a standard unambiguous format

Run `rlang::last_error()` to see where the error occurred.

The dates I entered are POSIXct. Should I change their format? How can I fix this? Thanks.

Answer 1

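Trying to recode a POSIXt column to your Likert scale doesn't make sense. Comparing a date-time column with a string such as "Neither agree nor disagree" makes R try to parse that string as a date-time, and that parse is what fails with "character string is not in a standard unambiguous format". Recoding the "Y" column doesn't make sense to me either, but at least it doesn't raise an error (every value in it simply becomes NA). So there is no need to change the date format; just leave those columns out of the recoding.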
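A minimal base-R reproduction of both behaviours, as a sketch (the names `b`, `msg`, `a_matches` and `col_classes` are only for illustration): comparing a POSIXct value with a label string errors, while comparing the character column A just gives FALSE.

```r
# One value like those in column B of the question's df
b <- as.POSIXct("2014-01-13")

# POSIXct == character makes R parse the string as a date-time;
# capture the error message instead of stopping
msg <- tryCatch(
  b == "Strongly agree",
  error = function(e) conditionMessage(e)
)
print(msg)

# A character column such as A compares without error, FALSE everywhere,
# so case_when() would turn every value of A into NA
a_matches <- rep("Y", 5) == "Strongly agree"
print(a_matches)

# Listing column classes shows which of columns 1:4 is the date-time one
df <- data.frame(
  A = rep("Y", 5),
  B = seq(as.POSIXct("2014-01-13"), as.POSIXct("2014-01-17"), by = "days")
)
col_classes <- vapply(df, function(x) class(x)[1], character(1))
print(col_classes)
```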

I suggest one of the following:

Explicitly mutate only the columns you want:

df %>%
  mutate(across(c(C, D), ~ case_when(
    . == "Neither agree nor disagree" ~ 3,
    . == "Somewhat agree"             ~ 4,
    . == "Somewhat disagree"          ~ 2,
    . == "Strongly agree"             ~ 5,
    . == "Strongly disagree"          ~ 1
  )))
#   A          B C D
# 1 Y 2014-01-13 3 3
# 2 Y 2014-01-14 4 4
# 3 Y 2014-01-15 2 2
# 4 Y 2014-01-16 5 5
# 5 Y 2014-01-17 1 1
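If the same mapping is needed in several places, one alternative (a swapped-in technique, not what this answer uses) is to store it once as a named vector and index it by the column values. A sketch, where `likert_map` is a hypothetical name; labels missing from the map come out as NA, just as with `case_when()`.

```r
library(dplyr)

df <- data.frame(
  A = rep("Y", 5),
  B = seq(as.POSIXct("2014-01-13"), as.POSIXct("2014-01-17"), by = "days"),
  C = c("Neither agree nor disagree", "Somewhat agree", "Somewhat disagree",
        "Strongly agree", "Strongly disagree"),
  D = c("Neither agree nor disagree", "Somewhat agree", "Somewhat disagree",
        "Strongly agree", "Strongly disagree")
)

# Hypothetical lookup table: names are the survey labels, values the scores
likert_map <- c(
  "Strongly disagree"          = 1,
  "Somewhat disagree"          = 2,
  "Neither agree nor disagree" = 3,
  "Somewhat agree"             = 4,
  "Strongly agree"             = 5
)

# as.character() guards against factor columns (indexing by a factor would
# use its integer codes); unname() drops the label names from the result
out <- df %>%
  mutate(across(c(C, D), ~ unname(likert_map[as.character(.x)])))
print(out)
```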

Explicitly exclude the columns you don't want:

df %>%
  mutate(across(-c(A, B), ~ case_when(
    . == "Neither agree nor disagree" ~ 3,
    . == "Somewhat agree"             ~ 4,
    . == "Somewhat disagree"          ~ 2,
    . == "Strongly agree"             ~ 5,
    . == "Strongly disagree"          ~ 1
  )))
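For comparison, the same "everything except A and B" selection can be sketched in base R without dplyr. Here `recode_likert` is a hypothetical helper that uses `match()` against the ordered scale labels, so it returns integer scores (NA for unknown labels) rather than the doubles that `case_when()` produces above.

```r
df <- data.frame(
  A = rep("Y", 5),
  B = seq(as.POSIXct("2014-01-13"), as.POSIXct("2014-01-17"), by = "days"),
  C = c("Neither agree nor disagree", "Somewhat agree", "Somewhat disagree",
        "Strongly agree", "Strongly disagree"),
  D = c("Neither agree nor disagree", "Somewhat agree", "Somewhat disagree",
        "Strongly agree", "Strongly disagree")
)

# Hypothetical helper: a label's position in the ordered scale is its score
recode_likert <- function(x) {
  likert_levels <- c("Strongly disagree", "Somewhat disagree",
                     "Neither agree nor disagree", "Somewhat agree",
                     "Strongly agree")
  match(x, likert_levels)
}

# Every column except A and B, mirroring across(-c(A, B), ...)
cols <- setdiff(names(df), c("A", "B"))
df[cols] <- lapply(df[cols], recode_likert)
print(df)
```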

Or select them conditionally with a predicate (though this is not foolproof):

df %>%
  mutate(across(where(~ all(grepl("agree", .))), ~ case_when(
    . == "Neither agree nor disagree" ~ 3,
    . == "Somewhat agree"             ~ 4,
    . == "Somewhat disagree"          ~ 2,
    . == "Strongly agree"             ~ 5,
    . == "Strongly disagree"          ~ 1
  )))
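One way the `grepl("agree", .)` filter can misfire: a free-text column in which every answer happens to contain "agree" gets selected too. A sketch of a stricter predicate, assuming a hypothetical `likert_levels` vector, a hypothetical free-text column E and a hypothetical helper `is_likert`, that only accepts character columns whose non-missing values are all valid scale labels:

```r
library(dplyr)

likert_levels <- c("Strongly disagree", "Somewhat disagree",
                   "Neither agree nor disagree", "Somewhat agree",
                   "Strongly agree")

df <- data.frame(
  A = rep("Y", 5),
  B = seq(as.POSIXct("2014-01-13"), as.POSIXct("2014-01-17"), by = "days"),
  C = c("Neither agree nor disagree", "Somewhat agree", "Somewhat disagree",
        "Strongly agree", "Strongly disagree"),
  # Hypothetical free-text column: every answer contains "agree"
  E = c("I agree", "agree", "agreed", "disagree", "agree"),
  stringsAsFactors = FALSE  # keep labels as character on R < 4.0 too
)

# Loose filter from the answer: picks C and also the free-text column E
loose <- names(select(df, where(~ all(grepl("agree", .)))))

# Stricter filter: character columns containing only scale labels (or NA)
is_likert <- function(x) is.character(x) && all(x %in% c(likert_levels, NA))
strict <- names(select(df, where(is_likert)))

print(loose)
print(strict)
```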

FYI, according to https://dplyr.tidyverse.org/reference/mutate_all.html (as of 7 November 2021):

Scoped verbs (_if, _at, _all) have been superseded by the use of across() in an existing verb. See vignette("colwise") for details.
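Because the scoped verbs are superseded rather than removed, the question's call can be ported almost mechanically once the column selection is fixed. A sketch, assuming dplyr 1.0 or later, comparing `mutate_at(vars(C, D), f)` with `mutate(across(c(C, D), f))`; `to_score` is a hypothetical helper holding the answer's `case_when()`.

```r
library(dplyr)

df <- data.frame(
  A = rep("Y", 5),
  B = seq(as.POSIXct("2014-01-13"), as.POSIXct("2014-01-17"), by = "days"),
  C = c("Neither agree nor disagree", "Somewhat agree", "Somewhat disagree",
        "Strongly agree", "Strongly disagree"),
  D = c("Neither agree nor disagree", "Somewhat agree", "Somewhat disagree",
        "Strongly agree", "Strongly disagree")
)

# Hypothetical helper with the same mapping as in the answer
to_score <- function(x) case_when(
  x == "Neither agree nor disagree" ~ 3,
  x == "Somewhat agree"             ~ 4,
  x == "Somewhat disagree"          ~ 2,
  x == "Strongly agree"             ~ 5,
  x == "Strongly disagree"          ~ 1
)

# Superseded scoped verb, restricted to the Likert columns...
old_style <- df %>% mutate_at(vars(C, D), to_score)
# ...and its across() replacement inside an ordinary mutate()
new_style <- df %>% mutate(across(c(C, D), to_score))

print(all.equal(old_style, new_style))
```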

across() pairs nicely with where(), which is (quietly) provided by the tidyselect package.
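`where()` takes a predicate function and keeps the columns for which it returns TRUE, and it can be combined with the other selection helpers. A short sketch on the question's data (`chr_cols` and `likert_cols` are illustrative names) showing that a type check alone would still pick up the "Y" column A, so it is combined with an explicit exclusion:

```r
library(dplyr)

df <- data.frame(
  A = rep("Y", 5),
  B = seq(as.POSIXct("2014-01-13"), as.POSIXct("2014-01-17"), by = "days"),
  C = c("Neither agree nor disagree", "Somewhat agree", "Somewhat disagree",
        "Strongly agree", "Strongly disagree"),
  D = c("Neither agree nor disagree", "Somewhat agree", "Somewhat disagree",
        "Strongly agree", "Strongly disagree"),
  stringsAsFactors = FALSE  # keep labels as character on R < 4.0 too
)

# Built-in predicate: every character column, including A
chr_cols <- names(select(df, where(is.character)))

# Type check combined with exclusion: character columns other than A
likert_cols <- names(select(df, where(is.character) & !A))

print(chr_cols)
print(likert_cols)
```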
