How to get switch() to handle NA?

OK, I have to recode a df because I want the factors as integers:

library(dplyr)

load(url('http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/crash2.rda'))

df <- crash2 %>% select(source, sex)

df$source <- sapply(df$source, switch, "telephone" = 1, "telephone entered manually" = 2, "electronic CRF by email" = 3, "paper CRF enteredd in electronic CRF" = 4, "electronic CRF" = 5, NA)

This works as expected, but the next variable (sex) contains NAs, and things get complicated:

df$sex <- sapply(df$sex, switch, "male" = 1, "female" = 2, NA)

This returns a list in which the NA is lost. Calling unlist() on it returns a vector that is too short for the df.

length(unlist(sapply(df$sex, switch, "male" = 1, "female" = 2, NA)))

The length should be 20207, but it is 20206.

What I want is a vector that matches the df, with NA returned as NA.

Besides a working solution, I would be very grateful if you could explain where I went wrong and how the code actually works.

Edit: Thanks for all your answers. As usual, I should have noticed a more efficient solution myself (well, I did notice it myself, but apparently too late):

>str(df$sex)
Factor w/ 2 levels "male","female": 1 2 1 1 2 1 1 1 1 1 ...

So I can get what I want with as.numeric().
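
For a factor, the integer codes follow the order of levels(), and missing values stay NA. A minimal sketch on made-up data (sex_toy and sex_int are hypothetical names, not part of the crash2 data):

```r
# Hypothetical toy factor with the same level order as df$sex
sex_toy <- factor(c("male", "female", NA, "male"), levels = c("male", "female"))

# as.integer() returns the underlying level codes; NA is preserved
sex_int <- as.integer(sex_toy)
sex_int
# [1]  1  2 NA  1

# The length matches the input, so the result can be assigned back to a column
length(sex_int) == length(sex_toy)
# [1] TRUE
```

This only gives the intended numbers when the level order matches the desired coding, so it is worth checking levels() first.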

You can use `NA` as the name of an alternative:

x
# [1] "a" "e" "a" "a" NA  "d" "b" "b" NA  "d"
unname(sapply(x, switch, "a"=1, "b"=2, "c"=3, "d"=4, "e"=5, `NA`=NA))
# [1]  1  5  1  1 NA  4  2  2 NA  4

Data:

x <- c("a", "e", "a", "a", NA, "d", "b", "b", NA, "d")
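
As for why the original call lost the NA: a likely explanation, assuming df$sex is a factor (as the str() output in the question suggests), is that switch() treats a factor as its integer code and picks the alternative by position. For a numeric EXPR that is NA, it returns NULL invisibly and ignores the unnamed default, and unlist() then drops that NULL. The sketch below uses made-up data (f, recode_sex and codes are hypothetical names) and converts to character first, so that matching is done by name:

```r
# Hypothetical toy factor standing in for df$sex
f <- factor(c("male", "female", NA, "male"), levels = c("male", "female"))

# Match by name; the alternative named `NA` catches missing values
recode_sex <- function(s) switch(s, "male" = 1, "female" = 2, `NA` = NA_real_)

# vapply() requires exactly one numeric value per element, so a NULL result
# would raise an error instead of silently shortening the vector
codes <- vapply(as.character(f), recode_sex, numeric(1), USE.NAMES = FALSE)
codes
# [1]  1  2 NA  1
```

Using vapply() rather than sapply() is a design choice here: it makes the "too short" failure loud instead of silent.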

If you are interested, there is also a dplyr approach using case_when():

library(dplyr)

load(url('http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/crash2.rda'))

# Rows that match none of the conditions (including NA) become NA
df <- crash2 %>% dplyr::select(source, sex) %>%
  mutate(source = case_when(
    source == "telephone" ~ 1,
    source == "telephone entered manually" ~ 2,
    source == "electronic CRF by email" ~ 3,
    source == "paper CRF enteredd in electronic CRF" ~ 4,
    source == "electronic CRF" ~ 5),
    sex = case_when(
      sex == "male" ~ 1,
      sex == "female" ~ 2))

table(df$sex, useNA="ifany")
#     1     2  <NA> 
# 16935  3271     1
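
The single <NA> in the table is the missing value carried through: case_when() returns NA for rows where no condition is TRUE, and a comparison with NA is never TRUE. A small sketch on made-up data (toy and sex_code are hypothetical names) that runs without downloading crash2:

```r
library(dplyr)

# Hypothetical toy data with one missing value
toy <- tibble(sex = factor(c("male", "female", NA, "male"),
                           levels = c("male", "female")))

# No condition is TRUE for the NA row, so it stays NA
toy <- toy %>%
  mutate(sex_code = case_when(
    sex == "male"   ~ 1,
    sex == "female" ~ 2
  ))

toy$sex_code
# [1]  1  2 NA  1
```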