How to get switch() to handle NA?
OK, I have to recode a df because I want the factors as integers:
library(dplyr)
load(url('http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/crash2.rda'))
df <- crash2 %>% select(source, sex)
df$source <- sapply(df$source, switch, "telephone" = 1, "telephone entered manually" = 2, "electronic CRF by email" = 3, "paper CRF enteredd in electronic CRF" = 4, "electronic CRF" = 5, NA)
This works as expected, but the next variable (sex) contains NAs, and things get complicated:
df$sex <- sapply(df$sex, switch, "male" = 1, "female" = 2, NA)
returns a list in which the NA gets lost. Using unlist() returns a vector that is too short for df:
length(unlist(sapply(df$sex, switch, "male" = 1, "female" = 2, NA)))
It should be 20207, but it is 20206.
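What I want is a vector that matches df, with NA returned as NA.
Besides a working solution, I would really appreciate an explanation of where I went wrong and how the code actually works.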
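A likely explanation, sketched on a small toy factor `f` (a hypothetical stand-in for `df$sex`, assuming it is a factor as the `str()` output in the edit indicates): sapply() walks over the factor one element at a time, and switch() does not match a factor by its label. It coerces it to the integer level code (with a warning), so code 1 selects the first alternative and code 2 the second, which is why the recoding appears to work. An NA code selects no alternative, so switch() returns NULL. sapply() cannot simplify results of unequal length, so it returns a list, and unlist() then drops the NULL, leaving one element fewer. The same positional matching probably explains why the `source` recoding worked: it relies on the level order matching the order of the alternatives.

```r
# Toy stand-in for df$sex: levels "male", "female" plus one NA (hypothetical data)
f <- factor(c("male", "female", NA, "male"), levels = c("male", "female"))

# switch() receives length-1 factors and uses their integer codes (it warns
# about this, hence suppressWarnings()). Code NA selects nothing -> NULL.
bad <- suppressWarnings(sapply(f, switch, "male" = 1, "female" = 2, NA))

# Because one element is NULL, sapply() cannot simplify and returns a list
print(class(bad))

# unlist() silently drops the NULL, so the result is one element short
print(length(unlist(bad)))
```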
Edit: Thanks for all the answers. As usual, I should have spotted a more efficient solution myself (well, I did spot it myself, but apparently too late):
>str(df$sex)
Factor w/ 2 levels "male","female": 1 2 1 1 2 1 1 1 1 1 ...
So I can use as.numeric() to get what I want.
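The as.numeric() route works because a factor stores integer level codes and missing values stay NA, so the result always has the same length as the input. A minimal sketch on a hypothetical toy factor `f`, including one caveat about level order:

```r
# Toy stand-in for df$sex, with levels in the same order as the str() output
f <- factor(c("male", "female", NA, "male"), levels = c("male", "female"))

# as.integer() (like as.numeric()) returns each element's level position;
# NA elements stay NA, so no element is lost
codes <- as.integer(f)
print(codes)

# Caveat: the codes follow levels(f). With the default alphabetical levels,
# "female" becomes 1 and "male" becomes 2
alpha <- as.integer(factor(c("male", "female")))
print(alpha)
```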
You can add a backquoted `NA` as one of the switch() alternatives:
x
# [1] "a" "e" "a" "a" NA "d" "b" "b" NA "d"
unname(sapply(x, switch, "a"=1, "b"=2, "c"=3, "d"=4, "e"=5, `NA`=NA))
# [1] 1 5 1 1 NA 4 2 2 NA 4
Data:
x <- c("a", "e", "a", "a", NA, "d", "b", "b", NA, "d")
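A variant of the same idea, sketched under the assumption that the input may be either character or factor. It converts with as.character() so switch() matches labels rather than level codes, handles NA explicitly before switch() sees it, and uses vapply() with numeric(1), so that an alternative returning anything other than a single number raises an error instead of silently turning the result into a list. `recode_one` is a hypothetical helper name.

```r
x <- c("a", "e", "a", "a", NA, "d", "b", "b", NA, "d")

# Hypothetical helper: recode a single value, dealing with NA first
recode_one <- function(v) {
  if (is.na(v)) return(NA_real_)
  # The unnamed last argument is the default for unmatched strings
  switch(v, "a" = 1, "b" = 2, "c" = 3, "d" = 4, "e" = 5, NA_real_)
}

# vapply() requires exactly one numeric value per element
res <- vapply(as.character(x), recode_one, numeric(1), USE.NAMES = FALSE)
print(res)
```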
If you are interested, there is also a dplyr approach using case_when():
library(dplyr)
load(url('http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/crash2.rda'))
# Values matching no condition (including NA) become NA, so lengths are preserved
df <- crash2 %>%
  dplyr::select(source, sex) %>%
  mutate(
    source = case_when(
      source == "telephone" ~ 1,
      source == "telephone entered manually" ~ 2,
      source == "electronic CRF by email" ~ 3,
      source == "paper CRF enteredd in electronic CRF" ~ 4,
      source == "electronic CRF" ~ 5
    ),
    sex = case_when(
      sex == "male" ~ 1,
      sex == "female" ~ 2
    )
  )
table(df$sex, useNA="ifany")
# 1 2 <NA>
# 16935 3271 1
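For comparison, a base-R alternative that is not taken from the answers above: index a named lookup vector by the labels. Subsetting by a character vector matches names, and an NA or unknown label yields NA, so the output keeps the input's length. `lookup` and the toy factor `sex` are hypothetical.

```r
# Toy stand-in for df$sex (hypothetical data)
sex <- factor(c("male", "female", NA, "male"), levels = c("male", "female"))

# Named lookup vector: names are the labels, values are the codes
lookup <- c(male = 1, female = 2)

# Character subscripts match names; NA (or an unknown label) gives NA.
# unname() drops the names that the subsetting carries over.
recoded <- unname(lookup[as.character(sex)])
print(recoded)
```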