Replacing certain values in a data frame with NAs

Suppose I have a data.frame:

names  <- c("John", "Mark", "Larry", "Will", "Kate", "Daria", "Tom")
gender <- c("M", "M", "M", "M", "F", "F", "M")
mark <- c(1, 2, 3, 1, 2, 3, 1)
df <- data.frame(names, gender, mark)
df

  names gender mark
1  John      M    1
2  Mark      M    2
3 Larry      M    3
4  Will      M    1
5  Kate      F    2
6 Daria      F    3
7   Tom      M    1

I don't know how to replace certain values with NAs. For example, I would like mark to be NA for Kate, Daria, and Tom:

  names gender mark
1  John      M    1
2  Mark      M    2
3 Larry      M    3
4  Will      M    1
5  Kate      F   NA
6 Daria      F   NA
7   Tom      M   NA

Try

df <- within(df, mark <- replace(mark, names %in% c('Kate', 'Daria', 'Tom'), NA))
df
#    names gender mark
#1  John      M    1
#2  Mark      M    2
#3 Larry      M    3
#4  Will      M    1
#5  Kate      F   NA
#6 Daria      F   NA
#7   Tom      M   NA
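
To see why this works, it can help to look at the logical mask that %in% produces and what replace() does with it. The following is only a minimal sketch; it rebuilds the same df so that it runs on its own, and the variable name mask is introduced here purely for illustration.

```r
names  <- c("John", "Mark", "Larry", "Will", "Kate", "Daria", "Tom")
gender <- c("M", "M", "M", "M", "F", "F", "M")
mark   <- c(1, 2, 3, 1, 2, 3, 1)
df <- data.frame(names, gender, mark)

# TRUE for every row whose name is in the target set (mask is an illustrative name)
mask <- df$names %in% c('Kate', 'Daria', 'Tom')
mask
# [1] FALSE FALSE FALSE FALSE  TRUE  TRUE  TRUE

# replace() returns a copy of mark with the masked positions set to NA;
# df itself is unchanged until the result is assigned back
replace(df$mark, mask, NA)
# [1]  1  2  3  1 NA NA NA
```

Because replace() returns a modified copy, the result has to be assigned back, which is what within(df, mark <- ...) does.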

Or

df$mark[df$names %in% c('Kate', 'Daria', 'Tom')] <- NA

Or

is.na(df$mark) <- df$names %in% c('Kate', 'Daria', 'Tom')

is.na(df$mark[df$names %in% c('Kate', 'Daria', 'Tom')]) <- TRUE

is a syntax I sometimes find useful. In this case, though, it is not as fast as the alternatives.
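
One way to check that these variants are interchangeable is to apply each one to a fresh copy of the data and compare the resulting mark columns. This is a sketch only; the names target, d1 to d4, and all_same are made up for this example.

```r
names  <- c("John", "Mark", "Larry", "Will", "Kate", "Daria", "Tom")
gender <- c("M", "M", "M", "M", "F", "F", "M")
mark   <- c(1, 2, 3, 1, 2, 3, 1)
df <- data.frame(names, gender, mark)

target <- c('Kate', 'Daria', 'Tom')  # illustrative helper variable

# Apply each approach to its own copy of df
d1 <- within(df, mark <- replace(mark, names %in% target, NA))

d2 <- df
d2$mark[d2$names %in% target] <- NA

d3 <- df
is.na(d3$mark) <- d3$names %in% target

d4 <- df
is.na(d4$mark[d4$names %in% target]) <- TRUE

# Compare the mark columns pairwise
all_same <- identical(d1$mark, d2$mark) &&
  identical(d2$mark, d3$mark) &&
  identical(d3$mark, d4$mark)
all_same
```

If all_same is TRUE, the choice between the variants comes down to readability and speed.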

Benchmark

library(microbenchmark)

# Build a larger data frame (7,000 rows) and one copy per approach
big.df1 <- data.frame(names = rep(names, 1e3),
                      gender = rep(gender, 1e3),
                      mark = rep(mark, 1e3))
big.df4 <- big.df3 <- big.df2 <- big.df1

microbenchmark(
  plafort = is.na(big.df1$mark[big.df1$names %in% c('Kate', 'Daria', 'Tom')]) <- TRUE,
  akrun1  = within(big.df2, mark <- replace(mark, names %in% c('Kate', 'Daria', 'Tom'), NA)),
  akrun2  = big.df3$mark[big.df3$names %in% c('Kate', 'Daria', 'Tom')] <- NA,
  akrun3  = is.na(big.df4$mark) <- big.df4$names %in% c('Kate', 'Daria', 'Tom')
  )
#
# Unit: microseconds
#     expr     min       lq     mean   median       uq       max neval
#  plafort 389.623 408.9660 484.6090 426.9275 540.8135   777.272   100
#   akrun1 287.381 319.3570 388.3125 357.2530 419.8220   661.214   100
#   akrun2 193.035 204.2860 627.6559 227.7735 327.8440 37325.194   100
#   akrun3 208.431 221.6555 274.1615 235.2740 287.3825  1110.445   100