Replacing certain values in a data frame with NA
Suppose I have a data.frame:
names <- c("John", "Mark", "Larry", "Will", "Kate", "Daria", "Tom")
gender <- c("M", "M", "M", "M", "F", "F", "M")
mark <- c(1, 2, 3, 1, 2, 3, 1)
df <- data.frame(names, gender, mark)
df
  names gender mark
1  John      M    1
2  Mark      M    2
3 Larry      M    3
4  Will      M    1
5  Kate      F    2
6 Daria      F    3
7   Tom      M    1
I don't know how to replace certain values with NA. For example, suppose I want mark to be NA for Kate, Daria, and Tom:
  names gender mark
1  John      M    1
2  Mark      M    2
3 Larry      M    3
4  Will      M    1
5  Kate      F   NA
6 Daria      F   NA
7   Tom      M   NA
Try
df <- within(df, mark <- replace(mark, names %in% c('Kate', 'Daria', 'Tom'), NA))
df
#   names gender mark
#1  John      M    1
#2  Mark      M    2
#3 Larry      M    3
#4  Will      M    1
#5  Kate      F   NA
#6 Daria      F   NA
#7   Tom      M   NA
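replace() does not modify its argument: it returns a copy of the vector with the positions given by its second argument set to the new value, which is why the result of within() is assigned back to df above. A minimal illustration on a bare vector (x is a hypothetical stand-in for the mark column); the index can be either integer positions or a logical vector:

```r
# x is a hypothetical stand-in for the 'mark' column from the question
x <- c(1, 2, 3, 1, 2, 3, 1)

y <- replace(x, 5:7, NA)      # integer positions
z <- replace(x, x > 2, NA)    # logical index: TRUE at positions 3 and 6

print(x)  # unchanged, because replace() returns a modified copy
print(y)
print(z)
```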
Or
df$mark[df$names %in% c('Kate', 'Daria', 'Tom')] <- NA
Or
is.na(df$mark) <- df$names %in% c('Kate', 'Daria', 'Tom')
is.na(df$mark[df$names %in% c('Kate', 'Daria', 'Tom')]) <- TRUE
is a syntax I sometimes find useful, although in this case it is not as fast.
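A rough sketch of what the nested form does: R evaluates a nested replacement such as is.na(x[idx]) <- TRUE by extracting x[idx], applying is.na<- to that subset (which sets every element of it to NA), and writing the subset back with [<-. The manual expansion below approximates that evaluation for illustration only; it is not R's exact internal code. The extra extract-and-write-back step plausibly contributes to the slower plafort timings, whereas the direct form assigns NA in one step.

```r
names <- c("John", "Mark", "Larry", "Will", "Kate", "Daria", "Tom")
mark  <- c(1, 2, 3, 1, 2, 3, 1)
idx   <- names %in% c("Kate", "Daria", "Tom")   # TRUE at positions 5, 6, 7

# Compact nested form
a <- mark
is.na(a[idx]) <- TRUE

# Approximate manual expansion of the line above (illustrative)
b   <- mark
sub <- b[idx]          # 1. extract the subset
is.na(sub) <- TRUE     # 2. `is.na<-` sets sub[TRUE], i.e. every element, to NA
b[idx] <- sub          # 3. write the subset back

# Direct one-step equivalent
d <- mark
d[idx] <- NA

print(a)
```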
Benchmark
library(microbenchmark)

big.df1 <- data.frame(names  = rep(names, 1e3),
                      gender = rep(gender, 1e3),
                      mark   = rep(mark, 1e3))
big.df4 <- big.df3 <- big.df2 <- big.df1
microbenchmark(
  plafort = is.na(big.df1$mark[big.df1$names %in% c('Kate', 'Daria', 'Tom')]) <- TRUE,
  akrun1  = within(big.df2, mark <- replace(mark, names %in% c('Kate', 'Daria', 'Tom'), NA)),
  akrun2  = big.df3$mark[big.df3$names %in% c('Kate', 'Daria', 'Tom')] <- NA,
  akrun3  = is.na(big.df4$mark) <- big.df4$names %in% c('Kate', 'Daria', 'Tom')
)
# Unit: microseconds
#     expr     min       lq     mean   median       uq       max neval
#  plafort 389.623 408.9660 484.6090 426.9275 540.8135   777.272   100
#   akrun1 287.381 319.3570 388.3125 357.2530 419.8220   661.214   100
#   akrun2 193.035 204.2860 627.6559 227.7735 327.8440 37325.194   100
#   akrun3 208.431 221.6555 274.1615 235.2740 287.3825  1110.445   100
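Timings only matter if the approaches agree. A quick sanity check, assuming the small df from the question, that all four benchmarked expressions produce the same mark column (the names d1 to d4 and to_na are hypothetical, introduced only for this comparison):

```r
names  <- c("John", "Mark", "Larry", "Will", "Kate", "Daria", "Tom")
gender <- c("M", "M", "M", "M", "F", "F", "M")
mark   <- c(1, 2, 3, 1, 2, 3, 1)
df     <- data.frame(names, gender, mark)
to_na  <- c("Kate", "Daria", "Tom")

d1 <- df; is.na(d1$mark[d1$names %in% to_na]) <- TRUE            # plafort
d2 <- within(df, mark <- replace(mark, names %in% to_na, NA))   # akrun1
d3 <- df; d3$mark[d3$names %in% to_na] <- NA                     # akrun2
d4 <- df; is.na(d4$mark) <- d4$names %in% to_na                  # akrun3

# All four should yield the same 'mark' column
stopifnot(identical(d1$mark, d2$mark),
          identical(d1$mark, d3$mark),
          identical(d1$mark, d4$mark))
print(d1$mark)
```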