Removing NA observations with dplyr::filter()

My data looks like this:

library(tidyverse)

df <- tribble(
    ~a, ~b, ~c,
    1, 2, 3, 
    1, NA, 3, 
    NA, 2, 3
)

I can remove all NA observations with drop_na():
df %>% drop_na()

Or remove all NA observations in a single column (a, for example):

df %>% drop_na(a)

Why can't I just use a regular != filter in the pipe?

df %>% filter(a != NA)

Why do we have to use a special function from tidyr to remove NAs?

From @Ben Bolker:

[T]his has nothing specifically to do with dplyr::filter()

From @Marat Talipov:

[A]ny comparison with NA, including NA==NA, will return NA

From a related answer by @farnsy:

The == operator does not treat NA's as you would expect it to.

Think of NA as meaning "I don't know what's there". The correct answer to 3 > NA is obviously NA because we don't know if the missing value is larger than 3 or not. Well, it's the same for NA == NA. They are both missing values but the true values could be quite different, so the correct answer is "I don't know."

R doesn't know what you are doing in your analysis, so instead of potentially introducing bugs that would later end up being published and embarrassing you, it doesn't allow comparison operators to think NA is a value.
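A quick way to see this behaviour is to run the comparisons at the console. The sketch below uses only base R; the vector `x` is a made-up stand-in for column a, not something from the question.

```r
# Any comparison involving NA yields NA, not TRUE or FALSE
3 > NA      # NA
NA == NA    # NA

# Hypothetical example vector mirroring column `a`
x <- c(1, 1, NA)

x != NA     # NA NA NA: every element is compared with "unknown"
is.na(x)    # FALSE FALSE TRUE
!is.na(x)   # TRUE TRUE FALSE
```

Because filter() keeps only the rows where the condition is TRUE, a condition that is NA for every row keeps nothing.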

For example, you can use:

df %>% filter(!is.na(a))

to remove the NAs in column a.
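As a rough check of how the two approaches line up, the sketch below rebuilds the question's `df` and compares `filter(!is.na(a))` with `drop_na(a)`. It assumes dplyr, tidyr, and tibble are installed; `kept_filter` and `kept_drop` are illustrative names only.

```r
library(dplyr)
library(tidyr)

df <- tibble::tribble(
    ~a, ~b, ~c,
    1, 2, 3,
    1, NA, 3,
    NA, 2, 3
)

# filter() keeps rows where the condition is TRUE and drops FALSE and NA
kept_filter <- df %>% filter(!is.na(a))

# drop_na(a) removes rows with a missing value in column a
kept_drop <- df %>% drop_na(a)

nrow(df %>% filter(a != NA))   # 0: the condition is NA for every row
nrow(kept_filter)              # 2
nrow(kept_drop)                # 2
```

Both calls keep the first two rows here; the row with NA in b survives because only column a is tested.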

If anyone is reading this in 2020: once your pipeline is built, piping into %>% na.exclude will remove all the NAs in the pipe!
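A minimal sketch of that suggestion, assuming dplyr and tibble are installed and reusing the question's `df`. Note that `na.exclude()` (like `na.omit()`, both from the stats package) drops every row that has an NA in any column, so it behaves like `drop_na()` with no arguments rather than like a single-column filter. The name `complete_rows` is illustrative.

```r
library(dplyr)

df <- tibble::tribble(
    ~a, ~b, ~c,
    1, 2, 3,
    1, NA, 3,
    NA, 2, 3
)

# Drop every row containing at least one NA, in any column
complete_rows <- df %>% na.exclude()

nrow(complete_rows)   # 1: only the first row has no missing values
```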

I've been using this and it works well:

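# Treat empty strings as missing, then replace NA with the literal string "NA"
cool$day[cool$day == ''] <- NA
cool$day[is.na(cool$day)] <- "NA"

# Keep only the rows whose day is not the "NA" string
cool <- cool[cool$day != "NA", ]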
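The same rows can probably be dropped in one step without the placeholder string. This is only a sketch: the contents of `cool` below are invented for illustration, and unlike the snippet above it does not remove rows where day is already the literal text "NA".

```r
# Hypothetical stand-in for the `cool` data frame
cool <- data.frame(
    day = c("Mon", "", NA, "Tue"),
    n = 1:4,
    stringsAsFactors = FALSE
)

# Keep rows whose day is neither missing nor an empty string
cool <- cool[!is.na(cool$day) & cool$day != "", ]

cool$day   # "Mon" "Tue"
```

Testing `is.na()` first matters: `cool$day != ""` is NA for a missing day, and an NA row index would otherwise produce a row of NAs.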