In R, does NA == NA?

identical(NA, NA) returns TRUE, but the following code filters the NA rows out of the data frame:

library(tidyverse)
filter(starwars, birth_year == birth_year)

If NA really does equal NA, the filtered starwars data frame above should include the rows whose birth_year is NA. Why doesn't it?

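NA is identical to NA, but not equal to it. If you run NA == NA, the result is NA, because the equality operator does not give TRUE or FALSE when an operand is missing. From the documentation of identical: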

A call to identical is the way to test exact equality in if and while statements, as well as in logical expressions that use && or ||. In all these applications you need to be assured of getting a single logical value.

Users often use the comparison operators, such as == or !=, in these situations. It looks natural, but it is not what these operators are designed to do in R. They return an object like the arguments. If you expected x and y to be of length 1, but it happened that one of them was not, you will not get a single FALSE. Similarly, if one of the arguments is NA, the result is also NA. In either case, the expression if(x == y).... won't work as expected.

And from the documentation of ==:

Missing values (NA) and NaN values are regarded as non-comparable even to themselves, so comparisons involving them will always result in NA. Missing values can also result when character strings are compared and one is not valid in the current collation locale.

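The rationale is that missing values are, conceptually, not the same as one another. They may stand for very different values; we simply do not know what those values are.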
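As a rough illustration of the difference between the two tests, here is a minimal sketch in base R. The variable names (na_equal, na_same, x, elementwise, safe) are made up for this example and are not part of the question.

```r
# Base R only; all variable names below are illustrative.
na_equal <- NA == NA            # comparison involving a missing value
na_same  <- identical(NA, NA)   # exact-equality test, always one logical

print(na_equal)   # value: NA
print(na_same)    # value: TRUE

# == is vectorised, so the NA position stays NA in the result
x <- c(1, NA, 3)
elementwise <- x == x
print(elementwise)  # values: TRUE, NA, TRUE

# isTRUE() collapses the comparison to a single TRUE/FALSE,
# which makes it safe to use inside if()
safe <- isTRUE(NA == NA)
print(safe)  # value: FALSE
```

Writing if (NA == NA) directly would stop with an error, since if() needs a single TRUE or FALSE; wrapping the comparison in isTRUE() or using identical() avoids that.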

In this case, an alternative is to add | is.na(birth_year) to the filter condition, so that rows with a missing birth_year are kept.
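A sketch of how that might look with the starwars data from the question, assuming dplyr is installed (it is loaded by tidyverse). The names known_only and with_missing are hypothetical.

```r
library(dplyr)  # provides filter() and the starwars data set

# Original filter: rows where the condition evaluates to NA are dropped
known_only <- filter(starwars, birth_year == birth_year)

# With | is.na(birth_year), the condition becomes NA | TRUE, which is TRUE,
# so rows with a missing birth_year are kept as well
with_missing <- filter(starwars, birth_year == birth_year | is.na(birth_year))

# Compare the row counts of the three data frames
nrow(starwars)
nrow(known_only)
nrow(with_missing)
```

If the goal is only to keep or drop missing rows, filtering on is.na(birth_year) or !is.na(birth_year) alone states the intent more directly than comparing the column with itself.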