Using testthat to check each variable in a data frame for NA values

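I'm building a dataset from very messy raw files, and I use testthat to make sure nothing breaks when I add new data or correct the cleaning rules. I'd like to add a test that checks whether the data contain any NA values and, if so, reports which columns they are in.

Doing this manually by writing a test for each column is simple. But that solution would be hard to maintain and error-prone, because I don't want to have to remember to update the test-NA file every time I add or remove a column in the dataset.

Here is sample code for what I have so far: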

df <- tidyr::tribble(
  ~A, ~B, ~C, 
  1, 2, 3,
  NA, 2, 3, 
  1, 2, NA
)

# checks all variables, doesn't report which have NA values
testthat::test_that("NA Values", {
  testthat::expect_true(sum(is.na(df)) == 0)
})

# Checks each column, but is a pain to maintain
testthat::test_that("Variable specific checks", {
  testthat::expect_true(sum(is.na(df$A)) == 0)
  testthat::expect_true(sum(is.na(df$B)) == 0)
  testthat::expect_true(sum(is.na(df$C)) == 0)
})

Solution 1: Quick and (not so) dirty

df <- tidyr::tribble(
  ~A, ~B, ~C, 
  1, 2, 3,
  NA, 2, 3, 
  1, 2, NA
)

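# Checks every column in one pass and reports the positions of those with NAs
testthat::test_that("Variable specific checks", {
    # res[j] is TRUE when column j contains at least one NA
    res <- apply(df, 2, function(x) sum(is.na(x)) > 0)
    # The test should pass only when no column contains NAs
    testthat::expect_true(!any(res), label = paste(paste(which(res), collapse=", "), "contain(s) NA(s)"))
})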

This should return something like the following (the exact wording depends on the testthat version):

Error: Test failed: 'Variable specific checks'
* 1, 3 contain(s) NA(s) isn't true.
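
Since the question asks which columns are affected, a small variation is to compute the column names rather than their positions. The following is only a sketch: na_columns() is a hypothetical helper, not part of testthat, and the toy data frame is rebuilt with base R's data.frame() so the block runs on its own.

```r
library(testthat)

# Toy data matching the question, built with base R to avoid extra dependencies
df <- data.frame(
  A = c(1, NA, 1),
  B = c(2, 2, 2),
  C = c(3, 3, NA)
)

# Hypothetical helper: names of the columns that contain at least one NA
na_columns <- function(data) {
  names(data)[colSums(is.na(data)) > 0]
}

# Sanity-check the helper itself on the toy data
test_that("na_columns() finds the columns with NAs", {
  expect_identical(na_columns(df), c("A", "C"))
  expect_identical(na_columns(df["B"]), character(0))
})
```

In a real test file the assertion would instead be that na_columns(df) is empty, for example expect_true(length(na_columns(df)) == 0, info = paste(na_columns(df), collapse = ", ")), so that a failure carries the offending column names in its message.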

Solution 2: Customize an expect_*() function to fit your needs

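# Variant of expect_true() with a failure message tailored to NA checks
expect_true2 <- function(object, info = NULL, label = NULL) {
    # Capture the value under test; `label` (the column positions) becomes act$lab
    act <- testthat::quasi_label(rlang::enquo(object), label, arg = "object")
    testthat::expect(
        identical(as.vector(act$val), TRUE),
        sprintf("Column %s contain(s) NA(s).", act$lab),
        info = info
    )
    invisible(act$val)
}

testthat::test_that("Variable specific checks", {
    # res[j] is TRUE when column j contains at least one NA
    res <- apply(df, 2, function(x) sum(is.na(x)) > 0)
    # Passes only when no column contains NAs
    expect_true2(!any(res), label = paste(which(res), collapse=","))
})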

This should return:

Error: Test failed: 'Variable specific checks'
* Column 1,3 contain(s) NA(s).
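
The same idea can be folded into a custom expectation that takes the data frame itself and names the offending columns. This is a minimal sketch, assuming the quasi_label()/expect() pattern used above is available in your testthat version; expect_no_na() and the data frame called clean are made up for illustration.

```r
library(testthat)

# Toy data: df has NAs in columns A and C, clean has none
df <- data.frame(A = c(1, NA, 1), B = c(2, 2, 2), C = c(3, 3, NA))
clean <- data.frame(A = c(1, 1), B = c(2, 2))

# Hypothetical custom expectation: fails if any column of `object` contains NA
expect_no_na <- function(object) {
  act <- quasi_label(rlang::enquo(object), arg = "object")
  # Names of the columns holding at least one NA
  na_cols <- names(act$val)[colSums(is.na(act$val)) > 0]
  expect(
    length(na_cols) == 0,
    sprintf(
      "%s has NA values in column(s): %s.",
      act$lab, paste(na_cols, collapse = ", ")
    )
  )
  invisible(act$val)
}

# Check the expectation itself: it should pass on clean data and
# fail with a message naming A and C on the toy data
test_that("expect_no_na() names the offending columns", {
  expect_success(expect_no_na(clean))
  expect_failure(expect_no_na(df), "A, C")
})
```

With this in place, the test for the dataset reduces to a single expect_no_na(df) call inside test_that(), and nothing needs updating when columns are added or removed.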