Using testthat to check each variable in a data frame for NA values
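I am building a dataset from very messy raw files and use testthat to make sure nothing breaks when I add new data or correct the cleaning rules. I would like to add a test that checks whether there are any NA values in the data and, if there are, reports which columns they are in.
Doing this manually by writing a test for each column is simple. But that solution would be hard to maintain and error-prone, because I don't want to have to remember to update the test-NA file every time I add or remove a column in the dataset.
Here is sample code for what I have: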
df <- tidyr::tribble(
  ~A, ~B, ~C,
  1,  2,  3,
  NA, 2,  3,
  1,  2,  NA
)
# Checks all variables, but doesn't report which ones have NA values
testthat::test_that("NA Values", {
  testthat::expect_true(sum(is.na(df)) == 0)
})
# Checks each column, but is a pain to maintain
testthat::test_that("Variable specific checks", {
  testthat::expect_true(sum(is.na(df$A)) == 0)
  testthat::expect_true(sum(is.na(df$B)) == 0)
  testthat::expect_true(sum(is.na(df$C)) == 0)
})
Solution 1: Quick and (not so) dirty
df <- tidyr::tribble(
  ~A, ~B, ~C,
  1,  2,  3,
  NA, 2,  3,
  1,  2,  NA
)
# Checks every column in one pass and reports the positions of those with NAs
testthat::test_that("Variable specific checks", {
  # res is TRUE for each column that contains at least one NA
  res <- apply(df, 2, function(x) sum(is.na(x)) > 0)
  # The test passes only when no column is flagged
  testthat::expect_true(!any(res), label = paste(paste(which(res), collapse = ", "), "contain(s) NA(s)"))
})
which should return something like
Error: Test failed: 'Variable specific checks'
* 1, 3 contain(s) NA(s) isn't true.
Solution 2: Customize an expect_*() function to your needs
expect_true2 <- function(object, info = NULL, label = NULL) {
  # Capture the value and the label to use in the failure message
  act <- testthat::quasi_label(rlang::enquo(object), label, arg = "object")
  testthat::expect(
    identical(as.vector(act$val), TRUE),
    sprintf("Column %s contain(s) NA(s).", act$lab),
    info = info
  )
  invisible(act$val)
}
testthat::test_that("Variable specific checks", {
  # res is TRUE for each column that contains at least one NA
  res <- apply(df, 2, function(x) sum(is.na(x)) > 0)
  # The test passes only when no column is flagged
  expect_true2(!any(res), label = paste(which(res), collapse = ","))
})
which should return something like
Error: Test failed: 'Variable specific checks'
* Column 1,3 contain(s) NA(s).
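Both solutions report column positions (1, 3) rather than column names. If names are more useful, a minimal base R sketch could look like the following. Note that `na_columns()` is a hypothetical helper introduced here for illustration, not part of testthat, and the example uses a plain `data.frame` in place of the tibble so that it has no package dependencies.

```r
# Hypothetical helper (not part of testthat): names of the columns that contain NA.
na_columns <- function(data) {
  # is.na() on a data frame returns a logical matrix;
  # colSums() then counts the NA entries in each column.
  na_counts <- colSums(is.na(data))
  names(na_counts)[na_counts > 0]
}

# Same values as the example data above, built with base R
df <- data.frame(
  A = c(1, NA, 1),
  B = c(2, 2, 2),
  C = c(3, 3, NA)
)

na_cols <- na_columns(df)

# Build a message that names the offending columns
msg <- sprintf("Column(s) %s contain NA(s).", paste(na_cols, collapse = ", "))
print(msg)
```

Inside a test, one option would be testthat::expect_identical(na_columns(df), character(0)), so that a failure shows the offending column names in the comparison output. Because the helper loops over whatever columns exist, the test file does not need updating when columns are added or removed.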