Testing equality of tibbles with list-columns or nested data.frame
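Tibbles (from the tidyverse) can contain list-columns, which is useful for holding, for example, nested data frames or objects not traditionally found in a data.frame.
Here is an example: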
library("dplyr")
nested_df <-
  iris %>%
  group_by(Species) %>%
  tidyr::nest() %>%
  mutate(model = purrr::map(data, lm, formula = Sepal.Length ~ .))
nested_df
# # A tibble: 3 x 3
#   Species    data              model
#   <fct>      <list>            <list>
# 1 setosa     <tibble [50 × 4]> <S3: lm>
# 2 versicolor <tibble [50 × 4]> <S3: lm>
# 3 virginica  <tibble [50 × 4]> <S3: lm>
I am writing some tests with testthat: how can I test equality between such data frames? testthat::expect_equal does not work, because both all.equal and dplyr::all_equal fail:
all.equal(nested_df, nested_df)
# Error in equal_data_frame(target, current, ignore_col_order = ignore_col_order, :
# Can't join on 'data' x 'data' because of incompatible types (list / list)
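I considered using testthat::expect_true(identical(...)), but it is often too strict. For example, defining an exactly identical nested_df2 is not enough to pass identical, because the .Environment attribute of the terms embedded in the lm models differs, even though the models are equal and pass all.equal.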
identical(nested_df, nested_df2)
# [1] FALSE
identical(nested_df$model, nested_df2$model, ignore.environment = TRUE)
# [1] FALSE
all.equal(nested_df$model, nested_df2$model, tolerance = 0)
# [1] TRUE
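The environment issue can be reproduced without any tidyverse code. The following is a minimal sketch in base R, assuming the built-in mtcars data set and a hypothetical helper named fit_model (neither appears in the example above). Each call to the helper creates a new environment, which ends up attached to the terms of the fitted model:

```r
# Hypothetical helper: every call runs in a fresh environment, and the
# formula (hence the terms object of the fit) captures that environment.
fit_model <- function(d) lm(mpg ~ wt, data = d)

m1 <- fit_model(mtcars)
m2 <- fit_model(mtcars)

# Environments attached to the terms of each fit
env1 <- attr(terms(m1), ".Environment")
env2 <- attr(terms(m2), ".Environment")

same_env      <- identical(env1, env2)      # two distinct environments
strictly_same <- identical(m1, m2)          # fails because of that attribute
roughly_same  <- isTRUE(all.equal(m1, m2))  # not affected by the differing environments
```

Under these assumptions, the two fits are numerically the same, yet identical still reports a difference, which matches the behavior described above.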
How can I test equality of tibbles with list-columns such as nested_df?
It is a somewhat blunt approach, but it seems to work for your example:
all.equal.list(nested_df, nested_df)
# [1] TRUE
all.equal.list(nested_df, mutate(nested_df, Species = sample(Species)))
# [1] "Component “Species”: 2 string mismatches"
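To illustrate the two kinds of return value, here is a small sketch that uses only base R. It assumes a plain data.frame with a list-column of lm fits built from mtcars, and a hypothetical constructor named make_nested; it is a stand-in for the nested tibble, not the same object. Because a data frame is a list of columns, calling all.equal.list directly compares it column by column:

```r
# Hypothetical constructor: a base data.frame with a list-column of models.
make_nested <- function() {
  parts <- split(mtcars, mtcars$cyl)
  out <- data.frame(cyl = names(parts), stringsAsFactors = FALSE)
  out$model <- lapply(parts, function(d) lm(mpg ~ wt, data = d))
  out
}

df1 <- make_nested()
df2 <- make_nested()

# Equal inputs: the result is TRUE
same <- all.equal.list(df1, df2)

# Reversing one column: the result is a character vector of differences
df3 <- df2
df3$cyl <- rev(df3$cyl)
diffs <- all.equal.list(df1, df3)

# isTRUE() collapses either result into a single logical
same_ok  <- isTRUE(same)
diffs_ok <- isTRUE(diffs)
```

Since the result is either TRUE or a character vector, wrapping it in isTRUE() is a simple way to get a value that can be passed to something like testthat::expect_true.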
To expand on @utubun's answer, you can wrap all.equal.list in a testthat-style expect_* function:
expect_equal_tbl <- function(object, expected, ..., info = NULL) {
  # Capture the value and label of each argument for the failure message
  act <- testthat::quasi_label(rlang::enquo(object), arg = "object")
  exp <- testthat::quasi_label(rlang::enquo(expected), arg = "expected")

  # all.equal.list returns TRUE on a match and a character vector describing
  # the differences otherwise, so we extract both a match indicator and a
  # failure message
  diffs <- all.equal.list(act$val, exp$val, ...)
  is_equal <- isTRUE(diffs)
  diff_msg <- paste(diffs, collapse = "\n")

  testthat::expect(
    is_equal,
    failure_message = sprintf(
      "%s not equal to %s.\n%s", act$lab, exp$lab, diff_msg
    ),
    info = info
  )

  invisible(act$val)
}
expect_equal_tbl(nested_df, nested_df, info = "YAY!")
expect_equal_tbl(nested_df, nested_df[1, ], info = "FAIL!")
Error: `nested_df` not equal to nested_df[1, ].
Attributes: < Component “row.names”: Numeric: lengths (3, 1) differ >
Component “Species”: Lengths: 3, 1
Component “Species”: Lengths (3, 1) differ (string compare on first 1)
Component “data”: Length mismatch: comparison on first 1 components
Component “model”: Length mismatch: comparison on first 1 components
FAIL!