Transform data based on metadata avoiding for-loops using data.table joins

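Question: I have the following metadata data.table object. Based on it, I want to convert the extension and start_date columns of the actual data.table dt to Date columns. I have a solution in which I iterate over the rows of meta_dt. Since I want to avoid for-loops, can you think of a clever data.table join?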

library(data.table)

meta_dt <- data.table(
  col_n = c("id", "description", "extension", "start_date"),
  type = c("character", "character", "date", "date"),
  form = c(NA, NA, "%Y-%m-%d", "%Y-%m-%d")
)

dt <- data.table(
  id = c(1, 2, 3, 4),
  description = c("ab", "ac", "ad", "ae"),
  extension = c("2020-01-01", "2020-12-31", "2020-05-01", "2020-01-04"),
  start_date = c("2020-09-01", "2020-11-31", "2020-08-19", "2020-03-14")
)

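Expected result: the structure of the result should look like this (i.e. only the columns that the metadata marks as dates are converted; the other columns are left unchanged):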

Classes ‘data.table’ and 'data.frame':  4 obs. of  4 variables:
 $ id         : num  1 2 3 4
 $ description: chr  "ab" "ac" "ad" "ae"
 $ extension  : Date, format: "2020-01-01" "2020-12-31" ...
 $ start_date : Date, format: "2020-09-01" "2020-11-30" ...

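Here is an option using set(). It still loops, but over the columns of dt rather than the rows of meta_dt, and set() updates each column by reference with very little overhead: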

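# Loop over the columns of dt and coerce each one to the type listed in meta_dt
for (i in seq_along(dt)) {
  col <- names(dt)[i]
  correct_type <- meta_dt[col_n == col, type]
  # Skip columns that meta_dt does not describe
  if (length(correct_type) == 0L) next
  if (!inherits(dt[[i]], correct_type)) {
    if (correct_type %in% c("date", "Date")) {
      # Date columns are parsed with the format string stored in meta_dt
      fmt <- meta_dt[col_n == col, form]
      set(dt, j = i, value = as.Date(dt[[i]], format = fmt))
    } else {
      # Any other type is coerced with methods::as(), e.g. "character"
      set(dt, j = i, value = as(dt[[i]], correct_type))
    }
  }
}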

> str(dt)
Classes ‘data.table’ and 'data.frame':  4 obs. of  4 variables:
 $ id         : chr  "1" "2" "3" "4"
 $ description: chr  "ab" "ac" "ad" "ae"
 $ extension  : Date, format: "2020-01-01" "2020-12-31" "2020-05-01" "2020-01-04"
 $ start_date : Date, format: "2020-09-01" NA "2020-08-19" "2020-03-14"
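If the goal is to get rid of the explicit for loop, a minimal sketch along the following lines may work, assuming the same meta_dt and dt as in the question (col_meta, date_meta and date_cols are just illustrative names). The only join is meta_dt[.(names(dt)), on = "col_n"], which lines the metadata up with the columns of dt; the conversion itself is a single := call in which Map() pairs each date column in .SD with its own format string. Map() still iterates internally, so this removes the loop from your code rather than from the computation. Unlike the set() loop, it only touches the date columns, so id stays numeric as in the expected result.

```r
library(data.table)

# Same metadata and data as in the question
meta_dt <- data.table(
  col_n = c("id", "description", "extension", "start_date"),
  type  = c("character", "character", "date", "date"),
  form  = c(NA, NA, "%Y-%m-%d", "%Y-%m-%d")
)

dt <- data.table(
  id          = c(1, 2, 3, 4),
  description = c("ab", "ac", "ad", "ae"),
  extension   = c("2020-01-01", "2020-12-31", "2020-05-01", "2020-01-04"),
  start_date  = c("2020-09-01", "2020-11-31", "2020-08-19", "2020-03-14")
)

# Join meta_dt on dt's column names so the metadata rows follow dt's column order
col_meta <- meta_dt[.(names(dt)), on = "col_n"]

# Keep only the rows describing date columns ("date" or "Date")
date_meta <- col_meta[tolower(type) == "date"]
date_cols <- date_meta$col_n

# Convert all date columns by reference in one := call;
# Map() calls as.Date(column, format = form) for each column/format pair
dt[, (date_cols) := Map(as.Date, .SD, format = date_meta$form), .SDcols = date_cols]

str(dt)
```

In the str() output, extension and start_date should now be Date columns, id and description keep their original types, and 2020-11-31 again becomes NA.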

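Notes

  • The correct class name for date objects is Date, starting with a capital D.
  • 2020-11-31 is not a valid date in the Gregorian calendar (November has 30 days), so it is converted to NA.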
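Both notes are easy to check in a plain R session. A small sketch (x, raw and parsed are just illustrative names); the last line shows one way to list the strings that failed to parse, i.e. values that were present in the input but came out as NA:

```r
# The class of a parsed date is "Date" (capital D), not "date"
x <- as.Date("2020-01-01", format = "%Y-%m-%d")
class(x)             # "Date"
inherits(x, "Date")  # TRUE
inherits(x, "date")  # FALSE: class matching is case-sensitive

# November has only 30 days, so "2020-11-31" cannot be parsed
raw <- c("2020-09-01", "2020-11-31", "2020-08-19", "2020-03-14")
parsed <- as.Date(raw, format = "%Y-%m-%d")

# Input strings that were not NA but became NA after parsing
raw[is.na(parsed) & !is.na(raw)]  # "2020-11-31"
```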