Pass text as specified columns to open as [type] using read_csv from readr in R

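I have some .csv files to open, with the default column type specified as "i" (for integer). However, some files also have specific columns that I want readr::read_csv to open as a defined type (the logic for choosing those columns is unimportant; assume I know which columns go with which files).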

Is there a way to pass these columns into the col_types argument of read_csv, while still having every other column opened as integer?

df <- data.frame(
  a = c(1,2,3,4),
  b = sample(1:100, 4),
  c_text = c("hi", "I", "am", "text"),
  d_decimals = runif(4),
  e_more_text = c("another", "text", "column", "lol")
)

readr::write_csv(df, "/path/to/csv/file.csv")

character_cols <- c("c_text", "e_more_text")
double_cols <- "d_decimals"

data <- readr::read_csv(
  "/path/to/csv/file.csv",
  # supply something here to determine column types
  col_types = cols(.default = "i", character_cols = "c", double_cols = "d")
)

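Because of the logic that works out which columns should be character, double, and so on, I would ideally supply them as vectors of names.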

Cheers

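You can create a helper function that combines your extra specification with the default column specification, and then builds the full spec with do.call.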

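# Per-column overrides; every column not listed falls back to the default
extra_spec = list(
  "c_text" = "c",
  "d_decimals" = "d",  # double, so the decimal values parse without problems
  "e_more_text" = "c"
)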

read_csv_with_default_int = function(path, extra_spec) {
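  # Append the integer default to the per-column overrides, then build the spec
  spec = do.call(readr::cols, c(extra_spec, list(.default = readr::col_integer())))
  readr::read_csv(path, col_types = spec)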
}

read_csv_with_default_int("file.csv", extra_spec = extra_spec)
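Since the question asks for vectors of column names rather than a hand-written list, the list can be generated from those vectors. The following is a minimal sketch, assuming the `character_cols` and `double_cols` vectors from the question; `spec_from_names` is a hypothetical helper name, not part of readr.

```r
# Column-name vectors, as defined in the question
character_cols <- c("c_text", "e_more_text")
double_cols <- "d_decimals"

# Hypothetical helper: map every name in `col_names` to one readr type code
spec_from_names <- function(col_names, type) {
  stats::setNames(as.list(rep(type, length(col_names))), col_names)
}

# Named list of overrides, in the same shape as the hand-written extra_spec
extra_spec <- c(
  spec_from_names(character_cols, "c"),
  spec_from_names(double_cols, "d")
)

# Combine the overrides with an integer default to get a full column spec
spec <- do.call(readr::cols, c(extra_spec, list(.default = readr::col_integer())))
```

The resulting `extra_spec` can be passed to `read_csv_with_default_int` unchanged, so the logic that decides which columns are character or double only has to produce name vectors.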

You could also use a helper like purrr::partial to avoid a lot of the nested logic:

# Pre-fill the integer default so callers only supply the overrides
cols_default_int = purrr::partial(readr::cols, .default = readr::col_integer())

read_csv_with_default_int = function(path, col_types) {
  readr::read_csv(path, col_types = do.call(cols_default_int, col_types))
}

read_csv_with_default_int("file.csv", col_types = extra_spec)
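To check that the approach behaves as intended, a round trip through a temporary file can be used. This is only a sketch: it uses a temporary path and fixed values in place of the question's placeholder path and random data, and it redefines the helper so that the block runs on its own.

```r
library(readr)

# Write a small example file to a temporary location (assumed path, not the question's)
path <- tempfile(fileext = ".csv")
df <- data.frame(
  a = c(1, 2, 3, 4),
  b = c(10L, 20L, 30L, 40L),
  c_text = c("hi", "I", "am", "text"),
  d_decimals = c(0.25, 0.5, 0.75, 1.5),
  e_more_text = c("another", "text", "column", "lol")
)
write_csv(df, path)

# Overrides for the non-integer columns
extra_spec <- list(c_text = "c", d_decimals = "d", e_more_text = "c")

# Same idea as the helper in the answer: overrides plus an integer default
read_csv_with_default_int <- function(path, extra_spec) {
  spec <- do.call(cols, c(extra_spec, list(.default = col_integer())))
  read_csv(path, col_types = spec)
}

data <- read_csv_with_default_int(path, extra_spec)

# First class of each column, to inspect how the file was parsed
col_classes <- vapply(data, function(x) class(x)[1], character(1))
print(col_classes)
```

Columns `a` and `b` should come back as integer, while the listed columns take the types given in `extra_spec`.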