R readr function for previewing csv

Question

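I'm looking for a function or workaround in readr or base R to "preview" the column types that read_csv would guess before actually importing the data. I'm working with several files of about 60 MB each, with 51 columns and 160k rows, and this would make it easier to build the col_types specification for read_csv.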

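My apologies if this sounds like an obvious question. I couldn't find an answer to this specific problem in the forums, and I have only recently started using dplyr. Thanks.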

Answer 1

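I went into the readr code and did some surgery to reuse the body of read_csv, but only up to the point where it guesses the column specification.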

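# Relies on unexported readr internals (accessed via `:::`), which may change between versions.
library(readr)  # provides default_locale() and show_progress() used in the defaults below

getReaderSpec <- function (file, col_names = TRUE, col_types = NULL, locale = default_locale(),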
                           na = c("", "NA"), quoted_na = TRUE, quote = "\"", 
                           comment = "", trim_ws = TRUE, skip = 0, n_max = Inf, 
                           guess_max = min(1000, n_max), progress = show_progress(), 
                           skip_empty_rows = TRUE) 
{
  tokenizer <- readr:::tokenizer_csv(na = na, quoted_na = quoted_na, 
                             quote = quote, comment = comment, trim_ws = trim_ws, 
                             skip_empty_rows = skip_empty_rows)
  name <- readr:::source_name(file)
  file <- readr:::standardise_path(file)
  if (readr:::is.connection(file)) {
    data <- readr:::datasource_connection(file, skip, skip_empty_rows, 
                                  comment)
    if (readr:::empty_file(data[[1]])) {
      return(tibble::tibble())
    }
  }
  else {
    if (!isTRUE(grepl("\n", file)[[1]]) && readr:::empty_file(file)) {
      return(tibble::tibble())
    }
    if (is.character(file) && identical(locale$encoding, 
                                        "UTF-8")) {
      data <- enc2utf8(file)
    }
    else {
      data <- file
    }
  }
  spec <- readr:::col_spec_standardise(data, skip = skip, skip_empty_rows = skip_empty_rows, 
                               comment = comment, guess_max = guess_max, col_names = col_names, 
                               col_types = col_types, tokenizer = tokenizer, locale = locale)
  readr:::show_cols_spec(spec)
  invisible(spec)
}

myspec <- getReaderSpec("someexample.csv")
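
As a possible alternative that avoids the unexported internals, readr also exports spec_csv(), which guesses the column specification without importing the full data. The following is a minimal sketch, not the approach used above; the temporary file and its contents are made up purely so the example is self-contained.

```r
library(readr)

# Hypothetical sample file, written only so the example can run on its own
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,name,score", "1,a,2.5", "2,b,3.5"), tmp)

# Guess the column specification from the first rows without importing the data
myspec <- spec_csv(tmp, guess_max = 1000)
print(myspec)

# Reuse (or hand-edit) the guessed specification for the actual import
dat <- read_csv(tmp, col_types = myspec)
```

The printed specification can be copied into a script and adjusted by hand, which is one way to build a col_types argument for the larger files. Raising guess_max makes the guess consider more rows at the cost of a slower preview.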
