How to use readr::write_delim() to write a .csv enclosed in double quotes

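I am trying to use write_delim to build a file in S3, and I want its fields enclosed in double quotes ("). I can't tell whether this simply isn't an argument of write_delim, so that I would need a base R function instead, or whether I'm doing something wrong. Here is what I have tried: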

s3write_using(file_filtered,
              FUN = write_delim,
              delim = ",",
              na = "",
              object = paste0(output_path,
                              "file-",
                              lubridate::today(),
                              ".csv"),
              bucket = input_bucket)

s3write_using(file_filtered,
              FUN = write_delim,
              delim = ",",
              na = "",
              quote = "double",
              object = paste0(output_path,
                              "file-",
                              lubridate::today(),
                              ".csv"),
              bucket = input_bucket)



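If I understand correctly, you want to write a csv to your S3 bucket in which fields start with a quotation mark and end with a quotation mark.

From the s3write_using documentation: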

FUN: For s3write_using, a function to which x and a file path will be passed (in that order).

So you only need to define a function that takes the R object as its first argument and writes a quoted csv to the path passed as its second argument.
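
As a minimal base R sketch of such a function (the name write_quoted_csv_base and the demo data are hypothetical, used only for illustration), utils::write.csv already quotes column names and character fields when quote = TRUE:

```r
# Hypothetical helper: takes the R object first and the file path second,
# which is the argument order s3write_using() passes to FUN.
write_quoted_csv_base <- function(object, path) {
  # quote = TRUE wraps column names and character/factor fields in double
  # quotes; numeric fields are left unquoted.
  utils::write.csv(object, path, quote = TRUE, na = "", row.names = FALSE)
}

# Local check against a temporary file (no S3 access needed)
demo_df <- data.frame(a = 1:3, c = c("A", "B", "C"))
demo_path <- tempfile(fileext = ".csv")
write_quoted_csv_base(demo_df, demo_path)
demo_lines <- readLines(demo_path)
```

Here demo_lines holds the header followed by one line per row, which makes it easy to inspect the quoting before pointing the function at S3.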

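If you are really concerned about performance, readr::write_delim is certainly faster than write.csv, but the data.table package has an even faster function, fwrite, which allows quoting in the same way as write.csv:
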
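write_quoted_csv <- function(object, path)
{
  # fwrite() accepts a plain data.frame directly, so the caller's object is
  # never converted by reference; quote = TRUE double-quotes column names
  # and character fields
  data.table::fwrite(object, path, quote = TRUE)
}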

Let's test this against write_delim using a data frame with 50,000 rows:

df <- data.frame(a = 1:50000, 
                 b = 50001:100000, 
                 c = rep(LETTERS[1:10], each = 5000))

microbenchmark::microbenchmark(
  readr      = readr::write_delim(df, "~/test_readr.csv", delim = ",", na = ""),
  data.table = write_quoted_csv(df, "~/test_datatable.csv"), 
  times      = 100)
# Unit: milliseconds
#        expr       min       lq      mean    median        uq       max neval
#       readr 244.87593 257.6236 276.91877 262.86998 283.07285 416.79254   100
#  data.table  20.80768  22.8940  26.25808  24.92915  27.69624  54.55789   100

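You can see that the data.table method is more than 10 times faster. On top of that, write_delim as called here does not add quotes, whereas fwrite quotes the column names and character fields: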

cat(readLines("~/test_readr.csv", 10), sep = "\n")
#> a,b,c
#> 1,50001,A
#> 2,50002,A
#> 3,50003,A
#> 4,50004,A
#> 5,50005,A
#> 6,50006,A
#> 7,50007,A
#> 8,50008,A
#> 9,50009,A
cat(readLines("~/test_datatable.csv", 10), sep = "\n")
#> "a","b","c"
#> 1,50001,"A"
#> 2,50002,"A"
#> 3,50003,"A"
#> 4,50004,"A"
#> 5,50005,"A"
#> 6,50006,"A"
#> 7,50007,"A"
#> 8,50008,"A"
#> 9,50009,"A"

So, using the super-fast method, you can write your S3 file like this:

s3write_using(file_filtered,
              FUN = write_quoted_csv,
              object = paste0(output_path, "file-", lubridate::today(), ".csv"),
              bucket = input_bucket)
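
As an aside, and as an assumption about the installed version rather than something tested in this answer: readr 2.0.0 and later document a quote argument for write_delim() that accepts "needed", "all", or "none" (the value "double" belongs to the separate escape argument, which may be why the second attempt in the question did not work). A minimal sketch with a hypothetical demo data frame and a temporary file:

```r
library(readr)

# Hypothetical demo data, written to a temporary file instead of S3
demo_df <- data.frame(a = 1:3, c = c("A", "B", "C"))
demo_path <- tempfile(fileext = ".csv")

# Assumes readr >= 2.0.0, where quote = "all" asks for every field to be quoted
write_delim(demo_df, demo_path, delim = ",", na = "", quote = "all")

quoted_lines <- readLines(demo_path)
```

If that readr version is available, quote = "all" could presumably be passed through s3write_using() alongside delim and na, in the same way as in the question's first attempt.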