How to use readr::write_delim() to write a .csv enclosed in double quotes
I am trying to build a file in S3 using write_delim, and I want the values enclosed in double quotes ("), but I don't know whether that simply isn't a parameter of the write_delim function and I will need to use a base R function, or whether I am doing something wrong. This is what I have tried:
s3write_using(file_filtered,
              FUN = write_delim,
              delim = ",",
              na = "",
              object = paste0(output_path,
                              "file-",
                              lubridate::today(),
                              ".csv"),
              bucket = input_bucket)
s3write_using(file_filtered,
              FUN = write_delim,
              delim = ",",
              na = "",
              quote = "double",
              object = paste0(output_path,
                              "file-",
                              lubridate::today(),
                              ".csv"),
              bucket = input_bucket)
If I understand you correctly, you want to write a csv to your S3 bucket in which the values are enclosed in quotation marks.
From the s3write_using documentation:
FUN: For s3write_using, a function to which x and a file path will be
passed (in that order).
So you just need to define a function that takes the R object as its first argument and writes a quoted csv to the path passed as the second argument.
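To make that contract concrete, here is a minimal sketch of such a function using base R's write.csv (just an illustration of the "object first, path second" signature; the faster data.table version follows below):

```r
# A minimal writer compatible with s3write_using's FUN contract:
# it takes the R object first and the file path second.
# Base R's write.csv quotes column names and character values
# when quote = TRUE.
write_csv_quoted <- function(x, path) {
  write.csv(x, path, quote = TRUE, na = "", row.names = FALSE)
}

tmp <- tempfile(fileext = ".csv")
write_csv_quoted(data.frame(a = 1:2, b = c("x", "y")), tmp)
cat(readLines(tmp), sep = "\n")
#> "a","b"
#> 1,"x"
#> 2,"y"
```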
If you are really concerned about performance, readr::write_delim is certainly faster than write.csv, but the data.table library has an even faster function, fwrite, which allows quoting in the same way as write.csv:
write_quoted_csv <- function(object, path)
{
  # Convert to data.table by reference (no copy), write with quoting
  # enabled, then convert back to data.frame by reference.
  data.table::setDT(object)
  data.table::fwrite(object, path, quote = TRUE)
  data.table::setDF(object)
}
Let's test it against write_delim using a data frame with 50,000 rows:
df <- data.frame(a = 1:50000,
                 b = 50001:100000,
                 c = rep(LETTERS[1:10], each = 5000))
microbenchmark::microbenchmark(
  readr = readr::write_delim(df, "~/test_readr.csv", delim = ",", na = ""),
  data.table = write_quoted_csv(df, "~/test_datatable.csv"),
  times = 100)
# Unit: milliseconds
# expr min lq mean median uq max neval
# readr 244.87593 257.6236 276.91877 262.86998 283.07285 416.79254 100
# data.table 20.80768 22.8940 26.25808 24.92915 27.69624 54.55789 100
You can see the data.table approach is more than 10 times faster. And even then, write_delim won't put in the quotes, whereas fwrite will:
cat(readLines("~/test_readr.csv", 10), sep = "\n")
#> a,b,c
#> 1,50001,A
#> 2,50002,A
#> 3,50003,A
#> 4,50004,A
#> 5,50005,A
#> 6,50006,A
#> 7,50007,A
#> 8,50008,A
#> 9,50009,A
cat(readLines("~/test_datatable.csv", 10), sep = "\n")
#> "a","b","c"
#> 1,50001,"A"
#> 2,50002,"A"
#> 3,50003,"A"
#> 4,50004,"A"
#> 5,50005,"A"
#> 6,50006,"A"
#> 7,50007,"A"
#> 8,50008,"A"
#> 9,50009,"A"
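Incidentally, the quoting is transparent on the way back in: fread strips the quotes when reading, so a quote = TRUE round-trip reproduces the original values (a quick sanity check on a small data.table, not part of the benchmark above):

```r
library(data.table)

# Write a small quoted csv and read it back: the quotes appear in the
# raw file, but fread removes them, so the values round-trip unchanged.
dt <- data.table(a = 1:3, c = c("A", "B", "C"))
tmp <- tempfile(fileext = ".csv")
fwrite(dt, tmp, quote = TRUE)

cat(readLines(tmp), sep = "\n")
#> "a","c"
#> 1,"A"
#> 2,"B"
#> 3,"C"

dt_back <- fread(tmp)
identical(dt$c, dt_back$c)
#> [1] TRUE
```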
So, with the much faster method, you can write your S3 file like this:
s3write_using(file_filtered,
              FUN = write_quoted_csv,
              object = paste0(output_path, "file-", lubridate::today(), ".csv"),
              bucket = input_bucket)