s3save a JSON file to AWS S3

I am trying to save a correctly formatted JSON file to AWS S3.

I can save a regular data frame to S3, e.g.:

library(tidyverse)
library(aws.s3)
s3save(mtcars, bucket = "s3://ourco-emr/", object = "tables/adhoc.db/mtcars/mtcars")

But I need to convert mtcars to JSON, specifically ndjson.

I am able to create a correctly formatted JSON file, e.g.:

predictions_file <- file("mtcars.json")
jsonlite::stream_out(mtcars, predictions_file)

This saves a file called mtcars.json in my directory.

However, the aws.s3 function s3save() needs me to send an in-memory object, not a file.

I tried:

predictions_file <- file("mtcars.json")
s3write_using(mtcars, 
              FUN = jsonlite::stream_out,
              con = predictions_file,
              "s3://ourco-emr/", 
              object = "tables/adhoc.db/mtcars/mtcars")

This gives:

Error in if (verbose) message("opening ", is(con), " output connection.") : argument is not interpretable as logical

I tried the same code block with the con = predictions_file line omitted, which gives:

Argument con must be a connection.

If jsonlite::stream_out() creates a correctly formatted JSON file, how can I then write that file to S3?

Edit: the desired JSON output looks like this:

{"mpg":21,"cyl":6,"disp":160,"hp":110,"drat":3,"wt":2,"qsec":16,"vs":0,"am":1,"gear":4,"carb":4,"year":"2020","month":"03","day":"05"}
{"mpg":21,"cyl":6,"disp":160,"hp":110,"drat":3,"wt":2,"qsec":17,"vs":0,"am":1,"gear":4,"carb":4,"year":"2020","month":"03","day":"05"}
{"mpg":22,"cyl":4,"disp":108,"hp":93,"drat":35,"wt":2,"qsec":18,"vs":1,"am":1,"gear":4,"carb":1,"year":"2020","month":"03","day":"05"}
{"mpg":21,"cyl":6,"disp":258,"hp":110,"drat":8,"wt":3,"qsec":19,"vs":1,"am":0,"gear":3,"carb":1,"year":"2020","month":"03","day":"05"}
{"mpg":18,"cyl":8,"disp":360,"hp":175,"drat":3,"wt":3,"qsec":17,"vs":0,"am":0,"gear":3,"carb":2,"year":"2020","month":"03","day":"05"}

When trying readChar():

mtcars_string <- readChar("mtcars.json", 1e6)
s3save(mtcars_string, bucket = "s3://ourco-emr/", object = "tables/adhoc.db/mtcars/2020/03/06/mtcars")

If I then download and open the resulting JSON file, it looks like this:

5244 5833 0a58 0a00 0000 0300 0306 0000
0305 0000 0000 0555 5446 2d38 0000 0402
0000 0001 0004 0009 0000 000d 6d74 6361
7273 5f73 7472 696e 6700 0000 1000 0000
0100 0400 0900 0012 347b 226d 7067 223a
3231 2c22 6379 6c22 3a36 2c22 6469 7370

So it looks like binary data, not JSON, was sent to AWS S3.

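I ran into the same problem. I needed to write JSON lines (ndjson) and upload them to S3, and as far as I know only stream_out() from the jsonlite package writes JSON lines.

stream_out() only accepts a connection object as its destination. s3write_using(), however, writes to a temporary file tmp and passes that file's path to FUN as a character string. stream_out() then throws the error: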

Argument con must be a connection.
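
The mismatch can be reproduced locally without touching S3. The following is a minimal sketch, assuming only that jsonlite is installed; the variable names tmp and err are illustrative. It passes a path string to stream_out() the way s3write_using() is described as doing above, then passes a connection built from the same path.

```r
library(jsonlite)

# s3write_using() is described above as handing FUN a file path
# (a character string), so mimic that with a temporary file path.
tmp <- tempfile(fileext = ".json")

# Passing the bare path: stream_out() rejects it because it is not a connection.
err <- tryCatch(
  stream_out(mtcars, tmp),
  error = function(e) conditionMessage(e)
)
print(err)

# Wrapping the same path in file() gives stream_out() the connection it expects.
stream_out(mtcars, file(tmp), verbose = FALSE)

# The file now holds one JSON document per row of the data frame.
n_lines <- length(readLines(tmp))
print(n_lines == nrow(mtcars))
```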

A temporary workaround is to modify s3write_using() so that it passes a connection to FUN instead of the file path string.

  1. trace(s3write_using, edit=TRUE) - opens an editor

  2. Change line 5 from:
    value <- FUN(x, tmp, ...)

    to:
    value <- FUN(x, file(tmp), ...)

Then you can use stream_out():

# Upload the data
s3write_using(x = data, 
              FUN = jsonlite::stream_out,
              bucket = 'mybucket',
              object = 'my/object.json',
              opts = list(acl = "private", multipart = FALSE, verbose = TRUE, show_progress = TRUE))

The edit persists for the whole session, or until you call untrace(s3write_using).
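
If you would rather not edit the package function, a small wrapper passed as FUN may do the same job. This is only a sketch, and it relies on the behaviour described above, namely that s3write_using() calls FUN(x, tmp, ...) with tmp being a path string. The helper name write_ndjson is hypothetical and not part of aws.s3 or jsonlite.

```r
library(jsonlite)

# Hypothetical helper: accepts a data frame and a file path (what
# s3write_using() is described as supplying) and opens the connection itself.
write_ndjson <- function(x, path, ...) {
  con <- file(path, open = "wb")
  on.exit(close(con), add = TRUE)  # always release the connection
  stream_out(x, con, verbose = FALSE, ...)
  invisible(path)
}

# Local check that does not need S3 or credentials.
tmp <- tempfile(fileext = ".json")
write_ndjson(head(mtcars), tmp)
first_line <- readLines(tmp, n = 1)
print(first_line)

# With credentials configured, the upload might then look like this
# (data, 'mybucket' and 'my/object.json' are the placeholders used above):
# aws.s3::s3write_using(x = data, FUN = write_ndjson,
#                       bucket = "mybucket", object = "my/object.json")
```

Because nothing is traced, there is nothing to undo at the end of the session.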

Someone should probably open a request on the cloudyr/aws.s3 GitHub repository, since this is a common use case.
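
Another option is to skip s3write_using() altogether: write the ndjson file locally with stream_out() and upload that file with put_object(). The sketch below assumes aws.s3 is installed and credentials are configured. The helper name upload_ndjson and the bucket and object names are placeholders, not anything taken from the question.

```r
library(jsonlite)

# Hypothetical helper: write ndjson to a temporary file, then upload the file.
upload_ndjson <- function(df, bucket, object) {
  tmp <- tempfile(fileext = ".json")
  on.exit(unlink(tmp), add = TRUE)  # remove the temporary file afterwards
  # stream_out() opens and closes the unopened file() connection itself.
  stream_out(df, file(tmp), verbose = FALSE)
  # put_object() uploads the file as-is, so S3 receives plain JSON lines
  # rather than a serialized R object.
  aws.s3::put_object(file = tmp, object = object, bucket = bucket)
}

# Example call (placeholder names; requires AWS credentials):
# upload_ndjson(mtcars, bucket = "mybucket", object = "my/object.json")
```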