How to download and/or extract data stored in a 'raw' binary zip object within a response object in R?
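I am unable to download or read a zip file from an API request using the httr package. Is there another package I can try that will let me download/read binary zip files stored in the response of a GET request in R?
I tried two approaches:
1. Using GET to obtain a response object of type application/json (successful), then using fromJSON to extract the content via content(my_response, 'text'). The output includes a column called 'zip', which is the data I am interested in downloading; the documentation says it is a base64 encoded binary file. This column is currently a very long string of seemingly random letters, and I am not sure how to convert it into the actual dataset.
2. Bypassing fromJSON, because I noticed that the response object itself has a field of class 'raw'. This object is a list of seemingly random numbers, which I suspect is the binary representation of the dataset. I tried rawToChar(my_response$content) to convert the raw data to character, but this produces the same long string as in #1.
- I noticed that with method #1, if I use base64_dec() to try to convert the long string, I get an output object of the same type as the 'raw' field in the response itself.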
getzip1 <- GET(getzip1_link)
getzip1 # successful response, status 200
df <- fromJSON(content(getzip1, "text"))
df$status # "OK"
df$dataset$zip # <- this is the very long string of letters (eg. "I1NC5qc29uUEsBAhQDFA...")
# Method 1: try to convert from the 'zip' object in the output of fromJSON
try1 <- base64_dec(df$dataset$zip)
#looks similar to getzip1$content (i.e. this produces the list of numbers/letters 50 4b 03 04 14 00, etc, perhaps binary representation)
# Method 2: try to get data directly from raw object
class(getzip1$content) # <- 'raw' class object directly from GET request
try2 <- rawToChar(getzip1$content) # returns same output as df$dataset$zip
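I should be able to use either the raw 'content' object in my response or the long string in the 'zip' object from the fromJSON output to view the dataset or download it somehow. I don't know how to do this. Please help!
Welcome!
Based on the API documentation, the response from the getDataset endpoint has the following schema:
Dataset archive including meta information, the dataset itself is base64 encoded to allow for binary ZIP transfers.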
{
  "status": "OK",
  "dataset": {
    "state_id": 5,
    "session_id": 1624,
    "session_name": "2019-2020 Regular Session",
    "dataset_hash": "1c7d77fe298a4d30ad763733ab2f8c84",
    "dataset_date": "2018-12-23",
    "dataset_size": 317775,
    "mime": "application\/zip",
    "zip": "MIME 64 Encoded Document"
  }
}
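To see why rawToChar() only returns the long string while base64_dec() returns bytes such as 50 4b 03 04, here is a small offline sketch. It builds a made-up response body shaped like the schema above (the `fake_body` and `fake_content` names and the six-byte payload are illustrative assumptions, not real API output) and walks through both conversions.

```r
library(jsonlite)

# Stand-in payload: the first bytes of a ZIP archive. 50 4b 03 04 is the
# ZIP local file header signature ("PK\003\004").
zip_bytes <- as.raw(c(0x50, 0x4b, 0x03, 0x04, 0x14, 0x00))

# Hypothetical response body shaped like the documented schema.
fake_body <- toJSON(
  list(status = "OK",
       dataset = list(mime = "application/zip",
                      zip = base64_enc(zip_bytes))),
  auto_unbox = TRUE
)

# httr keeps the body as raw bytes of the JSON *text* in response$content.
fake_content <- charToRaw(as.character(fake_body))

# rawToChar() therefore gives back the JSON text, base64 string included.
parsed <- fromJSON(rawToChar(fake_content))

# Only base64_dec() on the "zip" field recovers the archive bytes.
decoded <- base64_dec(parsed$dataset$zip)
identical(decoded, zip_bytes)
```

In other words, the raw vector in the response is the JSON document, not the archive; the archive only appears after the 'zip' field is base64 decoded, and those bytes still have to be written to a file before they can be unzipped.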
We can obtain the data in R with the following code:
library(httr)      # GET(), content(), status_code()
library(jsonlite)  # fromJSON(), base64_dec()
library(stringr)   # str_c()
library(maditr)    # %>% pipe

token <- "" # Your API key
session_id <- 1253L # Obtained from the getDatasetList endpoint
access_key <- "2qAtLbkQiJed9Z0FxyRblu" # Obtained from the getDatasetList endpoint
destfile <- file.path("path", "to", "file.zip") # Modify

response <- str_c("https://api.legiscan.com/?key=",
                  token,
                  "&op=getDataset&id=",
                  session_id,
                  "&access_key=",
                  access_key) %>%
  GET()
status_code(x = response) == 200 # Good

# Parse the JSON body once; it also contains some extra metadata
body <- content(x = response,
                as = "text",
                encoding = "UTF-8") %>%
  fromJSON()

# Decode the base64 string into raw bytes and write them out as a ZIP file
body %>%
  getElement(name = "dataset") %>%
  getElement(name = "zip") %>%
  base64_dec() %>%
  writeBin(con = destfile)

unzip(zipfile = destfile) # Extracts into the current working directory
unzip will extract the files, which in this case will look something like
hash.md5 # Can be checked against the metadata
AL/2016-2016_1st_Special_Session/bill/*.json
AL/2016-2016_1st_Special_Session/people/*.json
AL/2016-2016_1st_Special_Session/vote/*.json
As always, wrap your code in functions and profit.
PS: Here is what the code would look like in Julia, for comparison.
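As one possible way to do that wrapping, here is a minimal sketch of a helper that handles only the decode-and-write step, so it can be tried without network access. The name `write_dataset_zip` and the sanity check on the first two bytes are my own additions, not part of the API or of any package.

```r
library(jsonlite)

# Hypothetical helper: take the parsed getDataset response (a list, as
# returned by fromJSON), decode its base64 "zip" field, and write the
# bytes to `destfile`. Returns the path invisibly.
write_dataset_zip <- function(body, destfile) {
  stopifnot(identical(body$status, "OK"))

  zip_b64 <- body$dataset$zip
  stopifnot(is.character(zip_b64), length(zip_b64) == 1L)

  bytes <- base64_dec(zip_b64)

  # ZIP records start with the bytes 50 4b ("PK"); fail early otherwise.
  if (length(bytes) < 2L || !identical(bytes[1:2], as.raw(c(0x50, 0x4b)))) {
    stop("decoded payload does not look like a ZIP archive")
  }

  writeBin(bytes, con = destfile)
  invisible(destfile)
}
```

With a real response this could be called as write_dataset_zip(fromJSON(content(response, as = "text", encoding = "UTF-8")), destfile), followed by unzip(zipfile = destfile). Keeping the HTTP request outside the helper is a design choice that makes the decoding step easy to test on its own.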
using Base64, HTTP, JSON3

token = "" # Your API key
session_id = 1253 # Obtained from the getDatasetList endpoint
access_key = "2qAtLbkQiJed9Z0FxyRblu" # Obtained from the getDatasetList endpoint
destfile = joinpath("path", "to", "file.zip") # Modify

response = string("https://api.legiscan.com/?",
                  join(["key=$token",
                        "op=getDataset",
                        "id=$session_id",
                        "access_key=$access_key"],
                       "&")) |>
    HTTP.get
@assert response.status == 200

# Pull out the base64 string, decode it, and write the bytes to destfile
JSON3.read(response.body) |>
    (content -> content.dataset.zip) |>
    base64decode |>
    (data -> write(destfile, data))

# Pass the file as an argument; pipeline(`unzip`, destfile) would instead
# redirect unzip's output into destfile and overwrite the archive
run(`unzip $destfile`)