How to extract JSON output to a data frame?

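I have a data frame with more than 3,000 records, each of which includes the latitude and longitude coordinates of an observation. I would like to get the country and the state or province for each pair of coordinates.

I seem to have a partial solution, but I am new to R and do not understand how to extract the information from the JSON output into a data frame that I can bind to the original dataset.

How do I parse the nested list created by fromJSON into a data.frame? Specifically, I would like the new data frame to look like:

Latitude, Longitude, Country, State (column names)

Alternatively, a better solution to my problem of getting the spatial information would be appreciated!

Here is my code: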

library(RDSTK)
library(httr)
library(rjson)
Coords <- data.frame(
    Latitude  = c(43.30528, 46.08333, 32.58333, 46.25833, 45.75, 46.25, 45.58333, 45.58333, 44.08333, 45.75),
    Longitude = c(-79.80306, -82.41667, -117.08333, -123.975, -85.75, -123.91667, -86.75, -86.75, -76.58333, -85.25)
)

json_file <- fromJSON(coordinates2politics(Coords$Latitude, Coords$Longitude))

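I prefer to use jsonlite for parsing JSON in R.

To parse a list of JSON strings, you can make the fromJSON call inside an lapply.

jsonlite::fromJSON tries to simplify the result for you. However, because JSON is designed to hold nested data structures, you will often get back a list of data.frames, so to get the data.frame you want you need to know which element of the list holds it and then extract it.

For example: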

library(RDSTK)
library(jsonlite)

js <- coordinates2politics(Coords$Latitude, Coords$Longitude)
lst <- lapply(js, jsonlite::fromJSON)

lst[[1]]$politics
#           type friendly_type                       name  code
# 1       admin2       country                     Canada   can
# 2       admin4         state                    Ontario  ca08
# 3 constituency  constituency            Hamilton Centre 35031
# 4 constituency  constituency                 Burlington 35010
# 5 constituency  constituency Hamilton East-Stoney Creek 35032

To get a single data.frame, you can write another lapply that extracts the elements you want, then combine the pieces with do.call(rbind, ...) or, my preference, data.table::rbindlist(...).

lst_result <- lapply(lst, function(x){
    df <- x$politics[[1]]
    df$lat <- x$location$latitude
    df$lon <- x$location$longitude
    return(df)
})

data.table::rbindlist(lst_result)

#            type friendly_type                                  name                   code      lat        lon
# 1:       admin2       country                                Canada                    can 43.30528  -79.80306
# 2:       admin4         state                               Ontario                   ca08 43.30528  -79.80306
# 3: constituency  constituency                       Hamilton Centre                  35031 43.30528  -79.80306
# 4: constituency  constituency                            Burlington                  35010 43.30528  -79.80306
# 5: constituency  constituency            Hamilton East-Stoney Creek                  35032 43.30528  -79.80306
# 6:       admin2       country                                Canada                    can 46.08333  -82.41667
# 7:       admin4         state                               Ontario                   ca08 46.08333  -82.41667
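
The long result above still has several rows per coordinate, while the question asks for one row per coordinate with Latitude, Longitude, Country and State columns. The following is a minimal sketch of that reshaping in base R, not part of the original answer. It runs on a small mock data frame whose values are copied from the printed output, and `first_of_type` is a hypothetical helper name. It assumes that keeping the first match per coordinate is acceptable when a point returns more than one country or state.

```r
# Mock of the long-format result (values copied from the printed output above)
long <- data.frame(
  friendly_type = c("country", "state", "constituency", "country", "state"),
  name = c("Canada", "Ontario", "Hamilton Centre", "Canada", "Ontario"),
  lat = c(43.30528, 43.30528, 43.30528, 46.08333, 46.08333),
  lon = c(-79.80306, -79.80306, -79.80306, -82.41667, -82.41667),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the first name of one friendly_type per coordinate
first_of_type <- function(df, type) {
  sub <- df[df$friendly_type == type, c("lat", "lon", "name")]
  sub <- sub[!duplicated(sub[c("lat", "lon")]), ]
  names(sub)[3] <- type
  sub
}

# One row per coordinate; all = TRUE keeps points that lack a state or a country
wide <- merge(first_of_type(long, "country"),
              first_of_type(long, "state"),
              by = c("lat", "lon"), all = TRUE)
names(wide) <- c("Latitude", "Longitude", "Country", "State")
wide
```

With the real data, `long` would be the output of `data.table::rbindlist(lst_result)`, converted with `as.data.frame()` if you want to stay in base R.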

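Alternatively, to get more detail about each lat/lon, you can use Google's API through library(googleway) (disclaimer: I wrote googleway) to reverse geocode the lat/lons.

For this you need a valid Google API key (which is limited to 2,500 requests per day unless you pay).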

library(googleway)

key <- "your_api_key"

lst <- apply(Coords, 1, function(x){
    google_reverse_geocode(location = c(x["Latitude"], x["Longitude"]),
                           key = key)
})

lst[[1]]$results$address_components
# [[1]]
#                              long_name                           short_name                                  types
# 1 Burlington Bay James N. Allan Skyway Burlington Bay James N. Allan Skyway                                  route
# 2                           Burlington                           Burlington                    locality, political
# 3         Halton Regional Municipality         Halton Regional Municipality administrative_area_level_2, political
# 4                              Ontario                                   ON administrative_area_level_1, political
# 5                               Canada                                   CA                     country, political
# 6                                  L7S                                  L7S        postal_code, postal_code_prefix
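
To pull just the country and state out of an address_components table like the one above, you can match on the `types` column. This is a sketch on a mock data frame that assumes `types` is a list column of character vectors, as the printed output suggests. `pick_component` is a hypothetical helper, not a googleway function.

```r
# Mock of one address_components data frame (a subset of the rows printed above)
components <- data.frame(
  long_name = c("Burlington", "Ontario", "Canada"),
  short_name = c("Burlington", "ON", "CA"),
  stringsAsFactors = FALSE
)
# Assumption: types is a list column holding one character vector per row
components$types <- list(
  c("locality", "political"),
  c("administrative_area_level_1", "political"),
  c("country", "political")
)

# Hypothetical helper: long_name of the first row whose types contain `type`
pick_component <- function(components, type) {
  hit <- vapply(components$types, function(t) type %in% t, logical(1))
  if (any(hit)) components$long_name[which(hit)[1]] else NA_character_
}

country <- pick_component(components, "country")
state <- pick_component(components, "administrative_area_level_1")
```

Returning NA when nothing matches keeps the result the same length for every coordinate, which makes it easier to bind back to Coords.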

Or similarly through library(ggmap), which is also subject to Google's limit of 2,500 requests.

library(ggmap)

apply(Coords, 1, function(x){
    revgeocode(c(x["Longitude"], x["Latitude"]))
})

# 1 
# "Burlington Bay James N. Allan Skyway, Burlington, ON L7S, Canada" 
# 2 
# "308 Brennan Harbour Rd, Spanish, ON P0P 2A0, Canada" 
# 3 
# "724 Harris Ave, San Diego, CA 92154, USA" 
# 4 
# "30 Cherry St, Chinook, WA 98614, USA" 
# 5 
# "St James Township, MI, USA" 
# 6 
# "US-101, Chinook, WA 98614, USA" 
# 7 
# "2413 II Rd, Garden, MI 49835, USA" 
# 8 
# "2413 II Rd, Garden, MI 49835, USA" 
# 9 
# "8925 S Shore Rd, Stella, ON K0H 2S0, Canada" 
# 10 
# "Charlevoix County, MI, USA"

The JSON list needs to be unpacked. You actually only have results for the first of your coordinates:

sapply(json_file[[1]]$politics, "[[", 'name')[ # now pick correct names with logical
        sapply(json_file[[1]]$politics, "[[", 'friendly_type') %in% c("country","state") ] 
[1] "Canada"  "Ontario"

You should use apply to run through all of the coordinates one by one with fromJSON(coordinates2politics(., .)), because the function does not appear to be "vectorized".

res=apply( Coords, 1, function(x) {fromJSON(coordinates2politics(x['Latitude'], 
                                                                 x['Longitude']) )} )
sapply( res, function(x) sapply(x[[1]]$politics, "[[", 'name')[
                             sapply(x[[1]]$politics, "[[", 'friendly_type') %in% 
                                                                c("country","state")] )
$`1`
[1] "Canada"  "Ontario"

$`2`
[1] "Canada"  "Ontario"

$`3`
[1] "United States" "California"    "Mexico"        "California"   

$`4`
[1] "United States"

$`5`
[1] "United States" "Michigan"     

$`6`
[1] "United States" "Washington"   

$`7`
[1] "United States" "Michigan"     

$`8`
[1] "United States" "Michigan"     

$`9`
[1] "Canada"  "Ontario"

$`10`
[1] "United States" "Michigan" 

Clearly, items near a border (such as San Diego County or Chula Vista) give ambiguous results.
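
To turn that list into columns that can be bound to the original data, one option is to keep a single country and a single state per coordinate. The following is a sketch only. It uses a mock `res` whose nesting is assumed from the `x[[1]]$politics` calls above, `first_name` is a hypothetical helper, and it resolves ambiguous border cases by taking the first match, which may not be what you want.

```r
# Mock of two parsed rjson results; the nesting is assumed from the calls above
res <- list(
  `1` = list(list(politics = list(
    list(friendly_type = "country", name = "Canada"),
    list(friendly_type = "state", name = "Ontario")))),
  `2` = list(list(politics = list(
    list(friendly_type = "country", name = "United States"))))
)

# Hypothetical helper: first name of the requested friendly_type, or NA
first_name <- function(politics, type) {
  types <- vapply(politics, function(p) p[["friendly_type"]], character(1))
  nms <- vapply(politics, function(p) p[["name"]], character(1))
  hit <- nms[types == type]
  if (length(hit) > 0) hit[1] else NA_character_
}

# One single-row data frame per coordinate, stacked with do.call(rbind, ...)
rows <- lapply(res, function(x) {
  data.frame(Country = first_name(x[[1]]$politics, "country"),
             State = first_name(x[[1]]$politics, "state"),
             stringsAsFactors = FALSE)
})
places <- do.call(rbind, rows)

# Bind to the matching coordinates (two rows of mock input here)
coords <- data.frame(Latitude = c(43.30528, 46.25833),
                     Longitude = c(-79.80306, -123.975))
result <- cbind(coords, places)
result
```

Because apply walks the rows of Coords in order, the rows of `places` line up with the rows of Coords, so cbind is enough and no merge key is needed.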