How to extract JSON output to a data frame?
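I have a data frame with more than 3,000 records that includes latitude and longitude coordinates for each observation. I would like to get the country and the state or province for each set of coordinates.
I seem to have a partial solution, but I am new to R and don't understand how to extract the information from the JSON output into a data frame that I can bind to the original data set.
How do I parse the nested list created by fromJSON into a data.frame? Specifically, I would like the new data frame to look like:
Latitude, Longitude, Country, State (column names)
Alternatively, better solutions to my problem of obtaining the spatial information are appreciated!
Here is my code: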
library(RDSTK)
library(httr)
library(rjson)
Coords <- structure(list(Latitude = c(43.30528, 46.08333, 32.58333, 46.25833, 45.75, 46.25, 45.58333, 45.58333, 44.08333, 45.75),
Longitude = c(-79.80306, -82.41667, -117.08333, -123.975, -85.75, -123.91667, -86.75, -86.75, -76.58333, -85.25
)), .Names = c("Latitude", "Longitude"), row.names = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L,9L, 10L), class = "data.frame")
json_file <- fromJSON(coordinates2politics(Coords$Latitude, Coords$Longitude))
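I prefer to use jsonlite for parsing JSON in R.
To parse a list of nested JSON strings, you can do the fromJSON call inside an lapply.
jsonlite::fromJSON tries to simplify the result for you. However, because JSON is built to handle nested data structures, you often get back a list of data.frames, so to get to the data.frame you want, you have to know which element of the list it is and then extract it.
For example: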
library(RDSTK)
library(jsonlite)
js <- coordinates2politics(Coords$Latitude, Coords$Longitude)
lst <- lapply(js, jsonlite::fromJSON)
lst[[1]]$politics
#           type friendly_type                       name  code
# 1       admin2       country                     Canada   can
# 2       admin4         state                    Ontario  ca08
# 3 constituency  constituency            Hamilton Centre 35031
# 4 constituency  constituency                 Burlington 35010
# 5 constituency  constituency Hamilton East-Stoney Creek 35032
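To see why the data.frame ends up one level down, here is a minimal sketch that does not call the API at all. The JSON string is hand-written and only assumed to imitate the shape of one coordinates2politics() response; the real response may contain more fields.

```r
library(jsonlite)

# Hypothetical JSON string, written by hand to mimic one API response
js_one <- '[{"location": {"latitude": 43.30528, "longitude": -79.80306},
             "politics": [
               {"type": "admin2", "friendly_type": "country", "name": "Canada",  "code": "can"},
               {"type": "admin4", "friendly_type": "state",   "name": "Ontario", "code": "ca08"}
             ]}]'

parsed <- jsonlite::fromJSON(js_one)

# The top-level array of objects is simplified to a one-row data.frame.
# "location" becomes a nested data.frame column, and "politics" becomes
# a list column holding one data.frame per row.
str(parsed, max.level = 2)

pol <- parsed$politics[[1]]        # the data.frame of political units
lat <- parsed$location$latitude    # the coordinate that was queried
```

Running str() on the parsed object is usually the quickest way to find out which element holds the data.frame you are after.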
To get a data.frame, you can construct another lapply to extract the elements you want, and then put them together with do.call(rbind, ...) or, my preference, data.table::rbindlist(...)
lst_result <- lapply(lst, function(x){
df <- x$politics[[1]]
df$lat <- x$location$latitude
df$lon <- x$location$longitude
return(df)
})
data.table::rbindlist(lst_result)
#            type friendly_type                       name  code      lat       lon
# 1:       admin2       country                     Canada   can 43.30528 -79.80306
# 2:       admin4         state                    Ontario  ca08 43.30528 -79.80306
# 3: constituency  constituency            Hamilton Centre 35031 43.30528 -79.80306
# 4: constituency  constituency                 Burlington 35010 43.30528 -79.80306
# 5: constituency  constituency Hamilton East-Stoney Creek 35032 43.30528 -79.80306
# 6:       admin2       country                     Canada   can 46.08333 -82.41667
# 7:       admin4         state                    Ontario  ca08 46.08333 -82.41667
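The question asks for one row per coordinate with Latitude, Longitude, Country and State columns, so the long table still needs reshaping. A possible base R sketch follows. The `long` data.frame is mock data typed in by hand to resemble the output above, and keeping only the first country and first state per coordinate is an assumption you may want to change.

```r
# Mock data shaped like the combined result (not fetched from the API)
long <- data.frame(
  friendly_type = c("country", "state", "constituency", "country", "state"),
  name          = c("Canada", "Ontario", "Hamilton Centre", "Canada", "Ontario"),
  lat           = c(43.30528, 43.30528, 43.30528, 46.08333, 46.08333),
  lon           = c(-79.80306, -79.80306, -79.80306, -82.41667, -82.41667),
  stringsAsFactors = FALSE
)

# Keep only the rows of interest
keep <- long[long$friendly_type %in% c("country", "state"), ]

# Assumption: if a coordinate matches several countries or states, keep the first
keep <- keep[!duplicated(keep[c("lat", "lon", "friendly_type")]), ]

country <- keep[keep$friendly_type == "country", c("lat", "lon", "name")]
state   <- keep[keep$friendly_type == "state",   c("lat", "lon", "name")]
names(country) <- c("Latitude", "Longitude", "Country")
names(state)   <- c("Latitude", "Longitude", "State")

# all.x = TRUE keeps coordinates that have a country but no state (State is NA)
wide <- merge(country, state, by = c("Latitude", "Longitude"), all.x = TRUE)
wide
```

Repeated coordinates collapse into a single row here, so the result can be joined back to the original Coords with another merge() on Latitude and Longitude.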
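Alternatively, to get more detail about each lat/lon, you can use Google's API through library(googleway) (disclaimer: I wrote googleway) to reverse geocode the lat/lons.
For this you need a valid Google API key (which is limited to 2,500 requests per day unless you pay)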
library(googleway)
key <- "your_api_key"
lst <- apply(Coords, 1, function(x){
google_reverse_geocode(location = c(x["Latitude"], x["Longitude"]),
key = key)
})
lst[[1]]$results$address_components
# [[1]]
# long_name short_name types
# 1 Burlington Bay James N. Allan Skyway Burlington Bay James N. Allan Skyway route
# 2 Burlington Burlington locality, political
# 3 Halton Regional Municipality Halton Regional Municipality administrative_area_level_2, political
# 4 Ontario ON administrative_area_level_1, political
# 5 Canada CA country, political
# 6 L7S L7S postal_code, postal_code_prefix
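To pull just the country and the state or province out of address_components, one option is to match on the types column. This is only a sketch on hand-typed mock data: it assumes that types is a list column of character vectors (which is what the printed "locality, political" suggests), and `pick_component` is a made-up helper, not part of googleway.

```r
# Mock address_components table (not fetched from Google)
ac <- data.frame(
  long_name  = c("Burlington", "Ontario", "Canada"),
  short_name = c("Burlington", "ON", "CA"),
  stringsAsFactors = FALSE
)
# Assumption: each row carries a character vector of types
ac$types <- list(
  c("locality", "political"),
  c("administrative_area_level_1", "political"),
  c("country", "political")
)

# Hypothetical helper: long_name of the first component tagged with `type`,
# or NA when no component carries that tag
pick_component <- function(ac, type) {
  hit <- vapply(ac$types, function(t) type %in% t, logical(1))
  if (any(hit)) ac$long_name[hit][1] else NA_character_
}

country <- pick_component(ac, "country")
state   <- pick_component(ac, "administrative_area_level_1")
```

With real results you would apply the helper to lst[[i]]$results$address_components[[1]] for each i, after checking with str() that the structure matches the assumption above.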
Or similarly through library(ggmap), which is also subject to Google's 2,500-request limit.
library(ggmap)
apply(Coords, 1, function(x){
revgeocode(c(x["Longitude"], x["Latitude"]))
})
# 1
# "Burlington Bay James N. Allan Skyway, Burlington, ON L7S, Canada"
# 2
# "308 Brennan Harbour Rd, Spanish, ON P0P 2A0, Canada"
# 3
# "724 Harris Ave, San Diego, CA 92154, USA"
# 4
# "30 Cherry St, Chinook, WA 98614, USA"
# 5
# "St James Township, MI, USA"
# 6
# "US-101, Chinook, WA 98614, USA"
# 7
# "2413 II Rd, Garden, MI 49835, USA"
# 8
# "2413 II Rd, Garden, MI 49835, USA"
# 9
# "8925 S Shore Rd, Stella, ON K0H 2S0, Canada"
# 10
# "Charlevoix County, MI, USA"
The values need to be extracted from the JSON list. You really only have results for the first coordinate:
sapply(json_file[[1]]$politics, "[[", 'name')[ # now pick correct names with logical
sapply(json_file[[1]]$politics, "[[", 'friendly_type') %in% c("country","state") ]
[1] "Canada" "Ontario"
You should use apply to run all the coordinates through fromJSON(coordinates2politics(., .)) one by one, since the function does not appear to be "vectorized".
res=apply( Coords, 1, function(x) {fromJSON(coordinates2politics(x['Latitude'],
x['Longitude']) )} )
sapply( res, function(x) sapply(x[[1]]$politics, "[[", 'name')[
sapply(x[[1]]$politics, "[[", 'friendly_type') %in%
c("country","state")] )
$`1`
[1] "Canada" "Ontario"
$`2`
[1] "Canada" "Ontario"
$`3`
[1] "United States" "California" "Mexico" "California"
$`4`
[1] "United States"
$`5`
[1] "United States" "Michigan"
$`6`
[1] "United States" "Washington"
$`7`
[1] "United States" "Michigan"
$`8`
[1] "United States" "Michigan"
$`9`
[1] "Canada" "Ontario"
$`10`
[1] "United States" "Michigan"
Clearly, items near a border (such as San Diego County or Chula Vista) give ambiguous results.
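Because some coordinates return two countries and others return no state at all, the list above cannot be bound to the original data directly. One way to handle that, sketched here in base R, is to take the first match of each type and fill in NA when there is none. The `res_mock` list is typed in by hand and holds only the politics part of each result (in the rjson list-of-lists shape), and `first_of_type` is a made-up helper name.

```r
# Mock politics lists for two coordinates (not fetched from the API)
res_mock <- list(
  `3` = list(
    list(name = "United States", friendly_type = "country"),
    list(name = "California",    friendly_type = "state"),
    list(name = "Mexico",        friendly_type = "country"),
    list(name = "California",    friendly_type = "state")
  ),
  `4` = list(
    list(name = "United States", friendly_type = "country")
  )
)

# Hypothetical helper: name of the first entry with the given friendly_type,
# or NA when the coordinate has no entry of that type
first_of_type <- function(politics, type) {
  types <- vapply(politics, function(p) p[["friendly_type"]], character(1))
  nms   <- vapply(politics, function(p) p[["name"]], character(1))
  hit   <- nms[types == type]
  if (length(hit) == 0) NA_character_ else hit[[1]]
}

out <- data.frame(
  Country = vapply(res_mock, first_of_type, character(1), type = "country"),
  State   = vapply(res_mock, first_of_type, character(1), type = "state"),
  stringsAsFactors = FALSE
)
out
```

Since this yields exactly one row per coordinate, in the same order as the input, the result can be attached to the original data with cbind(). Taking the first match is an arbitrary choice for border cases, so those rows are worth checking by hand.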