Unnest multiple columns from json
I'm wondering if there is a simpler way to unnest some JSON into a data frame. I have the following JSON from an API:
library(tidyverse)
library(jsonlite)
json <- '{
  "result": {
    "id": "id_1",
    "description": "description",
    "var1": {
      "var1Id": "a",
      "var1Title": "aTitle"
    },
    "var2": {
      "var1Id": "b",
      "var2Title": "bTitle"
    },
    "var3": {
      "var3Id": "c",
      "var3Info": "c123",
      "var3Type": "cType"
    },
    "var4": {
      "var4Lvl2": [
        {
          "var4Id": "d",
          "var4Title": "dTitle"
        },
        {
          "var4Id": "d2",
          "var4Title": "d2Title"
        }
      ]
    }
  }
}'
Next, I usually turn it into a tibble and then call tidyr::unnest_wider on each list column:
## Note I use bind_rows to simulate how my actual data looks
json2 <- json %>%
  fromJSON() %>%
  tibble() %>%
  bind_rows(fromJSON(json) %>% tibble())

json2 %>%
  unnest_wider(".") %>%
  unnest_wider("var1", names_sep = "_") %>%
  unnest_wider("var2", names_sep = "_") %>%
  unnest_wider("var3", names_sep = "_") %>%
  unnest_wider("var4", names_sep = "_") %>%
  unnest_wider("var4_var4Lvl2") %>%
  unnest_wider("var4Id", names_sep = "_") %>%
  unnest_wider("var4Title", names_sep = "_")
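The process above works fine, but I feel there must be a simpler way to unnest all of these columns without typing out each column name. Note that the number of columns and their names can change depending on the specific API query, so a solution that handles these variations would be great.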
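Finally figured it out. I wrote a function that unnests all list columns at once; it can be applied repeatedly, once for each level of nested list columns.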
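## create unnest_all function: widens every list column by one level per call
## (needs tidyverse loaded; janitor must be installed for the final clean-up step)
unnest_all <- function(data){
  # names of the columns that still hold lists
  list_cols <- names(select(data, where(is.list)))
  # columns that are already atomic are carried over unchanged
  data_non_list <- data %>%
    select(!where(is.list))
  if(length(list_cols) != 0){
    # widen each list column on its own, prefixing the new names with the
    # parent column name, then bind the results next to the atomic columns
    map_dfc(list_cols, ~
      data %>%
        select(all_of(.x)) %>%
        unnest_wider(c(!!.x), names_sep = "_", names_repair = "unique")) %>%
      bind_cols(data_non_list, .)
  } else {
    # nothing left to unnest: tidy the column names
    data %>%
      janitor::clean_names()
  }
}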
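Because unnest_all() widens only one level per call, the number of calls in the usage that follows has to match the nesting depth of the response (three for this JSON). If the depth varies between API queries, a loop that keeps widening until no list columns remain avoids counting levels by hand. This is a sketch, not part of the original answer: the helper name unnest_until_flat() and its max_passes guard are hypothetical, and it assumes tidyr >= 1.2.0, where unnest_wider() accepts a tidyselect selection such as where(is.list) covering several columns at once.

```r
library(tidyverse)
library(jsonlite)

# Hypothetical helper (not from the original answer): widen all list columns,
# one nesting level per pass, until none are left.
# Assumes tidyr >= 1.2.0 so unnest_wider() can take several columns at once.
unnest_until_flat <- function(data, max_passes = 20) {
  for (pass in seq_len(max_passes)) {
    # stop as soon as every column is atomic
    if (!any(map_lgl(data, is.list))) {
      return(data)
    }
    # names_sep keeps the parent column name as a prefix, as in unnest_all()
    data <- unnest_wider(data, where(is.list), names_sep = "_")
  }
  stop("data is still nested after ", max_passes, " passes")
}

# Example: a cut-down version of the JSON in the question
json <- '{"result": {"id": "id_1",
  "var1": {"var1Id": "a", "var1Title": "aTitle"},
  "var4": {"var4Lvl2": [{"var4Id": "d", "var4Title": "dTitle"},
                        {"var4Id": "d2", "var4Title": "d2Title"}]}}}'
record <- fromJSON(json)$result

# two identical rows, mimicking the bind_rows() step in the question
flat <- tibble(result = list(record, record)) %>%
  unnest_wider(result) %>%
  unnest_until_flat()
```

Unlike unnest_all(), this keeps each widened column in its original position and does not call janitor::clean_names(); the generated names follow the same parent_child pattern, for example var1_var1Id.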
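## use on json data: one unnest_all() call per nesting level
## (var1-var4, then var4_var4Lvl2, then the var4Id/var4Title vectors)
json %>%
  fromJSON() %>%
  tibble() %>%
  bind_rows(fromJSON(json) %>% tibble()) %>%
  unnest_wider(".") %>%
  unnest_all() %>%
  unnest_all() %>%
  unnest_all()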
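A different technique, not the one used in this answer, is to skip the per-level tibble steps and let base R's unlist() flatten each parsed record: it recurses through nested lists (a data frame is also a list of columns), joins the names with ".", and numbers the elements of longer vectors (var4Id1, var4Id2). Every value is coerced to a single type, so this sketch assumes the fields you need are scalars that can live as text. The helper name flatten_record() is hypothetical.

```r
library(tidyverse)
library(jsonlite)

# Cut-down version of the JSON in the question
json <- '{"result": {"id": "id_1",
  "var1": {"var1Id": "a", "var1Title": "aTitle"},
  "var4": {"var4Lvl2": [{"var4Id": "d", "var4Title": "dTitle"},
                        {"var4Id": "d2", "var4Title": "d2Title"}]}}}'

# Hypothetical helper: turn one parsed record into a one-row tibble.
# unlist() recurses into nested lists and data frames, building names such as
# "var1.var1Id" and "var4.var4Lvl2.var4Id1"; all values are coerced to one type.
flatten_record <- function(record) {
  as_tibble_row(unlist(record))
}

# Two identical records stand in for several API responses
records <- list(fromJSON(json)$result, fromJSON(json)$result)

# bind_rows() aligns columns by name and fills missing fields with NA
flat <- bind_rows(map(records, flatten_record))
```

The column names differ from unnest_all() (dots as separators, no underscore before the index), and a field missing from one record simply becomes NA for that row after bind_rows().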