R: convert a data frame to a nested JSON file/object grouped by column names
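I want to convert a data frame to a nested JSON object, using the column names to decide where the nested objects are created.
I made a toy example to explain the problem. Given this data frame: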
df <- read.csv(textConnection(
"id,name,allergies.pollen,allergies.pet,attributes.height,attributes.gender
x,alice,no,yes,175,female
y,bob,yes,yes,180,male"))
Or, in a more readable format:
  id  name allergies.pollen allergies.pet attributes.height attributes.gender
1  x alice               no           yes               175            female
2  y   bob              yes           yes               180              male
I would like the following JSON object:
'[
  {
    "id": "x",
    "name": "alice",
    "allergies":
    {
      "pollen": "no",
      "pet": "yes"
    },
    "attributes":
    {
      "height": "175",
      "gender": "female"
    }
  },
  {
    "id": "y",
    "name": "bob",
    "allergies":
    {
      "pollen": "yes",
      "pet": "yes"
    },
    "attributes":
    {
      "height": "180",
      "gender": "male"
    }
  }
]'
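So it should automatically group the columns on the fixed separator ".".
Ideally it should also handle deeper nesting, such as allergies.pet.cat and allergies.pet.dog.
My best idea for solving this is a function that recursively calls jsonlite::toJSON and extracts the category with stringr::str_extract("^[^.]*"), but I could not get it to work.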
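Here is a function that appears to work. The only open question is what should happen on a collision, such as having both allergies.pet and allergies.pet.cat; the function does not raise an error in that case, but the result may be non-standard.
New data: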
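If collisions of this kind need to be caught before nesting, the column names can be checked up front. The following is a minimal sketch in base R; find_collisions is a hypothetical helper name introduced here for illustration and is not part of jsonlite or of the function in this answer. It reports every column name that is also a prefix (followed by the separator) of another column name.

```r
# Hypothetical helper: return column names that are both a leaf and a
# parent, e.g. "allergies.pet" when "allergies.pet.cat" also exists.
find_collisions <- function(nms, sep = ".") {
  is_prefix <- vapply(nms, function(nm) {
    # TRUE if some other name starts with "<nm><sep>"
    any(startsWith(nms, paste0(nm, sep)))
  }, logical(1))
  unname(nms[is_prefix])
}

cols <- c("id", "name", "allergies.pollen", "allergies.pet",
          "attributes.height", "attributes.gender", "allergies.pet.cat")
find_collisions(cols)
# [1] "allergies.pet"
```

An empty result means no name acts as both a value and a container, so the nesting is unambiguous.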
df <- read.csv(textConnection(
"id,name,allergies.pollen,allergies.pet,attributes.height,attributes.gender,allergies.pet.cat
x,alice,no,yes,175,female,quux
y,bob,yes,yes,180,male,unk"))
The function:
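func <- function(x) {
  # Group column names by the part before the first "."
  grps <- split(names(x), gsub("[.].*", "", names(x)))
  for (nm in names(grps)) {
    # Nest when a prefix has several columns, or a single dotted column
    if (length(grps[[nm]]) > 1 || !nm %in% grps[[nm]]) {
      # Store the group as a data-frame column, dropping the "<prefix>." part
      x[[nm]] <- setNames(subset(x, select = grps[[nm]]),
                          gsub("^[^.]+[.]", "", grps[[nm]]))
      # Remove the original flat columns (but not the new nested one)
      x[, setdiff(grps[[nm]], nm)] <- NULL
    }
  }
  # Recurse into the nested frames to handle deeper levels
  for (nm in names(x)) {
    if (is.data.frame(x[[nm]])) {
      x[[nm]] <- func(x[[nm]])
    }
  }
  if (any(grepl("[.]", names(x)))) func(x) else x
}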
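The grouping step at the top of func can be inspected on its own. The sketch below uses only base R and the column names from the new data; it shows how gsub and split bucket the names by their first dot-separated component, and what the inner names look like once the prefix is stripped.

```r
cols <- c("id", "name", "allergies.pollen", "allergies.pet",
          "attributes.height", "attributes.gender", "allergies.pet.cat")

# Drop everything from the first "." onward to get each column's top-level key.
prefix <- gsub("[.].*", "", cols)

# Bucket the full column names by that key.
grps <- split(cols, prefix)
grps$allergies
# [1] "allergies.pollen"  "allergies.pet"     "allergies.pet.cat"

# Strip the leading "<key>." to get the names used inside the nested frame.
inner <- gsub("^[^.]+[.]", "", grps$allergies)
inner
# [1] "pollen"  "pet"     "pet.cat"
```

The remaining "pet.cat" still contains a separator, which is why func recurses into each nested data frame.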
See how this nests all of the "."-separated columns into data frames:
str(df)
# 'data.frame': 2 obs. of  7 variables:
#  $ id               : chr  "x" "y"
#  $ name             : chr  "alice" "bob"
#  $ allergies.pollen : chr  "no" "yes"
#  $ allergies.pet    : chr  "yes" "yes"
#  $ attributes.height: int  175 180
#  $ attributes.gender: chr  "female" "male"
#  $ allergies.pet.cat: chr  "quux" "unk"
newdf <- func(df)
str(newdf)
# 'data.frame': 2 obs. of  4 variables:
#  $ id        : chr  "x" "y"
#  $ name      : chr  "alice" "bob"
#  $ allergies :'data.frame': 2 obs. of  2 variables:
#   ..$ pollen: chr  "no" "yes"
#   ..$ pet   :'data.frame': 2 obs. of  2 variables:
#   .. ..$ pet: chr  "yes" "yes"
#   .. ..$ cat: chr  "quux" "unk"
#  $ attributes:'data.frame': 2 obs. of  2 variables:
#   ..$ height: int  175 180
#   ..$ gender: chr  "female" "male"
From here, converting to JSON is straightforward:
jsonlite::toJSON(newdf, pretty = TRUE)
# [
#   {
#     "id": "x",
#     "name": "alice",
#     "allergies": {
#       "pollen": "no",
#       "pet": {
#         "pet": "yes",
#         "cat": "quux"
#       }
#     },
#     "attributes": {
#       "height": 175,
#       "gender": "female"
#     }
#   },
#   {
#     "id": "y",
#     "name": "bob",
#     "allergies": {
#       "pollen": "yes",
#       "pet": {
#         "pet": "yes",
#         "cat": "unk"
#       }
#     },
#     "attributes": {
#       "height": 180,
#       "gender": "male"
#     }
#   }
# ]
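Note that the output above keeps height as a number, whereas the target JSON in the question shows it as the string "175". If strings are what is wanted, one option is to convert every column to character before nesting. The sketch below assumes that all values should be serialized as text; the file name "out.json" is only an illustrative path, and the last line is left commented out because it relies on func from this answer and on the jsonlite package being installed.

```r
df <- read.csv(textConnection(
"id,name,allergies.pollen,allergies.pet,attributes.height,attributes.gender
x,alice,no,yes,175,female
y,bob,yes,yes,180,male"))

# Assumption: every value should appear as a JSON string.
# Assigning into df_chr[] keeps the data-frame structure and column names.
df_chr <- df
df_chr[] <- lapply(df_chr, as.character)
class(df_chr$attributes.height)
# [1] "character"

# After nesting, the result could be written to a file instead of printed:
# jsonlite::write_json(func(df_chr), "out.json", pretty = TRUE)
```

jsonlite::write_json passes extra arguments such as pretty = TRUE on to toJSON, so the file contents should match what toJSON would print for the same input.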