R: convert a data frame to a nested JSON file/object grouped by column names

Question

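I want to convert a data frame to a nested JSON object, with the column names determining where the nested objects are created.

I made a toy example to illustrate the problem. Given this data frame: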

df <- read.csv(textConnection(
"id,name,allergies.pollen,allergies.pet,attributes.height,attributes.gender
x,alice,no,yes,175,female
y,bob,yes,yes,180,male"))

Or, in a more readable format:

    id  name allergies.pollen allergies.pet attributes.height attributes.gender
  1  x alice               no           yes               175            female
  2  y   bob              yes           yes               180              male

I would like to get the following JSON object:

'[
  {
    "id": "x",
    "name": "alice",
    "allergies":
    {
      "pollen": "no",
      "pet": "yes"
    },
    "attributes": 
    {
      "height": "175",
      "gender": "female"
    }
  },
  {
    "id": "y",
    "name": "bob",
    "allergies":
    {
      "pollen": "yes",
      "pet": "yes"
    },
    "attributes":
    {
      "height": "180",
      "gender": "male"
    }
  }
]'

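So it should automatically group the columns on a fixed delimiter, ".".

Ideally it should also handle deeper nesting, for example allergies.pet.cat and allergies.pet.dog.

My best idea so far is to write a function that recursively calls jsonlite::toJSON and uses stringr::str_extract("^[^.]*") to extract the category, but I could not get it to work.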

Answer 1

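Here is a function that seems to work. The only open question is how to treat possible collisions, such as allergies.pet alongside allergies.pet.cat; it does not raise an error, but the result (a pet object that contains its own pet key) may be non-standard.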
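If such collisions matter, they can be detected before nesting. The following is a minimal base-R sketch, not part of the original answer; find_collisions is a hypothetical helper name, and it assumes "." is the only delimiter.

```r
# Hypothetical helper: return the column names that are also a prefix of
# another column name (followed by "."), i.e. names that would be both a
# plain value and a parent object after nesting.
find_collisions <- function(cols) {
  is_parent <- vapply(cols, function(nm) {
    any(startsWith(cols, paste0(nm, ".")))
  }, logical(1))
  unname(cols[is_parent])
}

# Column names from the example data in this answer.
cols <- c("id", "name", "allergies.pollen", "allergies.pet",
          "attributes.height", "attributes.gender", "allergies.pet.cat")

find_collisions(cols)
# [1] "allergies.pet"
```

One option would be to rename any column reported here (for example to allergies.pet.value) before nesting, so that each key is either a value or an object but never both.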

New data:

df <- read.csv(textConnection(
"id,name,allergies.pollen,allergies.pet,attributes.height,attributes.gender,allergies.pet.cat
x,alice,no,yes,175,female,quux
y,bob,yes,yes,180,male,unk"))

The function:

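func <- function(x) {
  # Group the column names by the text before the first "."
  grps <- split(names(x), gsub("[.].*", "", names(x)))
  for (nm in names(grps)) {
    # Nest when the group has several columns, or a single dotted column
    if (length(grps[[nm]]) > 1 || !nm %in% grps[[nm]]) {
      # Store the group as a data-frame column, stripping the leading prefix
      x[[nm]] <- setNames(subset(x, select = grps[[nm]]),
                          gsub("^[^.]+[.]", "", grps[[nm]]))
      # Drop the original flat columns (but not the new nested one)
      x[,setdiff(grps[[nm]], nm)] <- NULL
    }
  }
  # Recurse into each nested data frame to handle deeper levels
  for (nm in names(x)) {
    if (is.data.frame(x[[nm]])) {
      x[[nm]] <- func(x[[nm]])
    }
  }
  # Repeat if any dotted names remain at this level
  if (any(grepl("[.]", names(x)))) func(x) else x
}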
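To see what the first step of func does, the grouping can be run on its own. This is only an illustrative sketch using the column names from the example data; the variable names cols and prefix are introduced here for the illustration.

```r
# Column names from the example data in this answer.
cols <- c("id", "name", "allergies.pollen", "allergies.pet",
          "attributes.height", "attributes.gender", "allergies.pet.cat")

# Drop everything from the first "." onward, leaving the top-level key.
prefix <- gsub("[.].*", "", cols)

# split() collects the full column names under each top-level key.
grps <- split(cols, prefix)
str(grps)
```

Here grps$allergies holds the three allergies.* columns, while grps$id and grps$name each hold only their own name, which is why func leaves those two columns untouched.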

See how this nests all of the "."-delimited columns into data frames:

str(df)
# 'data.frame': 2 obs. of  7 variables:
#  $ id               : chr  "x" "y"
#  $ name             : chr  "alice" "bob"
#  $ allergies.pollen : chr  "no" "yes"
#  $ allergies.pet    : chr  "yes" "yes"
#  $ attributes.height: int  175 180
#  $ attributes.gender: chr  "female" "male"
#  $ allergies.pet.cat: chr  "quux" "unk"
newdf <- func(df)
str(newdf)
# 'data.frame': 2 obs. of  4 variables:
#  $ id        : chr  "x" "y"
#  $ name      : chr  "alice" "bob"
#  $ allergies :'data.frame':   2 obs. of  2 variables:
#   ..$ pollen: chr  "no" "yes"
#   ..$ pet   :'data.frame':    2 obs. of  2 variables:
#   .. ..$ pet: chr  "yes" "yes"
#   .. ..$ cat: chr  "quux" "unk"
#  $ attributes:'data.frame':   2 obs. of  2 variables:
#   ..$ height: int  175 180
#   ..$ gender: chr  "female" "male"

From here, converting to JSON is straightforward:

jsonlite::toJSON(newdf, pretty = TRUE)
# [
#   {
#     "id": "x",
#     "name": "alice",
#     "allergies": {
#       "pollen": "no",
#       "pet": {
#         "pet": "yes",
#         "cat": "quux"
#       }
#     },
#     "attributes": {
#       "height": 175,
#       "gender": "female"
#     }
#   },
#   {
#     "id": "y",
#     "name": "bob",
#     "allergies": {
#       "pollen": "yes",
#       "pet": {
#         "pet": "yes",
#         "cat": "unk"
#       }
#     },
#     "attributes": {
#       "height": 180,
#       "gender": "male"
#     }
#   }
# ]
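Note that the question's target output quotes the heights ("175"), whereas the output above emits them as numbers because read.csv parsed that column as integer. If string values are required, or the result should go to a file, something along these lines may work. It is a sketch under assumptions: as_strings is a hypothetical helper, the small nested frame stands in for the result of func, and the file path is a temporary placeholder.

```r
# Hypothetical helper: convert every leaf column to character, recursing
# into nested data-frame columns so the structure is preserved.
as_strings <- function(x) {
  for (nm in names(x)) {
    x[[nm]] <- if (is.data.frame(x[[nm]])) {
      as_strings(x[[nm]])
    } else {
      as.character(x[[nm]])
    }
  }
  x
}

# Small nested frame standing in for the result of func() above.
nested <- data.frame(id = c("x", "y"), name = c("alice", "bob"))
nested$attributes <- data.frame(height = c(175L, 180L),
                                gender = c("female", "male"))

nested_chr <- as_strings(nested)

if (requireNamespace("jsonlite", quietly = TRUE)) {
  # Heights are now character, so they are emitted as quoted strings.
  print(jsonlite::toJSON(nested_chr, pretty = TRUE))

  # write_json() forwards extra arguments such as `pretty` to toJSON().
  out_file <- tempfile(fileext = ".json")
  jsonlite::write_json(nested_chr, out_file, pretty = TRUE)
}
```

Converting before serialising keeps func itself unchanged, so the same nested frame can be written with either numeric or string values.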
