Convert nested MongoDB documents into a pandas DataFrame

I have a MongoDB collection with documents like this:

doc = {
  "_id": {
    "$oid": "516622c9ce21150200000d87"
  },
  "SubmissionDate": {
    "$date": "2013-04-11T02:41:13.162Z"
  },
  "isComplete": True,

  "Rounds": [
    {
      "Photo": [
        
      ],
      "A": {
        "Complexity": 55,
        "Colour": 85,
        "Deep": 51,
        "Effervescence": 44
      },
      "B": {
        "QualityPIDs": [
          
        ],
        "QualityScales": [
          
        ],
        "Complexity": 43,
        "Qualities": [
          
        ]
      },
      "C": {
        "QualityPIDs": [
          
        ],
        "QualityScales": [
          
        ],
        "Complexity": 60,
        "UHS": 46,
        "Colour": 33,
        "Qualities": [
          
        ]
      },
      "D": {
        "Complexity": 73,
        "Duration": 68,
        "Quality": 65
      }
    }
  ],
  "Item": {
    "_id": {
      "$oid": "51e6d678c06918db21156f92"
    },
    "Country": "Australia",
    "Name": "King",
    "PeopleId": {
      "$oid": "51dddb69a9d9350200000"
    },
    "Style": "Apple",
    "Type": "Flat",
    "UserSubmitted": False
  }
}

I need to convert this collection into a pandas DataFrame.

The solution suggested in "How to import data from mongodb to pandas?" does most of the work, but I am still left with the Rounds column, which contains dictionaries.

To access the sub-dictionaries of Rounds, I wrote a set of loops:
import pandas as pd

df = pd.json_normalize(doc)

frames = []
for i in range(len(df.Rounds)):
    # Flatten the 'A' sub-dictionary of the first round in row i
    frames.append(pd.json_normalize(df.Rounds[i][0]['A']))
A_data = pd.concat(frames, ignore_index=True)

Finally, I join A_data to my main dataframe.

Is there a faster way? Right now the loops take a long time. Thanks!

  • Each level of the dict can be specified with the meta parameter, and record_path is set to 'Rounds'.
import pandas as pd

meta = [['_id', '$oid'],
        ['Item', 'Country'],
        ['Item', 'Name'],
        ['Item', 'Style'],
        ['Item', 'Type'],
        ['Item', 'UserSubmitted'],
        ['Item', '_id', '$oid'],
        ['Item', 'PeopleId', '$oid'],
        ['SubmissionDate', '$date'],
        'isComplete']

df = pd.json_normalize(doc, record_path='Rounds', meta=meta)

# display(df)
  Photo  A.Complexity  A.Colour  A.Deep  A.Effervescence B.QualityPIDs B.QualityScales  B.Complexity B.Qualities C.QualityPIDs C.QualityScales  C.Complexity  C.UHS  C.Colour C.Qualities  D.Complexity  D.Duration  D.Quality                  _id.$oid Item.Country Item.Name Item.Style Item.Type Item.UserSubmitted             Item._id.$oid     Item.PeopleId.$oid      SubmissionDate.$date isComplete
0    []            55        85      51               44            []              []            43          []            []              []            60     46        33          []            73          68         65  516622c9ce21150200000d87    Australia      King      Apple      Flat              False  51e6d678c06918db21156f92  51dddb69a9d9350200000  2013-04-11T02:41:13.162Z       True
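
Since the question is about a whole collection rather than one document, here is a rough sketch of the same call applied to a list of documents. It assumes the collection has already been loaded into a Python list of dicts in the shape shown above. The list `docs`, its second entry, and the name `df_all` are hypothetical and only there for illustration. The `meta` list is shortened to keep the example small.

```python
import pandas as pd

# Two hypothetical documents in the same shape as `doc`, trimmed for brevity.
docs = [
    {
        "_id": {"$oid": "516622c9ce21150200000d87"},
        "SubmissionDate": {"$date": "2013-04-11T02:41:13.162Z"},
        "isComplete": True,
        "Rounds": [{"A": {"Complexity": 55, "Colour": 85},
                    "D": {"Complexity": 73, "Duration": 68}}],
        "Item": {"Country": "Australia", "Name": "King"},
    },
    {
        "_id": {"$oid": "hypothetical-id-2"},  # made-up value
        "SubmissionDate": {"$date": "2013-04-12T10:00:00.000Z"},
        "isComplete": False,
        "Rounds": [{"A": {"Complexity": 40, "Colour": 70},
                    "D": {"Complexity": 61, "Duration": 52}},
                   {"A": {"Complexity": 42, "Colour": 75},
                    "D": {"Complexity": 65, "Duration": 50}}],
        "Item": {"Country": "France", "Name": "Queen"},
    },
]

meta = [["_id", "$oid"],
        ["Item", "Country"],
        ["Item", "Name"],
        ["SubmissionDate", "$date"],
        "isComplete"]

# One row per element of each document's 'Rounds' list, with the meta
# fields repeated on every row; no explicit Python loop is needed.
df_all = pd.json_normalize(docs, record_path="Rounds", meta=meta)

# Parse the ISO 8601 strings into timestamps.
df_all["SubmissionDate.$date"] = pd.to_datetime(df_all["SubmissionDate.$date"])
```

A document with two rounds yields two rows, so `_id.$oid` can be used to group the rounds back by document if needed.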