How do I convert a dictionary that has nested dictionaries within it into a dataframe in Python?

Question

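I recently did some sentiment analysis in Python using Oracle's AI Language API. I had the API iterate over 1,300 tweets and stored its output in a list, with each element corresponding to one tweet ID. I then built a dictionary where the keys are the tweet IDs and the values are the API's output for that tweet ID. I now have a huge dictionary with dictionaries nested inside dictionaries, and I'm not sure how to convert it into a Pandas dataframe.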

Here are the first few entries of the dictionary I'm working with.

 {1292750633104289792: {
   "aspects": []
 },
 1275918779831238656: {
   "aspects": []
 },
 1293251961031204865: {
   "aspects": [
     {
       "length": 8,
       "offset": 51,
       "scores": {
         "Negative": 0.18023298680782318,
         "Neutral": 0.0,
         "Positive": 0.8197670578956604
       },
       "sentiment": "Positive",
       "text": "building"
     }
   ]
 },
 1293312774563606531: {
   "aspects": []
 },
 1293375754751881217: {
   "aspects": [
     {
       "length": 4,
       "offset": 5,
       "scores": {
         "Negative": 0.9987309575080872,
         "Neutral": 0.0012690634466707706,
         "Positive": 0.0
       },
       "sentiment": "Negative",
       "text": "poll"
     }
   ]
 }}

Thanks in advance.

Answer 1

You can use a nested comprehension to flatten your structure into one record per aspect, then pass the result to pd.DataFrame:

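import pandas as pd

# `data` is the dictionary from the question: {tweet_id: {"aspects": [...]}}
# One row per aspect; tweets with an empty "aspects" list produce no rows.
r = [{'tweet_id': a,
      'length': i['length'],
      'offset': i['offset'],
      # Expand the nested scores dict into score_Negative, score_Neutral, score_Positive
      **{f'score_{j}': k for j, k in i['scores'].items()},
      'sentiment': i['sentiment'],
      'text': i['text'],
     }
     for a, b in data.items()
     # Accept either a plain dict or an object exposing an `aspects` attribute
     for i in (b['aspects'] if isinstance(b, dict) else b.aspects)]

df = pd.DataFrame(r)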

Output:

              tweet_id  length  offset  score_Negative  score_Neutral  score_Positive sentiment      text
0  1293251961031204865       8      51        0.180233       0.000000        0.819767  Positive  building
1  1293375754751881217       4       5        0.998731       0.001269        0.000000  Negative      poll
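
As an alternative to the hand-written comprehension, pd.json_normalize can probably do the same flattening for you. The sketch below assumes every value in the dictionary is a plain dict with an "aspects" list (not an API response object), and it uses only two entries from the question plus one empty entry as sample data. The names `records` and `df_alt` are just illustrative.

```python
import pandas as pd

# Sample data: three entries copied from the question (one with no aspects).
data = {
    1292750633104289792: {"aspects": []},
    1293251961031204865: {"aspects": [{
        "length": 8, "offset": 51,
        "scores": {"Negative": 0.18023298680782318,
                   "Neutral": 0.0,
                   "Positive": 0.8197670578956604},
        "sentiment": "Positive", "text": "building"}]},
    1293375754751881217: {"aspects": [{
        "length": 4, "offset": 5,
        "scores": {"Negative": 0.9987309575080872,
                   "Neutral": 0.0012690634466707706,
                   "Positive": 0.0},
        "sentiment": "Negative", "text": "poll"}]},
}

# Move each tweet ID from the dict key into the record itself,
# so json_normalize can carry it along as metadata.
records = [{"tweet_id": tweet_id, **value} for tweet_id, value in data.items()]

# record_path: the list to explode into rows (one row per aspect).
# meta: fields from the parent record to repeat on every row.
# sep: separator used when flattening the nested "scores" dict.
df_alt = pd.json_normalize(records, record_path="aspects",
                           meta=["tweet_id"], sep="_")
print(df_alt)
```

As with the comprehension, tweets whose "aspects" list is empty contribute no rows. The flattened score columns come out as scores_Negative, scores_Neutral and scores_Positive, and the column order may differ from the output shown above.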
