Issue normalising JSON and using Pandas

I'm having trouble normalising my JSON response.

Sample JSON:

{
   "data":{
      "flavors":[
                 {
                  "name":"Basic_A4",
                  "cpu_number":8,
                  "ram":14336,
                  "last_price":"0.200",
                  "currency":"USD"
                  },
                  {...}
            ],
      "aws":[
                  {
                  "name":"md5.xlarge",
                  "cpu_number":2,
                  "ram":14336,
                  "last_price":"0.100",
                  "currency":"USD"
                  },
                  {...}
            ]
   }
}

My code:

df_values = pd.json_normalize(content['data']['flavors']['aws'], meta=['name', 'cpu_number', 'ram', 'last_price', 'currency'])
for index, row in df_values.iterrows():
    df1 = df1.append(row)

Error:

TypeError: list indices must be integers or slices, not str

The following code works, but it does not pick up the data from the aws array:


    df_values = pd.json_normalize(content['data']['flavors'], meta=['name', 'cpu_number', 'ram', 'last_price', 'currency'])
    for index, row in df_values.iterrows():
        df1 = df1.append(row)

Desired output:

name cpu_number ram last_price currency
Basic_A4 8 14336 0.200 USD
example2 4 14336 0.100 USD
md5.xlarge 2 14336 0.100 USD
example4 2 7324 0.055 USD

Update: I have now used the suggested approach, but I'm having a problem with my new JSON:

{
   "data":{
      "flavors":[
                 {
                  "name":"Basic_A4",
                  "cpu_number":8,
                  "ram":14336,
                  "last_price":"0.200",
                  "currency":"USD",
                  "provider": {
                    "name": "Azure"
                  }
                 }
            ],
      "aws":[
                  {
                  "name":"md5.xlarge",
                  "cpu_number":2,
                  "ram":14336,
                  "last_price":"0.100",
                  "currency":"USD",
                  "provider": {
                    "name": "AWS"
                  }
                 }
            ]
   }
}

The table now looks like this:

| name| cpu_number| ram| last_price| currency| provider|
|---- |------| -----|-----|-----|-----|
| Basic_A4| 8| 14336|0.200|USD| {'name': 'Azure'}|
| example2| 4| 14336|0.100|USD| {'name': 'Azure'}|
| md5.xlarge| 2| 14336|0.100|USD| {'name': 'AWS'}|
| example4| 2| 7324|0.055|USD| {'name': 'AWS'}|

I would like it to look like this:

name cpu_number ram last_price currency provider
Basic_A4 8 14336 0.200 USD Azure

Thanks

Your JSON file, named data.json (in my example):

{
    "data":{
       "flavors":[
                  {
                   "name":"Basic_A4",
                   "cpu_number":8,
                   "ram":14336,
                   "last_price":"0.200",
                   "currency":"USD",
                   "provider": {
                     "name": "Azure"
                   }
                  }
             ],
       "aws":[
                  {
                   "name":"md5.xlarge",
                   "cpu_number":2,
                   "ram":14336,
                   "last_price":"0.100",
                   "currency":"USD",
                   "provider": {
                     "name": "AWS"
                   }
                  }
             ]
    }
 }

Read the .json file and parse it into a DataFrame using chain from itertools.

import json
from itertools import chain

import pandas as pd

# read your json file
file = 'data.json'
with open(file) as train_file:
    dict_train = json.load(train_file)

# parse it into a dataframe
dftemp = list(chain.from_iterable(dict_train["data"].values()))

tempArr = []
for i in dftemp:
    tempArr.append(i["provider"]["name"])

dftemp = pd.DataFrame(dftemp)
dftemp["provider"] = tempArr

dftemp

         name  cpu_number    ram last_price currency provider
0    Basic_A4           8  14336      0.200      USD    Azure
1  md5.xlarge           2  14336      0.100      USD      AWS
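
As a possible alternative to the manual loop over provider names, pd.json_normalize can flatten the nested provider object by itself. This is a minimal sketch, assuming pandas 1.0 or later and the updated JSON from the question with the missing commas added. The sample dict is inlined here instead of being read from data.json, and the rename step is my own choice to match the desired column name:

```python
from itertools import chain

import pandas as pd

# Sample data mirroring the updated JSON in the question (inlined for illustration).
dict_train = {
    "data": {
        "flavors": [
            {"name": "Basic_A4", "cpu_number": 8, "ram": 14336,
             "last_price": "0.200", "currency": "USD",
             "provider": {"name": "Azure"}},
        ],
        "aws": [
            {"name": "md5.xlarge", "cpu_number": 2, "ram": 14336,
             "last_price": "0.100", "currency": "USD",
             "provider": {"name": "AWS"}},
        ],
    }
}

# Merge the lists under every key of "data" into one flat list of records.
records = list(chain.from_iterable(dict_train["data"].values()))

# json_normalize expands nested dicts into dotted column names ("provider.name").
df = pd.json_normalize(records)

# Rename the flattened column so it matches the desired output.
df = df.rename(columns={"provider.name": "provider"})

print(df)
```

If provider later gains more fields, each one should show up as its own provider.<field> column, so the rename would need adjusting.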

In addition to the answer above, you can also do this:

import pandas as pd
from pandas import json_normalize  # pandas.io.json.json_normalize is deprecated since pandas 1.0
data = {
   "data":{
      "flavors":[
                 {
                  "name":"Basic_A4",
                  "cpu_number":8,
                  "ram":14336,
                  "last_price":"0.200",
                  "currency":"USD"
                  },
                  {...}
            ],
      "aws":[
                  {
                  "name":"md5.xlarge",
                  "cpu_number":2,
                  "ram":14336,
                  "last_price":"0.100",
                  "currency":"USD"
                  },
                  {...}
            ]
   }
}
result = json_normalize(data['data']["flavors"])
result2 = json_normalize(data['data']["aws"])
df_new = pd.concat([result,result2], axis=0)
df_new.reset_index(drop=True, inplace=True)
print(df_new)
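
For context, the TypeError in the question arises because content['data']['flavors'] is a list, so it cannot be indexed with the string 'aws'; flavors and aws are sibling keys under data. If the keys under data are not known in advance, the concat approach can be looped over all of them. A rough sketch, assuming pandas 1.0 or later; the sample data and the source column are illustrative additions and not part of the original question:

```python
import pandas as pd

# Sample data with one record per key (inlined for illustration).
data = {
    "data": {
        "flavors": [
            {"name": "Basic_A4", "cpu_number": 8, "ram": 14336,
             "last_price": "0.200", "currency": "USD"},
        ],
        "aws": [
            {"name": "md5.xlarge", "cpu_number": 2, "ram": 14336,
             "last_price": "0.100", "currency": "USD"},
        ],
    }
}

frames = []
for key, records in data["data"].items():
    # Normalise each list separately, as in the two-step version above.
    frame = pd.json_normalize(records)
    # Hypothetical extra column recording which key each row came from.
    frame["source"] = key
    frames.append(frame)

# ignore_index=True renumbers the rows, replacing the separate reset_index call.
df_new = pd.concat(frames, ignore_index=True)
print(df_new)
```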