Issue normalising json and using Pandas

Question

I'm running into a problem when trying to normalise my json response.

Json sample:

{
   "data":{
      "flavors":[
                 {
                  "name":"Basic_A4",
                  "cpu_number":8,
                  "ram":14336,
                  "last_price":"0.200",
                  "currency":"USD"
                  },
                  {...}
            ],
      "aws":[
                  {
                  "name":"md5.xlarge",
                  "cpu_number":2,
                  "ram":14336,
                  "last_price":"0.100",
                  "currency":"USD"
                  },
                  {...}
            ]
   }
}

My code:

df_values = pd.json_normalize(content['data']['flavors']['aws'], meta=['name', 'cpu_number', 'ram', 'last_price', 'currency'])
for index, row in df_values.iterrows():
    df1 = df1.append(row)

Error:

TypeError: list indices must be integers or slices, not str

When I use the following code it works, but I don't get the data from the AWS array:

    df_values = pd.json_normalize(content['data']['flavors'], meta=['name', 'cpu_number', 'ram', 'last_price', 'currency'])
    for index, row in df_values.iterrows():
        df1 = df1.append(row)

My desired output:

name	cpu_number	ram	last_price	currency
Basic_A4	8	14336	0.200	USD
example2	4	14336	0.100	USD
md5.xlarge	2	14336	0.100	USD
example4	2	7324	0.055	USD

Update: I have now used the suggested approach, but I have a problem with my new json:

{
   "data":{
      "flavors":[
                 {
                  "name":"Basic_A4",
                  "cpu_number":8,
                  "ram":14336,
                  "last_price":"0.200",
                  "currency":"USD",
                  "provider": {
                    "name": "Azure"
                  }
                 }
            ],
      "aws":[
                  {
                  "name":"md5.xlarge",
                  "cpu_number":2,
                  "ram":14336,
                  "last_price":"0.100",
                  "currency":"USD",
                  "provider": {
                    "name": "AWS"
                  }
                 }
            ]
   }
}

The table now looks like this:

| name| cpu_number| ram| last_price| currency| provider|
|---- |------| -----|-----|-----|-----|
| Basic_A4| 8| 14336|0.200|USD| {'name': 'Azure'}|
| example2| 4| 14336|0.100|USD| {'name': 'Azure'}|
| md5.xlarge| 2| 14336|0.100|USD| {'name': 'AWS'}|
| example4| 2| 7324|0.055|USD| {'name': 'AWS'}|

I would like it to look like this:

name	cpu_number	ram	last_price	currency	provider
Basic_A4	8	14336	0.200	USD	Azure

Thanks

Answer 1

Your json file, named data.json (in my example):

{
    "data":{
       "flavors":[
                  {
                   "name":"Basic_A4",
                   "cpu_number":8,
                   "ram":14336,
                   "last_price":"0.200",
                   "currency":"USD",
                   "provider": {
                     "name": "Azure"
                   }
                  }
             ],
       "aws":[
                  {
                   "name":"md5.xlarge",
                   "cpu_number":2,
                   "ram":14336,
                   "last_price":"0.100",
                   "currency":"USD",
                   "provider": {
                     "name": "AWS"
                   }
                  }
             ]
    }
 }

Read the .json file and parse it into a DataFrame, using chain from itertools to merge the lists under "data".

import json
from itertools import chain

import pandas as pd

# read your json file
file = 'data.json'
with open(file) as train_file:
    dict_train = json.load(train_file)

# merge every list under "data" ("flavors", "aws") into one list of records
records = list(chain.from_iterable(dict_train["data"].values()))

# pull the provider name out of each nested "provider" dict
providers = [record["provider"]["name"] for record in records]

# build the dataframe, then overwrite the dict column with the plain names
dftemp = pd.DataFrame(records)
dftemp["provider"] = providers

dftemp

         name  cpu_number    ram last_price currency provider
0    Basic_A4           8  14336      0.200      USD    Azure
1  md5.xlarge           2  14336      0.100      USD      AWS
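
As a variation on the loop above, pd.json_normalize can flatten the nested "provider" dict by itself, which leaves only a column rename to do. This is just a sketch: the inline payload dict stands in for the parsed data.json and assumes every record carries a "provider" entry shaped like the one in the question.

```python
from itertools import chain

import pandas as pd

# Inline stand-in for the parsed data.json (assumed shape, taken from the question).
payload = {
    "data": {
        "flavors": [
            {"name": "Basic_A4", "cpu_number": 8, "ram": 14336,
             "last_price": "0.200", "currency": "USD",
             "provider": {"name": "Azure"}},
        ],
        "aws": [
            {"name": "md5.xlarge", "cpu_number": 2, "ram": 14336,
             "last_price": "0.100", "currency": "USD",
             "provider": {"name": "AWS"}},
        ],
    }
}

# Merge every list under "data" into one flat list of records.
records = list(chain.from_iterable(payload["data"].values()))

# json_normalize expands the nested dict into a "provider.name" column,
# which is then renamed to plain "provider".
df = pd.json_normalize(records).rename(columns={"provider.name": "provider"})
print(df)
```

If some records might lack "provider", the rename still works, and those rows would simply hold NaN in that column.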

Answer 2

In addition to the answer above, you can also do this:

import pandas as pd

data = {
   "data":{
      "flavors":[
                 {
                  "name":"Basic_A4",
                  "cpu_number":8,
                  "ram":14336,
                  "last_price":"0.200",
                  "currency":"USD"
                  },
                  # ... more records
            ],
      "aws":[
                  {
                  "name":"md5.xlarge",
                  "cpu_number":2,
                  "ram":14336,
                  "last_price":"0.100",
                  "currency":"USD"
                  },
                  # ... more records
            ]
   }
}
# normalise each list separately, then stack the two frames
result = pd.json_normalize(data['data']["flavors"])
result2 = pd.json_normalize(data['data']["aws"])
df_new = pd.concat([result, result2], axis=0)
df_new.reset_index(drop=True, inplace=True)
print(df_new)
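
For what it's worth, the TypeError in the question comes from content['data']['flavors'] being a list, so it cannot be indexed with the string 'aws'; "flavors" and "aws" are sibling keys under "data". Below is a rough sketch of the same concat idea that loops over whatever keys "data" has instead of naming them one by one. The payload dict and the source_key column are made up for illustration and are not part of the original response.

```python
import pandas as pd

# Inline stand-in for the API response (assumed shape, taken from the question).
payload = {
    "data": {
        "flavors": [
            {"name": "Basic_A4", "cpu_number": 8, "ram": 14336,
             "last_price": "0.200", "currency": "USD"},
        ],
        "aws": [
            {"name": "md5.xlarge", "cpu_number": 2, "ram": 14336,
             "last_price": "0.100", "currency": "USD"},
        ],
    }
}

# "flavors" holds a list, so payload["data"]["flavors"]["aws"] would raise
# "TypeError: list indices must be integers or slices, not str".
flavors = payload["data"]["flavors"]
print(type(flavors).__name__)

# Normalise each list under "data" and remember which key it came from.
frames = []
for key, records in payload["data"].items():
    frame = pd.json_normalize(records)
    frame["source_key"] = key  # hypothetical helper column, not in the source data
    frames.append(frame)

# One concat replaces the row-by-row df1.append(row) loop from the question;
# DataFrame.append was removed in pandas 2.0.
df_all = pd.concat(frames, ignore_index=True)
print(df_all)
```

Building a list of frames and concatenating once also avoids copying the growing dataframe on every row.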
