Issue normalising json and using Pandas
I am having issues when trying to normalise my JSON response.
JSON example:
{
  "data": {
    "flavors": [
      {
        "name": "Basic_A4",
        "cpu_number": 8,
        "ram": 14336,
        "last_price": "0.200",
        "currency": "USD"
      },
      {...}
    ],
    "aws": [
      {
        "name": "md5.xlarge",
        "cpu_number": 2,
        "ram": 14336,
        "last_price": "0.100",
        "currency": "USD"
      },
      {...}
    ]
  }
}
My code:
df_values = pd.json_normalize(content['data']['flavors']['aws'], meta=['name', 'cpu_number', 'ram', 'last_price', 'currency'])
for index, row in df_values.iterrows():
    df1 = df1.append(row)
Error:
TypeError: list indices must be integers or slices, not str
The following code works, but I do not get the data from the aws array:
df_values = pd.json_normalize(content['data']['flavors'], meta=['name', 'cpu_number', 'ram', 'last_price', 'currency'])
for index, row in df_values.iterrows():
    df1 = df1.append(row)
The output I want:
| name | cpu_number | ram | last_price | currency |
|---|---|---|---|---|
| Basic_A4 | 8 | 14336 | 0.200 | USD |
| example2 | 4 | 14336 | 0.100 | USD |
| md5.xlarge | 2 | 14336 | 0.100 | USD |
| example4 | 2 | 7324 | 0.055 | USD |
Update: I have now used the suggested method, but I have a problem with my new JSON:
{
  "data": {
    "flavors": [
      {
        "name": "Basic_A4",
        "cpu_number": 8,
        "ram": 14336,
        "last_price": "0.200",
        "currency": "USD",
        "provider": {
          "name": "Azure"
        }
      }
    ],
    "aws": [
      {
        "name": "md5.xlarge",
        "cpu_number": 2,
        "ram": 14336,
        "last_price": "0.100",
        "currency": "USD",
        "provider": {
          "name": "AWS"
        }
      }
    ]
  }
}
The table now looks like this:
| name| cpu_number| ram| last_price| currency| provider|
|---- |------| -----|-----|-----|-----|
| Basic_A4| 8| 14336|0.200|USD| {'name': 'Azure'}|
| example2| 4| 14336|0.100|USD| {'name': 'Azure'}|
| md5.xlarge| 2| 14336|0.100|USD| {'name': 'AWS'}|
| example4| 2| 7324|0.055|USD| {'name': 'AWS'}|
I want it to look like this:
| name | cpu_number | ram | last_price | currency | provider |
|---|---|---|---|---|---|
| Basic_A4 | 8 | 14336 | 0.200 | USD | Azure |

Thanks
Your JSON file, named data.json (in my example):
{
  "data": {
    "flavors": [
      {
        "name": "Basic_A4",
        "cpu_number": 8,
        "ram": 14336,
        "last_price": "0.200",
        "currency": "USD",
        "provider": {
          "name": "Azure"
        }
      }
    ],
    "aws": [
      {
        "name": "md5.xlarge",
        "cpu_number": 2,
        "ram": 14336,
        "last_price": "0.100",
        "currency": "USD",
        "provider": {
          "name": "AWS"
        }
      }
    ]
  }
}
Read the .json file and parse it into a dataframe, using chain from itertools to merge the lists:
import json
from itertools import chain

import pandas as pd

# read your json file
file = 'data.json'
with open(file) as train_file:
    dict_train = json.load(train_file)

# flatten every list under "data" into one list of records
dftemp = list(chain.from_iterable(dict_train["data"].values()))

# pull the nested provider name out of each record
tempArr = []
for i in dftemp:
    tempArr.append(i["provider"]["name"])

# build the dataframe and replace the nested dict column with the plain name
dftemp = pd.DataFrame(dftemp)
dftemp["provider"] = tempArr
dftemp
name cpu_number ram last_price currency provider
0 Basic_A4 8 14336 0.200 USD Azure
1 md5.xlarge 2 14336 0.100 USD AWS
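As a possible variation on the loop above, pd.json_normalize flattens nested dicts into dotted column names, so the provider name should come out as a provider.name column without collecting it by hand. This is only a sketch: the content dict is inlined from the question's updated JSON rather than read from data.json, and renaming the column to provider is my own choice to match the desired table.

```python
from itertools import chain

import pandas as pd

# Sample data inlined from the question's updated JSON
# (assumption: the real response has the same shape).
content = {
    "data": {
        "flavors": [
            {"name": "Basic_A4", "cpu_number": 8, "ram": 14336,
             "last_price": "0.200", "currency": "USD",
             "provider": {"name": "Azure"}},
        ],
        "aws": [
            {"name": "md5.xlarge", "cpu_number": 2, "ram": 14336,
             "last_price": "0.100", "currency": "USD",
             "provider": {"name": "AWS"}},
        ],
    }
}

# Merge every list under "data" into one flat list of records.
records = list(chain.from_iterable(content["data"].values()))

# json_normalize expands the nested dict into a "provider.name" column;
# rename it so the table has a plain "provider" column.
df = pd.json_normalize(records).rename(columns={"provider.name": "provider"})
print(df)
```

Note that last_price stays a string here, as in the JSON; pd.to_numeric(df["last_price"]) would convert it if numbers are needed.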
In addition to the answer above, you can also do this:
import pandas as pd

data = {
    "data": {
        "flavors": [
            {
                "name": "Basic_A4",
                "cpu_number": 8,
                "ram": 14336,
                "last_price": "0.200",
                "currency": "USD"
            },
            {...}  # placeholder for further entries, replace with real records
        ],
        "aws": [
            {
                "name": "md5.xlarge",
                "cpu_number": 2,
                "ram": 14336,
                "last_price": "0.100",
                "currency": "USD"
            },
            {...}  # placeholder for further entries, replace with real records
        ]
    }
}

# normalise each list separately, then stack the two frames
result = pd.json_normalize(data['data']["flavors"])
result2 = pd.json_normalize(data['data']["aws"])
df_new = pd.concat([result, result2], axis=0)
df_new.reset_index(drop=True, inplace=True)
print(df_new)
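For context, the original TypeError comes from content['data']['flavors'] being a list, which cannot be indexed with the string 'aws'; flavors and aws are sibling keys under data. If the response may hold more lists than these two, the same idea can be written as a loop over every key. This is a rough sketch only: the sample dict has one record per list, and the source column is a hypothetical addition of mine to record which key each row came from.

```python
import pandas as pd

# Sample data shaped like the question's JSON (one record per list for brevity).
data = {
    "data": {
        "flavors": [
            {"name": "Basic_A4", "cpu_number": 8, "ram": 14336,
             "last_price": "0.200", "currency": "USD"},
        ],
        "aws": [
            {"name": "md5.xlarge", "cpu_number": 2, "ram": 14336,
             "last_price": "0.100", "currency": "USD"},
        ],
    }
}

# Normalise each list under "data" and tag its rows with the key they came from.
# "source" is an illustrative column name, not part of the original data.
frames = [
    pd.json_normalize(items).assign(source=key)
    for key, items in data["data"].items()
]

# ignore_index=True gives a fresh 0..n-1 index, so no reset_index is needed.
df_all = pd.concat(frames, ignore_index=True)
print(df_all)
```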