Flatten JSON response and output to CSV

Question

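I seem to have exhausted the internet looking for what should be a common task, and I need some help.

I'm using the requests library to make API calls. Each call returns a JSON response, and I'll be looping to make many calls.

I want to combine the responses from all of those API calls into one Python data structure and then export the result to CSV.

A single API response looks like this: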

{
    "status": "1",
    "msg": "Success",
    "data": {
      "id": "12345",
      "PriceDetail": [
        {
          "item": "Apple",
          "amount": "10",
          "weight": "225",
          "price": "92",
          "bestbeforeendeate": "30/09/2023"
        }
        ]
    }
}

My final output should be a CSV file with the following headers, and the data in the rows below them:

id	item	amount	weight	price	bestbeforeendeate
12345	apple	10	225	92	30/09/2023
.....	.....	..	...	..	..........

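I've looked into combining the responses into dictionaries, named tuples, and dataframes, and I've tried various ways of exporting from those structures: DictWriter, csv writer, json_normalize, and so on. However, I'm still struggling to get any of them to work.

The closest I've got is this (I saved the results to a JSON file to stop hitting the API):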

with open('item.json') as json_file: 
    data_set = json.load(json_file) 
    for data in data_set: 
        if data['msg'] == 'Success': 
            id = data['data']['id'] 
            return_data[id] = data['data']['PriceDetail'] 

df = pd.json_normalize(data['data']['PriceDetail']) 
print(df)

I can't get the id added to the dataframe.

Any suggestions would be much appreciated.

Thanks,

Answer 1

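Pandas has a function called json_normalize that can convert a dict directly into a dataframe. To convert a JSON string into a dict, you can simply use the json library. I found this to be a good source.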

import json
import pandas as pd

# Test string, assuming it is from API
test_string = """{
    "status": "1",
    "msg": "Success",
    "data": {
      "id": "12345",
      "PriceDetail": [
        {
          "item": "Apple",
          "amount": "10",
          "weight": "225",
          "price": "92",
          "bestbeforeendeate": "30/09/2023"
        }
        ]
    }
}"""

# Function converts the API result to a dataframe and concatenates it onto df
def add_new_entry_to_dataframe(df, api_result_string):
    input_parsed = json.loads(api_result_string)
    df_with_new_data = pd.json_normalize(input_parsed['data']['PriceDetail'])
    # DataFrame.append was removed in pandas 2.0; pd.concat is the replacement
    df = pd.concat([df, df_with_new_data])
    return df
    

# The dataframe you want to store everything
df = pd.DataFrame()

## Loop where you fetch new data
for i in range(10):
    newly_fetched_result = test_string
    df = add_new_entry_to_dataframe(df, newly_fetched_result)


# Renumber the rows 0..9 and discard the old index (every fetched row had index 0)
df = df.reset_index(drop=True)

# Save as .csv, leaving the row index out of the file
df.to_csv('output.csv', index=False)

print(df)

Output of the above code:

    item amount weight price bestbeforeendeate
0  Apple     10    225    92        30/09/2023
1  Apple     10    225    92        30/09/2023
2  Apple     10    225    92        30/09/2023
3  Apple     10    225    92        30/09/2023
4  Apple     10    225    92        30/09/2023
5  Apple     10    225    92        30/09/2023
6  Apple     10    225    92        30/09/2023
7  Apple     10    225    92        30/09/2023
8  Apple     10    225    92        30/09/2023
9  Apple     10    225    92        30/09/2023
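
The question also asks for the id, which sits one level above PriceDetail and so does not appear in the frame above. One way to carry it along is json_normalize's record_path and meta arguments. The following is only a sketch, assuming every successful response has the shape shown in the question; flatten_responses is a hypothetical helper name, and the second and third sample responses are made-up data for illustration.

```python
import pandas as pd

# Sample parsed responses; the second and third are invented for illustration
responses = [
    {"status": "1", "msg": "Success",
     "data": {"id": "12345", "PriceDetail": [
         {"item": "Apple", "amount": "10", "weight": "225",
          "price": "92", "bestbeforeendeate": "30/09/2023"}]}},
    {"status": "0", "msg": "Failure", "data": {}},
    {"status": "1", "msg": "Success",
     "data": {"id": "67890", "PriceDetail": [
         {"item": "Pear", "amount": "4", "weight": "180",
          "price": "55", "bestbeforeendeate": "15/10/2023"}]}},
]


def flatten_responses(responses):
    """Build one row per PriceDetail entry, with the parent id as a column.

    Assumes at least one response has msg == "Success".
    """
    successful = [r for r in responses if r.get("msg") == "Success"]
    df = pd.json_normalize(
        successful,
        record_path=["data", "PriceDetail"],  # the list that becomes rows
        meta=[["data", "id"]],                # parent field copied onto each row
    )
    # The nested meta field arrives as "data.id"; rename it and move it first
    df = df.rename(columns={"data.id": "id"})
    return df[["id"] + [c for c in df.columns if c != "id"]]


df = flatten_responses(responses)
csv_text = df.to_csv(index=False)  # pass a path instead to write a file
print(csv_text)
```

Because the meta value is repeated for every entry in PriceDetail, a response with several price details would produce several rows sharing the same id.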

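Edit: I revisited this question and thought I'd share another solution that might work better for you. Rather than building up one huge dataframe over time, the code below appends the fetched data directly to the CSV file. The advantage is that if the program crashes or you terminate it, all the data fetched so far is already in the CSV.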

import os

# Function converts the JSON string to a dataframe and appends it directly to the CSV file
def add_json_string_to_csv(api_result_string, path='output.csv'):
    input_parsed = json.loads(api_result_string)
    df_with_new_data = pd.json_normalize(input_parsed['data']['PriceDetail'])
    # Write the header only on the first write, and leave out the row index
    write_header = not os.path.exists(path)
    df_with_new_data.to_csv(path, mode='a', header=write_header, index=False)

## Loop where you fetch new data (bounded here so the example terminates)
for i in range(10):
    newly_fetched_result = test_string
    add_json_string_to_csv(newly_fetched_result)
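
If pandas feels like more than is needed for the append-as-you-go approach, the standard library's csv.DictWriter (which the question mentions trying) can write the same rows, including the id. This is a rough sketch, assuming each response is already parsed into a dict with the shape shown in the question; append_response_to_csv, FIELDNAMES, and the file name are illustrative choices, not part of any API.

```python
import csv
import os

# Column order for the CSV; taken from the headers listed in the question
FIELDNAMES = ["id", "item", "amount", "weight", "price", "bestbeforeendeate"]


def append_response_to_csv(response, path):
    """Append the rows of one parsed API response to path; return rows written."""
    if response.get("msg") != "Success":
        return 0
    data = response["data"]
    # Write the header only when the file is new or still empty
    needs_header = not os.path.exists(path) or os.path.getsize(path) == 0
    # Copy the parent id onto every PriceDetail entry
    rows = [{"id": data["id"], **detail} for detail in data["PriceDetail"]]
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES, extrasaction="ignore")
        if needs_header:
            writer.writeheader()
        writer.writerows(rows)
    return len(rows)


sample = {
    "status": "1",
    "msg": "Success",
    "data": {
        "id": "12345",
        "PriceDetail": [
            {"item": "Apple", "amount": "10", "weight": "225",
             "price": "92", "bestbeforeendeate": "30/09/2023"}
        ],
    },
}

if __name__ == "__main__":
    # In a real run, this call would sit inside the loop that fetches responses
    append_response_to_csv(sample, "output_dictwriter.csv")
```

Setting extrasaction="ignore" means any unexpected extra keys in a PriceDetail entry are dropped rather than raising an error; whether that is the right trade-off depends on how stable the API's fields are.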

python

json

export-to-csv

pandas

python-requests