Parse nested JSON and iterate into Pandas Dataframe

Question

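I'm using Foursquare API calls to find venues associated with particular ZIP codes in the US.

I'm able to generate the JSON with the information, but I'm having trouble looping over it and parsing it to build a pandas DataFrame.

So far: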

# scraping the foursquare website for the information we want and obtaining the json file as results
import requests

for i, series in df_income_zip_good.iterrows():
    lat = series['lat']
    lng = series['lng']
    town = series['place']
    LIMIT = 100
    radius = 1000
    url4Sqr = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
        CLIENT_ID,
        CLIENT_SECRET,
        VERSION,
        lat,
        lng,
        radius,
        LIMIT)

    # one request per row; venues is overwritten on every iteration
    venues = requests.get(url4Sqr).json()
    # print results from call
    print(venues)

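This works fine and produces the JSON. I've linked the output to a JSON file on GitHub: (https://github.com/adhorvitz/coursera_ibm_capstone/blob/524c6609ea8872e0c188cd373a4778caaadb1cf6/venuedatasample.json)

I'm not sure how best to flatten the JSON and then loop through it to extract the pieces of information I want to load into a DataFrame. I tried messing around with the following, without success.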

def flatten_json(nested_json, exclude=['']):
    """Flatten json object with nested keys into a single level.

    The code recursively extracts values out of the object into a
    flattened dictionary. json_normalize can be applied to the output of
    flatten_object to produce a python dataframe.

    Args:
        nested_json: A nested json object.
        exclude: Keys to exclude from output.
    Returns:
        The flattened json object if successful, None otherwise.
    """
    out = {}

    def flatten(x, name='venues', exclude=exclude):
        if type(x) is dict:
            for a in x:
                if a not in exclude: flatten(x[a], name + a + '_')
        elif type(x) is list:
            i = 0
            for a in x:
                flatten(a, name + str(i) + '_')
                i += 1
        else:
            out[name[:-1]] = x

    flatten(nested_json)
    return out
#https://towardsdatascience.com/flattening-json-objects-in-python-f5343c794b10

Then I run:

for i in venues():
    json_flat_venues = flatten_json(venues)
    json_flat_venues

This produces an error stating that the 'dict' object is not callable.

I've also tried:

for i in venues():
    df_venues_good = pd.json_normalize(venues)
    df_venues_good

This produces the same error.

I'm a bit lost about where to go next and how best to turn the JSON into a usable DataFrame.

Thanks in advance.

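--------UPDATE------------

So I've tried a few things.

After reading the page referenced in a comment, https://www.geeksforgeeks.org/flattening-json-objects-in-python/, I installed json_flatten (using pip), but had trouble importing flatten.
As a workaround, I tried to recreate the code from the site, adapted to my project. I think I made more of a mess than I cleaned up.
I reran the original "flatten_json" def (see above). Then I assigned df_venues_good without the for loop statement (also above).
With the for loop removed, it looks like it starts pulling the first record out of the JSON. However, that record looks like metadata (or at least data I don't want to extract).
I also noticed an issue when looking at the JSON. In my output cell (I'm using a Jupyter notebook), it looks like all the records were retrieved (about 95 in total).

Then I ran this just to dump the file and check it: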

JsonString = json.dumps(venues)
JsonFile = open("venuedata.json", "w")
JsonFile.write(JsonString)
JsonFile.close()

When I open the dumped file (the one I linked above), it doesn't look complete.

Any direction would be appreciated.

Answer 1

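After 4 days of back and forth, I think I understand your real problem, and this should move you forward. You need to find and fix the following two errors. If you're willing to do more of the troubleshooting yourself, please mark my answer as correct and create a new question around your insights, the errors, and your questions about the image below.

Libraries that can help with "flattening JSON data" include pandas, requests, and json; even the csv library can help you here.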
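As a rough sketch of how pandas alone might handle the flattening: the example below uses a small made-up dictionary in place of a real response, and it assumes the venue records sit under response -> groups -> items -> venue. That nesting is my assumption about the explore payload, so check it against your own JSON. The names sample_venues and venues_to_frame are hypothetical and exist only for illustration.

```python
import pandas as pd

# Hypothetical, heavily trimmed stand-in for one `venues` response.
# The nesting (response -> groups -> items -> venue) is an assumption
# about the payload, not something copied from the linked file.
sample_venues = {
    "meta": {"code": 200},
    "response": {
        "groups": [
            {
                "items": [
                    {"venue": {"name": "Cafe A",
                               "location": {"lat": 40.1, "lng": -74.2},
                               "categories": [{"name": "Coffee Shop"}]}},
                    {"venue": {"name": "Park B",
                               "location": {"lat": 40.2, "lng": -74.3},
                               "categories": [{"name": "Park"}]}},
                ]
            }
        ]
    },
}


def venues_to_frame(venues, town):
    """Turn one response dict into a DataFrame with one row per venue."""
    # `venues` is a plain dict: index into it, do not call it like venues().
    items = venues["response"]["groups"][0]["items"]
    # json_normalize expands nested dicts into dotted column names.
    df = pd.json_normalize(items)
    # Nested lists are left as-is, so pull the first category name by hand.
    df["category"] = df["venue.categories"].apply(
        lambda cats: cats[0]["name"] if cats else None
    )
    df["town"] = town
    return df[["town", "venue.name", "category",
               "venue.location.lat", "venue.location.lng"]]


# Build one frame per response, then concatenate once at the end.
frames = [venues_to_frame(sample_venues, town) for town in ["Town X", "Town Y"]]
df_venues_good = pd.concat(frames, ignore_index=True)
print(df_venues_good)
```

In the real request loop you would append venues_to_frame(venues, town) to the list on each iteration rather than overwriting venues, since only the last response survives the loop otherwise. Skipping the top-level "meta" key this way also avoids pulling metadata into the DataFrame.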

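The fact is, you're learning Python, data analysis, and how to work with APIs all at once, and without a clearer description and examples you won't find much more help with your technical problems on Stack Overflow.

Please keep teaching yourself and keep at it! You've got this :)

Please let us know how the community can help with individual problems as you grow :)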

