Parsing reddit JSON into a Python array and printing items from the array

Question

These are my first few weeks of coding; apologies for a basic question.

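I've managed to parse the 'WorldNews' subreddit JSON, identify the individual children (24 as I write this) and grab the title of each news item. I'm now trying to create an array from these news titles. The code below prints the fifth title ([4]) to the command line only once every 2-3 attempts (the other attempts give the error below). It also won't print more than one title at a time (for example, if I try [2,3,4], I keep getting the same error).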

The error I get when it fails to run:

in <module>
    Children = theJSON["data"]["children"]
KeyError: 'data'

My script:

import requests 
import json


r = requests.get('https://www.reddit.com/r/worldnews/.json')
theJSON = json.loads(r.text)
Children = theJSON["data"]["children"]

News_list = []

for post in Children:
    News_list.append(post["data"]["title"])

print(News_list[4])

Answer 1

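This means that, for whatever reason, the payload you received has no data key. I don't know Reddit's JSON API; I tested the request and saw that you are using the correct keys. The fact that your code works every few tries tells me you are getting different responses between requests. I can't reproduce it: I made the request over and over and got a correct response each time. If I had to guess why you're getting something different, I'd say it has to be rate limiting or a temporary 503 (Reddit having issues).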

You can guard against this by catching the KeyError or by using the dictionary's .get method.

Catching the KeyError:

try:
    Children = theJSON["data"]["children"]
except KeyError:
    # No "data" key: the response was not a normal listing
    print('bad payload')
    return  # assumes this code runs inside a function

Using .get:

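# Falls back to an empty dict, so a missing "data" key yields None instead of raising
Children = theJSON.get("data", {}).get("children")
if not Children:
    print('bad payload')
    return  # assumes this code runs inside a function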
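
As a rough sketch of the .get approach wrapped in a function (written for Python 3; the name extract_titles and both sample payloads are made up for illustration and are not taken from Reddit's API), the titles can be collected into a list that simply comes back empty when the payload is malformed. It also shows a slice: titles[2:5] selects indexes 2, 3 and 4, whereas [2,3,4] is not valid list indexing.

```python
def extract_titles(payload):
    """Return every post title in a listing payload, or [] if the shape is unexpected."""
    children = payload.get("data", {}).get("children", [])
    titles = []
    for post in children:
        title = post.get("data", {}).get("title")
        if title is not None:
            titles.append(title)
    return titles


# Made-up payloads that only mimic the nesting used in the question.
good_payload = {"data": {"children": [
    {"data": {"title": "first"}},
    {"data": {"title": "second"}},
    {"data": {"title": "third"}},
    {"data": {"title": "fourth"}},
]}}
bad_payload = {"message": "Too Many Requests", "error": 429}

titles = extract_titles(good_payload)
print(titles[1:3])                  # a slice prints several items: ['second', 'third']
print(extract_titles(bad_payload))  # []
```

Returning an empty list keeps the caller simple, but it hides why the payload was bad, so logging the raw response in the empty case may be worth adding.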

Answer 2

I found the solution with Eric's help. The issue here was in fact not related to the key, parsing or presentation of the dict or array. When requesting a URL from reddit and attempting to print the JSON string output, we encounter an HTTP Error 429 (Too Many Requests). Fixing this is simple. The answer was found on this redditdev thread.

Solution: adding an identifier for the device requesting the URL (the 'User-agent' header) makes the script run smoothly and work every time.

import requests
import json

# Identify the client with a User-agent header so reddit does not answer with HTTP 429
r = requests.get('https://www.reddit.com/r/worldnews.json', headers={'User-agent': 'Chrome'})

theJSON = json.loads(r.text)
print(theJSON)
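
Building on the User-agent fix, here is a hedged sketch (Python 3) of how a 429 could be detected and retried at the request step, instead of surfacing later as a KeyError. The function name fetch_listing, the USER_AGENT string, and the retries and wait_seconds parameters are all assumptions made for illustration. The get argument is expected to behave like requests.get, which means the function can be tried out with a stand-in and no network access.

```python
import time

# Hypothetical identifier for this script; not a value prescribed by the thread.
USER_AGENT = "worldnews-title-reader/0.1 (example script)"


def fetch_listing(get, url, retries=3, wait_seconds=2.0):
    """Fetch a JSON listing with a User-agent header, retrying on HTTP 429.

    `get` is any callable with the same call shape as requests.get.
    """
    for attempt in range(retries):
        response = get(url, headers={"User-agent": USER_AGENT})
        if response.status_code == 429:
            # Rate limited: wait a little longer on each attempt, then retry.
            time.sleep(wait_seconds * (attempt + 1))
            continue
        # Raise an exception for any other 4xx/5xx status (e.g. a 503).
        response.raise_for_status()
        return response.json()
    raise RuntimeError("still rate limited after %d attempts" % retries)
```

With requests installed, it would be called as fetch_listing(requests.get, 'https://www.reddit.com/r/worldnews.json'), and the returned dict can then be indexed with ["data"]["children"] as in the question.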
