Scraping a stream of subreddits, comments, and replies from the Reddit API (PRAW) in Python

Question

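I want to print a live feed about Trump from Reddit in Python. The output should include any thread, comment, or reply that contains "trump". I am trying this code, but it doesn't seem to give the full output.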

import praw

reddit = praw.Reddit(client_id='.....',
                     client_secret='.....', password='....',
                     user_agent='testscript by /u/......', username='.....')

subreddit = reddit.subreddit('worldnews')

findme = "Trump"

for comment in subreddit.stream.comments():
    try:
        parent_id = str(comment.parent())
        submission = reddit.comment(parent_id)

        if submission.body.find(findme) != -1:
            print(submission.body)
            print('\n')
            if comment.body.find(findme) != -1:
                print(comment.body)
                for reply in submission.replies:
                    print(reply)
        else:
            continue
    except praw.exceptions.PRAWException as e:
        pass

Answer 1

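When you use a stream, you may not get every comment on submissions that contain the given word. Comments are delivered as they become available, and at that moment they may not have any replies yet. Also, older comments that contain the keyword but were written before your script started will not be caught by the stream.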

Apart from that, the only problem in your code is that you are not checking whether the replies actually have "Trump" in their body:

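for reply in submission.replies:
    # Only print replies whose text contains the keyword
    if reply.body.find(findme) != -1:
        print(reply.body)  # print(reply) would show only the comment ID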
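
Since the question asks for anything containing "trump" while the code searches for "Trump", a case-insensitive check may also help. The following is only a minimal sketch: the helper names mentions and matching_bodies are made up for illustration, and the SimpleNamespace objects are stand-ins for PRAW comments so that it runs without Reddit credentials.

```python
from types import SimpleNamespace


def mentions(text, keyword):
    """Return True if keyword occurs in text, ignoring case."""
    return keyword.casefold() in (text or "").casefold()


def matching_bodies(items, keyword):
    """Yield the body of every item whose body mentions keyword.

    Each item only needs a .body attribute, as PRAW comments have.
    """
    for item in items:
        body = getattr(item, "body", "")
        if mentions(body, keyword):
            yield body


# Stand-in objects (hypothetical data) instead of real Reddit comments.
sample = [
    SimpleNamespace(body="TRUMP signs the bill"),
    SimpleNamespace(body="Nothing relevant here"),
    SimpleNamespace(body="what did trump say?"),
]

for body in matching_bodies(sample, "Trump"):
    print(body)
```

With PRAW, the same filter could presumably be applied to subreddit.stream.comments() in place of sample. Note that submissions keep their text in title and selftext rather than body, so a thread-level check would need to look at those attributes instead.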
