Python times out after if in for loop

Question

F.W. This isn't just a PRAW question, it leans toward Python more than PRAW. Python people are welcome to contribute, and please note this is not my mother language xD!

Basically, I'm writing a Reddit bot with PRAW that does the following:

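Loop through "unsaved" posts
Loop through the comments of said posts (to reach sub-comments)
If a comment contains "!completed", was written by the submitter or by a moderator, and its parent comment was not written by the submitter:
Do something, e.g. print("Hey")

Nope, I didn't explain that too well. Examples are better, so here goes xD: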

Use cases:

- Post by @dudeOne
 - Comment by @dudeTwo
  - Comment with "!completed" by @dudeOne

- Post by @dudeOne
 - Comment by @dudeTwo
  - Comment with "!completed" by @moderatorOne

should print("Hey"), and:

- Post by @dudeOne
 - Comment by @dudeOne
  - Comment with "!completed" by @dudeOne

... should do nothing, and maybe even remove the comment and message @dudeOne.
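
The three cases above can be summed up as a small predicate. This is only a sketch with plain strings standing in for PRAW objects; the function name `should_act` and its parameters are made up for illustration and are not part of PRAW.

```python
def should_act(body, comment_author, parent_author, submitter, moderators):
    """Return True when a "!completed" comment should trigger the bot.

    All arguments are plain strings (or a collection of strings for
    `moderators`); they stand in for the PRAW objects used in the bot.
    """
    if "!completed" not in body:
        return False
    # The comment must come from the submitter or from a moderator.
    allowed = comment_author == submitter or comment_author in moderators
    # The comment being replied to must not be the submitter's own.
    return allowed and parent_author != submitter


mods = {"moderatorOne"}
print(should_act("!completed", "dudeOne", "dudeTwo", "dudeOne", mods))       # True
print(should_act("!completed", "moderatorOne", "dudeTwo", "dudeOne", mods))  # True
print(should_act("!completed", "dudeOne", "dudeOne", "dudeOne", mods))       # False
```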

Here is my messy code (xD):

import praw
import os
import re

sub = "RedditsQuests"

client_id = os.environ.get('client_id')
client_secret = os.environ.get('client_secret')
password = os.environ.get('pass')

reddit = praw.Reddit(client_id=client_id,
                     client_secret=client_secret,
                     password=password,
                     user_agent='r/RedditsQuests bot',
                     username='TheQuestMaster')

for submission in reddit.subreddit(sub).new(limit=None):
    submission.comments.replace_more(limit=None)
    if submission.saved is False:
        for comment in submission.comments.list():
            if ((("!completed" in comment.body)) and ((comment.is_submitter) or ('RedditsQuests' in comment.author.moderated())) and (comment.parent().author.name is not submission.author.name)):
              print("etc...")

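The traceback is fairly long, so I've put it in this bin for your reference. It looks to me like PRAW is timing out because the if-in-for loop takes too long. I could be wrong, though!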

Answer 1

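The problem is (as you said) a bit sporadic, but I've narrowed it down. It turns out that fetching the subreddits moderated by /u/AutoModerator sometimes times out (presumably because the list is so long).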

Finding the problem

Here is how I found the problem. If you're only interested in the solution, skip this section.

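First, I modified your script to use try and except to catch the exception when it occurs. Your traceback told me that it happens on the line beginning with if ((("!completed" in comment.body)), specifically while fetching the subreddits a user moderates. Here is my modified script: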

for submission in reddit.subreddit(sub).new(limit=None):
    submission.comments.replace_more(limit=None)
    if submission.saved is False:
        for comment in submission.comments.list():
            try:
                if (
                    (("!completed" in comment.body))
                    and (
                        (comment.is_submitter)
                        or ("RedditsQuests" in comment.author.moderated())
                    )
                    and (comment.parent().author.name is not submission.author.name)
                ):
                    print("etc...")
            except Exception:
                print(f'Author: {comment.author} ({type(comment.author)})')

And the output:

etc...
etc...
Author: AutoModerator (<class 'praw.models.reddit.redditor.Redditor'>)
etc...
etc...
etc...
Author: AutoModerator (<class 'praw.models.reddit.redditor.Redditor'>)
etc...
etc...
etc...
etc...
etc...
etc...
etc...
Author: AutoModerator (<class 'praw.models.reddit.redditor.Redditor'>)
etc...
Author: AutoModerator (<class 'praw.models.reddit.redditor.Redditor'>)
etc...
etc...
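
The same narrowing-down step can be made a bit more systematic by tallying which author each failure belongs to. This is a rough sketch with stand-in data rather than real PRAW objects; `check_comment`, `tally_failures`, and the sample author list are hypothetical names invented for the example, and the `TimeoutError` is raised by hand to mimic the read timeout.

```python
from collections import Counter


def check_comment(author):
    """Stand-in for the real condition; fails for one author on purpose."""
    if author == "AutoModerator":
        raise TimeoutError("simulated read timeout")
    return True


def tally_failures(authors, check):
    """Run `check` on every author and count the exceptions per author."""
    failures = Counter()
    for author in authors:
        try:
            check(author)
        except Exception as exc:
            # Record who triggered the error and what kind of error it was.
            failures[(author, type(exc).__name__)] += 1
    return failures


authors = ["dudeOne", "AutoModerator", "dudeTwo", "AutoModerator"]
print(tally_failures(authors, check_comment))
```

If every failure lands on the same author, as it did here, that points at something specific to that account rather than at the loop itself.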

With that in mind, I wrote a very simple 3-line script to reproduce the problem:

import praw

reddit = praw.Reddit(...)

print(reddit.redditor("AutoModerator").moderated())

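Sometimes this script succeeds, but sometimes it fails with the same socket read timeout. Presumably the timeout happens because AutoModerator moderates so many subreddits (at least 10,000) that the Reddit API takes too long to process the request.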

Solving the problem

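Your script tries to determine whether the redditor in question is a moderator of the subreddit. You do this by checking whether the subreddit is in the user's list of moderated subreddits, but you can turn that around and check whether the user is in the subreddit's list of moderators. Not only should this avoid the timeout, it also saves a lot of network requests, because you only need to fetch the moderator list once.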

The PRAW documentation of Subreddit shows how to get the list of a subreddit's moderators. In your case, we can do

moderators = list(reddit.subreddit(sub).moderator())

Then, instead of checking "RedditsQuests" in comment.author.moderated(), we check

comment.author in moderators
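
If you'd rather compare plain names, one variation is to collect the moderator names into a set once and look each comment's author up in it. This is only a sketch under assumptions: `is_moderator` and `FakeRedditor` are hypothetical helpers made up for the example, the fake class only mimics the `name` attribute, and the `None` check is there because PRAW reports the author of a deleted comment as `None`.

```python
class FakeRedditor:
    """Minimal stand-in for a PRAW Redditor; only `name` is mimicked."""

    def __init__(self, name):
        self.name = name


def is_moderator(author, moderator_names):
    """Check an author against a pre-built set of lowercase moderator names."""
    if author is None:  # deleted comments have no author
        return False
    return author.name.lower() in moderator_names


# In the bot this set would be built once, before the loops, e.g. from
# reddit.subreddit(sub).moderator(); here it is hard-coded for illustration.
moderator_names = {name.lower() for name in ["moderatorOne", "AutoModerator"]}

print(is_moderator(FakeRedditor("ModeratorOne"), moderator_names))  # True
print(is_moderator(FakeRedditor("dudeTwo"), moderator_names))       # False
print(is_moderator(None, moderator_names))                          # False
```

Lowercasing both sides is a deliberate choice here, since Reddit usernames are case-insensitive.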

Your code then becomes

import praw
import os
import re

sub = "RedditsQuests"

client_id = os.environ.get("client_id")
client_secret = os.environ.get("client_secret")
password = os.environ.get("pass")

reddit = praw.Reddit(
    client_id=client_id,
    client_secret=client_secret,
    password=password,
    user_agent="r/RedditsQuests bot",
    username="TheQuestMaster",
)

moderators = list(reddit.subreddit(sub).moderator())
for submission in reddit.subreddit(sub).new(limit=None):
    submission.comments.replace_more(limit=None)
    if submission.saved is False:
        for comment in submission.comments.list():
            if (
                (("!completed" in comment.body))
                and ((comment.is_submitter) or (comment.author in moderators))
                # != compares the names by value; "is not" would compare object identity
                and (comment.parent().author.name != submission.author.name)
            ):
                print("etc...")

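In my brief testing, this script ran many times faster, because we fetch the moderator list only once instead of fetching every subreddit moderated by every user who commented.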

As an unrelated style note, you should use if not submission.saved instead of if submission.saved is False, since that is the conventional way to check that a condition is false.
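
To see how the two spellings differ, here is a tiny sketch with made-up values (nothing here touches PRAW): `is False` matches only the `False` object itself, while `not` treats every falsy value the same way.

```python
def describe(value):
    """Return the results of both checks for a given value."""
    return (value is False, not value)


for value in (False, None, 0, True):
    print(repr(value), describe(value))
```

For a real boolean both checks agree, so the `not` form is simply the more conventional spelling.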
