Python times out after if in for loop

Foreword: this isn't just a PRAW question; it leans more toward Python than PRAW. Python people are welcome to contribute, and please note that English is not my mother language xD!

Basically, I'm writing a Reddit bot with PRAW that does the following:


In these cases:

- Post by @dudeOne
 - Comment by @dudeTwo
  - Comment with "!completed" by @dudeOne
- Post by @dudeOne
 - Comment by @dudeTwo
  - Comment with "!completed" by @moderatorOne

it should print("Hey"), and in this case:

- Post by @dudeOne
 - Comment by @dudeOne
  - Comment with "!completed" by @dudeOne

... it should do nothing, or possibly even delete it and message @dudeOne.

Here is my messy code (xD):

import praw
import os
import re

sub = "RedditsQuests"

client_id = os.environ.get('client_id')
client_secret = os.environ.get('client_secret')
password = os.environ.get('pass')

reddit = praw.Reddit(client_id=client_id,
                     client_secret=client_secret,
                     password=password,
                     user_agent='r/RedditsQuests bot',
                     username='TheQuestMaster')

for submission in reddit.subreddit(sub).new(limit=None):
    submission.comments.replace_more(limit=None)
    if submission.saved is False:
        for comment in submission.comments.list():
            if ((("!completed" in comment.body)) and ((comment.is_submitter) or ('RedditsQuests' in comment.author.moderated())) and (comment.parent().author.name is not submission.author.name)):
              print("etc...")

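The stack trace is fairly long, so I've added it to this bin for your reference. It seems to me that PRAW is timing out because the if inside the for loop takes too long. I could be wrong, though!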

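The problem is (as you said) somewhat sporadic, but I've narrowed it down. It turns out that trying to fetch the subreddits moderated by /u/AutoModerator sometimes times out (presumably because the list is so long).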

Finding the problem

Here is how I found the problem. If you're only interested in the solution, skip this section.

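First, I modified your script to use try/except to catch the exception where it occurs. Your traceback told me that it happens on the line starting with if ((("!completed" in comment.body)), specifically while fetching the subreddits that the user moderates. Here is my modified script: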

for submission in reddit.subreddit(sub).new(limit=None):
    submission.comments.replace_more(limit=None)
    if submission.saved is False:
        for comment in submission.comments.list():
            try:
                if (
                    (("!completed" in comment.body))
                    and (
                        (comment.is_submitter)
                        or ("RedditsQuests" in comment.author.moderated())
                    )
                    and (comment.parent().author.name is not submission.author.name)
                ):
                    print("etc...")
            except Exception:
                print(f'Author: {comment.author} ({type(comment.author)})')

And the output:

etc...
etc...
Author: AutoModerator (<class 'praw.models.reddit.redditor.Redditor'>)
etc...
etc...
etc...
Author: AutoModerator (<class 'praw.models.reddit.redditor.Redditor'>)
etc...
etc...
etc...
etc...
etc...
etc...
etc...
Author: AutoModerator (<class 'praw.models.reddit.redditor.Redditor'>)
etc...
Author: AutoModerator (<class 'praw.models.reddit.redditor.Redditor'>)
etc...
etc...

With that in mind, I wrote a very simple 3-line script to reproduce the problem:

import praw

reddit = praw.Reddit(...)

print(reddit.redditor("AutoModerator").moderated())

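Sometimes this script succeeds, but sometimes it fails with the same socket read timeout. Presumably the timeout occurs because AutoModerator moderates so many subreddits (at least 10,000) that the Reddit API takes too long to process the request.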

Solving the problem

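Your script tries to determine whether the redditor in question is a moderator of the subreddit. You do this by checking whether the subreddit is in the user's list of moderated subreddits, but you can switch this around and check whether the user is in the subreddit's list of moderators. Not only will this avoid the timeout, but you will also save a lot of network requests, because you only need to fetch the moderator list once.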
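To make the request savings concrete, here is a minimal offline sketch. It does not use PRAW at all: FakeReddit, moderated_by, and moderators_of are hypothetical stand-ins that only count how many lookups each approach would trigger, and the usernames and subreddit data are made up for illustration.

```python
class FakeReddit:
    """Hypothetical stand-in for the Reddit API that counts simulated requests."""

    def __init__(self, moderators_by_sub):
        self.moderators_by_sub = moderators_by_sub  # subreddit name -> set of usernames
        self.requests = 0

    def moderated_by(self, user):
        # One simulated request per call: every subreddit this user moderates.
        self.requests += 1
        return {s for s, mods in self.moderators_by_sub.items() if user in mods}

    def moderators_of(self, subreddit):
        # One simulated request per call: the moderator list of one subreddit.
        self.requests += 1
        return set(self.moderators_by_sub.get(subreddit, set()))


def mods_per_comment(api, subreddit, authors):
    """Original approach: one lookup for every comment author."""
    return [subreddit in api.moderated_by(author) for author in authors]


def mods_fetched_once(api, subreddit, authors):
    """Inverted approach: fetch the moderator list once, then test membership."""
    moderators = api.moderators_of(subreddit)
    return [author in moderators for author in authors]


data = {
    "RedditsQuests": {"moderatorOne", "AutoModerator"},
    "other": {"AutoModerator"},
}
authors = ["dudeOne", "dudeTwo", "moderatorOne", "AutoModerator"]

slow_api = FakeReddit(data)
slow_result = mods_per_comment(slow_api, "RedditsQuests", authors)

fast_api = FakeReddit(data)
fast_result = mods_fetched_once(fast_api, "RedditsQuests", authors)

# Both approaches agree; only the number of simulated requests differs.
print(slow_result == fast_result, slow_api.requests, fast_api.requests)
```

In this toy setup the per-comment approach makes one lookup per author, while the inverted approach makes a single lookup no matter how many comments there are.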

The PRAW documentation of Subreddit shows how we can get a subreddit's list of moderators. In your case, we can do

moderators = list(reddit.subreddit(sub).moderator())

Then, instead of checking "RedditsQuests" in comment.author.moderated(), we check

comment.author in moderators

Your code then becomes

import praw
import os
import re

sub = "RedditsQuests"

client_id = os.environ.get("client_id")
client_secret = os.environ.get("client_secret")
password = os.environ.get("pass")

reddit = praw.Reddit(
    client_id=client_id,
    client_secret=client_secret,
    password=password,
    user_agent="r/RedditsQuests bot",
    username="TheQuestMaster",
)

# Fetch the subreddit's moderator list once, up front.
moderators = list(reddit.subreddit(sub).moderator())
for submission in reddit.subreddit(sub).new(limit=None):
    submission.comments.replace_more(limit=None)
    if submission.saved is False:
        for comment in submission.comments.list():
            if (
                "!completed" in comment.body
                and (comment.is_submitter or comment.author in moderators)
                # Compare names by value (!=); "is not" compares object identity.
                and comment.parent().author.name != submission.author.name
            ):
                print("etc...")

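In my brief testing, this script runs many times faster, because we fetch the moderator list only once instead of fetching every subreddit moderated by every commenting user.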
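If you want to check the decision rule itself without touching the network, one option is to pull it out into a small pure function. The sketch below is only an assumption about how that could look: completes_quest is a hypothetical helper that works on plain username strings rather than PRAW objects, and it is exercised against the three cases from the question.

```python
def completes_quest(body, comment_author, parent_author, post_author, moderators):
    """Return True when a "!completed" comment should count.

    All arguments are plain strings (moderators is a collection of strings);
    this is a hypothetical helper, not part of PRAW.
    """
    if "!completed" not in body:
        return False
    # Only the original poster or a subreddit moderator may complete a quest.
    allowed = comment_author == post_author or comment_author in moderators
    # The comment being replied to must come from someone other than the poster.
    return allowed and parent_author != post_author


moderators = {"moderatorOne"}

# Case 1: dudeOne completes dudeTwo's comment on dudeOne's own post.
case_one = completes_quest("!completed", "dudeOne", "dudeTwo", "dudeOne", moderators)
# Case 2: a moderator completes dudeTwo's comment on dudeOne's post.
case_two = completes_quest("!completed", "moderatorOne", "dudeTwo", "dudeOne", moderators)
# Case 3: dudeOne tries to complete their own comment on their own post.
case_three = completes_quest("!completed", "dudeOne", "dudeOne", "dudeOne", moderators)

print(case_one, case_two, case_three)
```

In the bot, the string arguments would come from the comment, its parent, and the submission; keeping the rule separate from the API calls makes it easy to test each case in isolation.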


As an unrelated style note, you should use if not submission.saved rather than if submission.saved is False; this is the conventional way to check that a condition is false.
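For what it's worth, here is a small standalone sketch of why identity checks (is, is not) and value or truthiness checks (==, !=, not) are not interchangeable. The flag values and usernames are made up; the string part is relevant because the author-name comparison in the question's code uses is not, which compares object identity rather than the text of the names.

```python
# Truthiness vs. identity for a flag such as submission.saved.
flags = [True, False, None]
by_identity = [flag is False for flag in flags]  # only the literal False matches
by_truthiness = [not flag for flag in flags]     # any falsy value matches, including None
print(by_identity)
print(by_truthiness)

# Value vs. identity for strings such as author names.
post_author = "dudeOne"
parent_author = "".join(["dude", "One"])  # same text, but built at runtime
print(parent_author == post_author)       # compares the text of the two names
print(parent_author is post_author)       # compares object identity; implementation-dependent
```

Two strings with the same text are not guaranteed to be the same object, so != is the safer way to test whether two author names differ.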