Python times out after if in for loop
F.W. This isn't just a PRAW question, it leans toward Python more than PRAW. Python people are welcome to contribute, and please note this is not my mother language xD!
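Basically, I'm writing a Reddit bot with PRAW that does the following:
- Loops through "unsaved" posts
- Loops through the comments of those posts (including child comments)
- If a comment contains "!completed", was written by the submitter or a moderator, and its parent comment was not written by the submitter:
  - Do something, e.g. print("Hey")
Nope, I didn't explain that very well. Examples are better, so here xD:
Use cases:
- Post by @dudeOne
  - Comment by @dudeTwo
    - Comment with "!completed" by @dudeOne
- Post by @dudeOne
  - Comment by @dudeTwo
    - Comment with "!completed" by @moderatorOne
should print("Hey"), and:
- Post by @dudeOne
  - Comment by @dudeOne
    - Comment with "!completed" by @dudeOne
... should do nothing, or maybe even delete the comment and message @dudeOne.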
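The rule behind these use cases can be sketched as a small pure function. This is only an illustration: `should_act` and its parameters are hypothetical names, and plain strings stand in for the PRAW objects.

```python
def should_act(body, comment_author, parent_author, submission_author, moderators):
    """Return True when a "!completed" comment should trigger the bot.

    All arguments are plain strings (or a set of strings for `moderators`);
    this is a stand-in for the PRAW objects, not the PRAW API itself.
    """
    has_command = "!completed" in body
    # The comment must come from the post's author or from a moderator.
    is_authorized = comment_author == submission_author or comment_author in moderators
    # The comment being replied to must belong to someone other than the post's author.
    parent_is_someone_else = parent_author != submission_author
    return has_command and is_authorized and parent_is_someone_else


moderators = {"moderatorOne"}

# Use case 1: @dudeOne replies "!completed" to @dudeTwo on @dudeOne's post.
case_1 = should_act("!completed", "dudeOne", "dudeTwo", "dudeOne", moderators)
# Use case 2: @moderatorOne replies "!completed" to @dudeTwo on @dudeOne's post.
case_2 = should_act("!completed", "moderatorOne", "dudeTwo", "dudeOne", moderators)
# Use case 3: @dudeOne replies "!completed" to their own comment.
case_3 = should_act("!completed", "dudeOne", "dudeOne", "dudeOne", moderators)

print(case_1, case_2, case_3)
```

The first two cases should trigger the bot and the third should not.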
Here's my messy code (xD):
import praw
import os
import re
sub = "RedditsQuests"
client_id = os.environ.get('client_id')
client_secret = os.environ.get('client_secret')
password = os.environ.get('pass')
reddit = praw.Reddit(client_id=client_id,
                     client_secret=client_secret,
                     password=password,
                     user_agent='r/RedditsQuests bot',
                     username='TheQuestMaster')
for submission in reddit.subreddit(sub).new(limit=None):
    submission.comments.replace_more(limit=None)
    if submission.saved is False:
        for comment in submission.comments.list():
            if ((("!completed" in comment.body)) and ((comment.is_submitter) or ('RedditsQuests' in comment.author.moderated())) and (comment.parent().author.name is not submission.author.name)):
                print("etc...")
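The traceback is fairly large, so I've put it in this bin for your reference. It looks to me like PRAW is timing out because the if-in-for loop takes too long. I could be wrong, though!
The problem is (as you said) somewhat sporadic, but I've narrowed it down. It turns out that fetching the subreddits moderated by /u/AutoModerator sometimes times out (presumably because the list is very long).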
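Finding the problem
Here's how I found the problem. If you're only interested in the solution, skip this section.
First, I modified your script to use try and except to catch the exception when it occurs. Your traceback told me that it happens on the line starting with if ((("!completed" in comment.body)), specifically while fetching the subreddits the user moderates. Here's my modified script: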
for submission in reddit.subreddit(sub).new(limit=None):
    submission.comments.replace_more(limit=None)
    if submission.saved is False:
        for comment in submission.comments.list():
            try:
                if (
                    (("!completed" in comment.body))
                    and (
                        (comment.is_submitter)
                        or ("RedditsQuests" in comment.author.moderated())
                    )
                    and (comment.parent().author.name is not submission.author.name)
                ):
                    print("etc...")
            except Exception:
                print(f'Author: {comment.author} ({type(comment.author)})')
And the output:
etc...
etc...
Author: AutoModerator (<class 'praw.models.reddit.redditor.Redditor'>)
etc...
etc...
etc...
Author: AutoModerator (<class 'praw.models.reddit.redditor.Redditor'>)
etc...
etc...
etc...
etc...
etc...
etc...
etc...
Author: AutoModerator (<class 'praw.models.reddit.redditor.Redditor'>)
etc...
Author: AutoModerator (<class 'praw.models.reddit.redditor.Redditor'>)
etc...
etc...
With that in mind, I wrote a very simple 3-line script to reproduce the problem:
import praw
reddit = praw.Reddit(...)
print(reddit.redditor("AutoModerator").moderated())
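Sometimes this script succeeds, but sometimes it fails with the same socket read timeout. Presumably the timeout happens because AutoModerator moderates so many subreddits (at least 10,000) that the Reddit API takes too long to process the request.
Solving the problem
Your script is trying to determine whether the redditor in question is a moderator of the subreddit. You do this by checking whether the subreddit is in the user's list of moderated subreddits, but you can switch this to checking whether the user is in the subreddit's list of moderators. Not only should this avoid the timeout, you'll also save a lot of network requests, because you only need to fetch the moderator list once.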
The PRAW documentation of Subreddit shows how we can get the list of a subreddit's moderators. In your case, we can do
moderators = list(reddit.subreddit(sub).moderator())
Then, instead of checking "RedditsQuests" in comment.author.moderated(), we check
comment.author in moderators
Your code becomes
import praw
import os
import re
sub = "RedditsQuests"
client_id = os.environ.get("client_id")
client_secret = os.environ.get("client_secret")
password = os.environ.get("pass")
reddit = praw.Reddit(
    client_id=client_id,
    client_secret=client_secret,
    password=password,
    user_agent="r/RedditsQuests bot",
    username="TheQuestMaster",
)
# Fetch the subreddit's moderator list once, up front.
moderators = list(reddit.subreddit(sub).moderator())
for submission in reddit.subreddit(sub).new(limit=None):
    submission.comments.replace_more(limit=None)
    if submission.saved is False:
        for comment in submission.comments.list():
            if (
                (("!completed" in comment.body))
                and ((comment.is_submitter) or (comment.author in moderators))
                # Compare names by value with !=; `is not` would test object identity.
                and (comment.parent().author.name != submission.author.name)
            ):
                print("etc...")
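In my brief testing, this script ran many times faster, because we fetch the moderator list only once instead of fetching all the subreddits moderated by every commenting user.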
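To make the difference in request counts concrete, here is a rough sketch that does not touch Reddit at all. `FakeRedditAPI` and both counting functions are hypothetical stand-ins written for this illustration; they only simulate "one call equals one network request" and are not part of PRAW.

```python
class FakeRedditAPI:
    """Hypothetical stand-in for the Reddit API that counts simulated requests."""

    def __init__(self, moderators_by_subreddit):
        self._mods = moderators_by_subreddit
        self.request_count = 0

    def moderated_by(self, user):
        """Subreddits moderated by `user`; each call counts as one request."""
        self.request_count += 1
        return [name for name, mods in self._mods.items() if user in mods]

    def moderators_of(self, subreddit):
        """Moderators of `subreddit`; each call counts as one request."""
        self.request_count += 1
        return list(self._mods.get(subreddit, []))


def count_mod_comments_per_author(api, subreddit, comment_authors):
    # Original approach: one lookup per comment author.
    return sum(1 for author in comment_authors if subreddit in api.moderated_by(author))


def count_mod_comments_cached(api, subreddit, comment_authors):
    # Suggested approach: fetch the moderator list once, then test membership locally.
    moderators = set(api.moderators_of(subreddit))
    return sum(1 for author in comment_authors if author in moderators)


authors = ["dudeOne", "dudeTwo", "moderatorOne", "AutoModerator", "dudeOne"]
mods = {"RedditsQuests": ["moderatorOne", "AutoModerator"]}

slow_api = FakeRedditAPI(mods)
slow_result = count_mod_comments_per_author(slow_api, "RedditsQuests", authors)

fast_api = FakeRedditAPI(mods)
fast_result = count_mod_comments_cached(fast_api, "RedditsQuests", authors)

print(slow_result, slow_api.request_count)
print(fast_result, fast_api.request_count)
```

Both approaches find the same moderator comments, but the per-author version issues one simulated request per comment while the cached version issues a single request in total.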
As an unrelated style note, you should use if not submission.saved rather than if submission.saved is False; it's the conventional way to check whether a condition is false.
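A short sketch of why the two spellings differ: `is` tests object identity, while `not` tests truthiness. The same distinction applies to the `is not` comparison of author names in the scripts above, where comparing by value with `!=` is the safer choice. The variable names below are made up for the example.

```python
# `is False` only matches the False singleton; `not` treats every falsy value alike.
falsy_values = [False, 0, None, ""]
identity_results = [value is False for value in falsy_values]
truthiness_results = [not value for value in falsy_values]
print(identity_results)    # only the first entry is True
print(truthiness_results)  # every entry is True

# Two strings with the same text are equal, but they need not be the same object,
# so names should be compared with == / != rather than `is` / `is not`.
name_a = "dudeOne"
name_b = "".join(["dude", "One"])  # same text, assembled at runtime
names_are_equal = name_a == name_b
print(names_are_equal)
```

Whether `name_a is name_b` holds depends on interpreter details, which is why identity checks on strings are unreliable.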