Can you stream posts that have made it to "hot"?
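So let's say I want to stream posts from the subreddit "news". However the posts are very frequent and we can't say that every post is worthy. So I would like to filter for the good posts by trying to stream the "hot" list. But I'm not sure whether that, or anything similar, is possible.
Normally, this is what I do to stream posts: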
for submission in subreddit.stream.submissions():
    if not submission.stickied:
        print(str(submission.title) + " " + str(submission.url) + "\n")
This filters posts, but doesn't stream them:
for submission in subreddit.hot(limit=10):
    print(str(submission.title) + " " + str(submission.url) + "\n")
So, any ideas on how to stream and filter posts at the same time?
Thanks
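Streaming hot posts is an incoherent idea.
The point of a stream in PRAW is to get every post or comment (nearly) immediately after it is submitted to Reddit. The hot listing, on the other hand, contains the items that are considered currently interesting, ordered by a score that is roughly proportional to points divided by age.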
However the posts are very frequent and we can't say that every post is worthy.
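Since it takes time for Reddit users to see a post and vote on it, it doesn't make much sense to evaluate whether a post is worthy (as measured by score) immediately after it has been posted.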
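If your goal is to perform some action on every post that makes it into the top n of a subreddit, you can check the front page at a fixed interval and perform your action on any posts you haven't already seen. For example: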
import praw
import time

reddit = praw.Reddit()  # must be edited to properly authenticate
subreddit = reddit.subreddit('news')
seen_submissions = set()

while True:
    for submission in subreddit.hot(limit=10):
        if submission.fullname not in seen_submissions:
            seen_submissions.add(submission.fullname)
            print('{} {}\n'.format(submission.title, submission.url))
    time.sleep(60)  # sleep for a minute (60 seconds)
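The "only act on posts you haven't seen" step can be pulled out into a small helper, which makes it easy to check without touching the network. This is a minimal sketch: `unseen` is a hypothetical name, not part of PRAW, and the `SimpleNamespace` objects are stand-ins that only mimic the `fullname` attribute of a real submission.

```python
from types import SimpleNamespace


def unseen(submissions, seen):
    """Yield each submission whose fullname is not yet in `seen`, recording it."""
    for submission in submissions:
        if submission.fullname not in seen:
            seen.add(submission.fullname)
            yield submission


# Stand-in objects for two consecutive polls of the hot listing (assumed data).
first_poll = [SimpleNamespace(fullname="t3_a"), SimpleNamespace(fullname="t3_b")]
second_poll = [SimpleNamespace(fullname="t3_b"), SimpleNamespace(fullname="t3_c")]

seen = set()
first_new = [s.fullname for s in unseen(first_poll, seen)]
second_new = [s.fullname for s in unseen(second_poll, seen)]
print(first_new, second_new)
```

With PRAW, the same helper would presumably be used as `for submission in unseen(subreddit.hot(limit=10), seen_submissions):` inside the polling loop.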
To add to jarhill0's answer, you can also page through the listing by specifying "after" in the params.
import praw
import time

reddit = praw.Reddit()  # must be edited to properly authenticate
subreddit = reddit.subreddit('news')
seen_submissions = set()

while True:
    params = None
    for _ in range(10):  # get the first 10 pages of 'hot'
        last_fullname = None
        for submission in subreddit.hot(limit=10, params=params):
            last_fullname = submission.fullname
            if submission.fullname not in seen_submissions:
                seen_submissions.add(submission.fullname)
                print('{} {}\n'.format(submission.title, submission.url))
        if last_fullname is None:
            break  # this page was empty, so there is nothing left to page through
        params = {"after": last_fullname}  # next page starts after the last item
    time.sleep(60)  # sleep for a minute (60 seconds)
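One thing to watch with either loop is that `seen_submissions` grows for as long as the script runs. A possible way around that is a bounded "seen" container that forgets the oldest entries first. The sketch below is an assumption about how one might do it, not something from PRAW's public API; `BoundedSeen` and `max_items` are hypothetical names.

```python
from collections import OrderedDict


class BoundedSeen:
    """Remember at most `max_items` keys, forgetting the oldest first."""

    def __init__(self, max_items):
        self.max_items = max_items
        self._items = OrderedDict()

    def __contains__(self, key):
        return key in self._items

    def add(self, key):
        self._items[key] = None
        self._items.move_to_end(key)  # re-adding a key marks it as newest
        if len(self._items) > self.max_items:
            self._items.popitem(last=False)  # drop the oldest entry


seen = BoundedSeen(max_items=2)
for fullname in ["t3_a", "t3_b", "t3_c"]:
    seen.add(fullname)
print("t3_a" in seen, "t3_b" in seen, "t3_c" in seen)
```

Because it supports `in` and `add`, it can stand in for the plain `set()` above. `max_items` should be comfortably larger than the number of posts scanned per poll (100 in the paginated version), otherwise a post that is still in the listing could be forgotten and printed again.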