How to send scraped data through reddit bot

So I have this bot, and I want it to reply with the score of the Mets game whenever someone says "mets score" in a specific subreddit. This is my first Python project, and I plan to use it on a dummy subreddit I created as a learning tool. I can't manage to send the score from the website I'm scraping through the bot so that it appears in the reply to a "mets score" comment. Any suggestions?

import praw
import time
from lxml import html
import requests
from bs4 import BeautifulSoup

r = praw.Reddit(user_agent = 'my_first_bot')
r.login('user_name', 'password')

def scores():
    soup = BeautifulSoup(requests.get("http://scores.nbcsports.com/mlb/scoreboard.asp?day=20160621&meta=true").content, "lxml")

    table = soup.find("a", class_="teamName", text="NY Mets").find_previous("table")
    a, b = [a.text for a in table.find_all("a", class_="teamName")]
    inn, a_score, b_score = ([td.text for td in row.select("td.shsTotD")] for row in table.find_all("tr"))
    print(" ".join(inn))
    print("{}: {}".format(a, " ".join(a_score)))
    print("{}: {}".format(b, " ".join(b_score)))


words_to_match = ['mets score']
cache = []

def run_bot():
    print("Grabbing subreddit...")
    subreddit = r.get_subreddit("random_subreddit")
    print("Grabbing comments...")
    comments = subreddit.get_comments(limit=40)
    for comment in comments:
        print(comment.id)
        comment_text = comment.body.lower()
        isMatch = any(string in comment_text for string in words_to_match)
        if comment.id not in cache and isMatch:
            print("match found!"  + comment.id)
            comment.reply('heres the score to last nights mets game...' scores())
            print("reply successful")
            cache.append(comment.id)
            print("loop finished, goodnight")

while True:
    run_bot()
    time.sleep(120)

I guess I'll put you out of your misery ;). There are multiple issues with your code snippet:

comment.reply('heres the score to last nights mets game...' scores())

The .reply() method needs a string, or an object that has a reasonably good string representation. Assuming the method scores() returns a string, you should concatenate the two arguments, like this:

comment.reply('heres the score to last nights mets game...'+ scores())

It seems your knowledge of basic Python syntax and constructs is shallow. For a quick refresher, see this.
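To make the difference concrete, here is a minimal, self-contained sketch; the literal score string is a stand-in for real scraped data:

```python
# Juxtaposing a string literal and a call, as in
#   comment.reply('heres the score...' scores())
# is a SyntaxError; the + operator joins them explicitly.

def scores():
    # Stand-in for the real scraping logic.
    return "NY Mets: 4, Braves: 2"

message = 'heres the score to last nights mets game... ' + scores()
print(message)
```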

Your method scores() doesn't return anything. It just prints out a bunch of lines (for debugging purposes, I assume).

def scores():
    soup = BeautifulSoup(requests.get("http://scores.nbcsports.com/mlb/scoreboard.asp?day=20160621&meta=true").content, "lxml")
    .......    
    print (" ".join(inn))
    print ("{}: {}".format(a, " ".join(a_score)))
    print ("{}: {}".format(b, " ".join(b_score)))

Interestingly, you could use those exact strings as your return value (or any other strings entirely, to suit your needs), like so:

def scores():
    .......
    inn_string = " ".join(inn)
    a_string = "{}: {}".format(a, " ".join(a_score))
    b_string = "{}: {}".format(b, " ".join(b_score))
    return "\n".join([inn_string, a_string, b_string])

Those should get you up and running.
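To see the return-value pattern end to end without hitting the network, here's a sketch that runs the same parsing against a hand-written HTML fragment. The fragment's layout is my assumption about the scoreboard markup; the live page may differ:

```python
from bs4 import BeautifulSoup

# Hypothetical fragment mimicking the scoreboard's structure:
# one table with teamName anchors and shsTotD score cells.
HTML = """
<table>
  <tr><td class="shsTotD">1</td><td class="shsTotD">2</td><td class="shsTotD">3</td></tr>
  <tr><td><a class="teamName">NY Mets</a></td>
      <td class="shsTotD">0</td><td class="shsTotD">1</td><td class="shsTotD">4</td></tr>
  <tr><td><a class="teamName">Braves</a></td>
      <td class="shsTotD">2</td><td class="shsTotD">0</td><td class="shsTotD">2</td></tr>
</table>
"""

def scores():
    soup = BeautifulSoup(HTML, "html.parser")
    table = soup.find("a", class_="teamName", string="NY Mets").find_previous("table")
    a, b = [tag.text for tag in table.find_all("a", class_="teamName")]
    # One header row of innings, then one row of scores per team.
    inn, a_score, b_score = ([td.text for td in row.select("td.shsTotD")]
                             for row in table.find_all("tr"))
    return "\n".join([
        " ".join(inn),
        "{}: {}".format(a, " ".join(a_score)),
        "{}: {}".format(b, " ".join(b_score)),
    ])

print(scores())
```

The same function body works against the real page once `HTML` is swapped for the downloaded content.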


Some more suggestions: have you looked at the Reddit PRAW docs? You should. You should also probably use praw.helpers.comment_stream(). It's simple and easy to use and will handle retrieving new comments for you. Currently you try to fetch a maximum of 40 comments every 120 seconds. What happens when there are more relevant comments than that in a 120-second span? You'll end up missing some of the comments you should've replied to. .comment_stream() will take care of rate limiting for you so that your bot can reply to each new comment that needs its attention at its own pace. Read more about this here.
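As a rough sketch of that suggestion (assuming PRAW 3, where praw.helpers.comment_stream() still exists; it was removed in PRAW 4), the polling loop could become something like this. The scores() stub and subreddit name are placeholders:

```python
words_to_match = ['mets score']
replied_to = set()  # a set gives O(1) membership tests, unlike a list

def is_match(body):
    """True when the comment text contains any trigger phrase."""
    text = body.lower()
    return any(phrase in text for phrase in words_to_match)

def scores():
    # Stand-in for the real scraper.
    return "NY Mets: 4, Braves: 2"

def run_bot(r):
    import praw.helpers  # PRAW 3 only; imported here so is_match stays testable without PRAW
    # comment_stream yields comments indefinitely, handling polling
    # internally -- no manual time.sleep() loop needed.
    for comment in praw.helpers.comment_stream(r, 'random_subreddit', limit=None):
        if comment.id not in replied_to and is_match(comment.body):
            comment.reply('heres the score to last nights mets game...\n' + scores())
            replied_to.add(comment.id)
```

Note the stream replaces both the `while True` loop and the fixed `limit=40` fetch, so no matching comment slips through between polls.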