Reddit Scraper and Telegram Bot
I want to scrape some science news from the "Science" subreddit and broadcast it to my Telegram channel via a Telegram bot. I built these two simple Python snippets, one for each task. Now I'd like to know the best way to combine them into a single reliable script, so that every time the program runs, the bot automatically sends the scraped items to the channel. Both scripts work fine on their own. Any advice is appreciated.
Reddit scraper
import praw

# assign Reddit API credentials
# (see further instructions here --> https://www.reddit.com/prefs/apps)
reddit = praw.Reddit(client_id='XXXX',
                     client_secret='XXXXXXXXXXXXXXXXXXXXXXX',
                     user_agent='science_bot',
                     username='XXXXXX',
                     password='XXXXXXXXXXXXXXXXXX')

# select the subreddit you want to scrape
subreddit = reddit.subreddit('science')

print("\t", "Digest of the latest scientific news for today: \n")
for submission in subreddit.new(limit=5):
    print(submission.title)
    print(submission.url, "\n")
Telegram bot for posting
import requests

def telegram_bot_sendtext(bot_message):
    bot_token = 'XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX'
    bot_chatID = '@XXXXXX'
    send_text = ('https://api.telegram.org/bot' + bot_token
                 + '/sendMessage?chat_id=' + bot_chatID
                 + '&parse_mode=Markdown&text=' + bot_message)
    response = requests.get(send_text)
    return response.json()

test = telegram_bot_sendtext("Testing my new Telegram bot.")
print(test)
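One caveat about the snippet above: concatenating bot_message straight into the URL breaks as soon as the text contains characters like `&`, `#`, or a newline. A minimal sketch of a safer variant (same endpoint; `build_send_url` is a hypothetical helper name) that URL-encodes the query string first:

```python
from urllib.parse import urlencode

def build_send_url(bot_token, chat_id, text):
    # urlencode escapes '&', '#', '@' and spaces, so the message arrives intact
    query = urlencode({'chat_id': chat_id, 'parse_mode': 'Markdown', 'text': text})
    return 'https://api.telegram.org/bot' + bot_token + '/sendMessage?' + query

def telegram_bot_sendtext(bot_message, bot_token='XXXX', bot_chatID='@XXXXXX'):
    import requests  # imported lazily; only needed when actually sending
    response = requests.get(build_send_url(bot_token, bot_chatID, bot_message))
    return response.json()
```

Alternatively, passing a dict via the `params` argument of `requests.get` achieves the same encoding without building the URL by hand.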
Thanks in advance!
I solved this with the code structure below. Thanks to @maxwell for the simple and elegant idea.
import telegram
import telebot
import praw

bot_chatID = '@your_channel_name'
bot = telebot.TeleBot('XXXXXXXXXXXXXXXXXXXXXXXXX')

reddit = praw.Reddit(client_id='XXXXXXXXXXXXXX',
                     client_secret='XXXXXXXXXXXXXXXXXXXXXXXX',
                     user_agent='your_bot_name',
                     username='your_reddit_username',
                     password='XXXXXXXXXXXXXX')

def reddit_scraper():
    # collect the newest submissions as {'title': ..., 'link': ...} dicts
    news_data = []
    subreddit = reddit.subreddit('name_of_subreddit')
    for submission in subreddit.new(limit=5):
        data = {}
        data['title'] = submission.title
        data['link'] = submission.url
        news_data.append(data)
    return news_data

def get_msg(news_data):
    # format the scraped items as a single HTML message
    msg = '\n\n\n'
    for news_item in news_data:
        title = news_item['title']
        link = news_item['link']
        msg += title + '\n[<a href="' + link + '">Read the full article --></a>]'
        msg += '\n\n'
    return msg

news_data = reddit_scraper()
if len(news_data) > 0:
    msg = get_msg(news_data)
    status = bot.send_message(chat_id=bot_chatID, text=msg, parse_mode=telegram.ParseMode.HTML)
    if status:
        print(status)
else:
    print('No updates.')
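One thing to watch with this combined script: Telegram rejects messages longer than 4096 characters, so a long digest will fail to send in one `send_message` call. A minimal sketch of a workaround (the `chunk_message` helper is a hypothetical name) that groups pre-formatted items into several messages, each under the limit:

```python
TELEGRAM_LIMIT = 4096  # Telegram's per-message character limit

def chunk_message(items, limit=TELEGRAM_LIMIT):
    """Group formatted news items into messages that each fit under the limit."""
    chunks, current = [], ''
    for item in items:
        # start a new chunk when adding this item would exceed the limit
        if current and len(current) + len(item) > limit:
            chunks.append(current)
            current = ''
        current += item
    if current:
        chunks.append(current)
    return chunks
```

With this in place, the final step becomes a loop: format each news item separately, then call `bot.send_message` once per chunk instead of once for the whole digest.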