Pull headlines from Reddit
What is the best way to pull the front-page headlines from Reddit? Currently I'm using BeautifulSoup4 to try to scrape them, but using the Reddit API seems like a viable option too; I just can't find anywhere in their documentation a URL I can hit to request the headlines. My guess was something like http://www.reddit.com/r/frontpage/top.json?limit=10, but that doesn't produce any front-page headlines.
Python scraper approach (not working):
import urllib2
from bs4 import BeautifulSoup

def scrape(url):
    try:
        # Fetch the page and parse it with BeautifulSoup.
        req = urllib2.Request(url)
        conn = urllib2.urlopen(req)
        content = conn.read()
        soup = BeautifulSoup(content)
        # Print every anchor tag found on the page.
        for link in soup.find_all('a'):
            print link
    except urllib2.URLError, e:
        print 'Your HTTP error response code is: ', e
Any suggestions?
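For reference, the .json guess is close to how Reddit's listing endpoints actually work: the front-page listings are exposed as .json pages (e.g. /top.json), but Reddit tends to throttle or reject requests that use the default Python User-Agent, so a descriptive one has to be sent. The following is only a minimal sketch; the exact endpoint path and response layout are assumptions based on Reddit's public .json listings, not something taken from the code above:

import json
import urllib2

def top_headlines(limit=10):
    # Assumed endpoint: the front-page "top" listing as JSON.
    url = 'https://www.reddit.com/top.json?limit=%d' % limit
    # Reddit throttles the default urllib2 User-Agent, so send a descriptive one.
    req = urllib2.Request(url, headers={'User-Agent': 'my_cool_application/0.1'})
    data = json.load(urllib2.urlopen(req))
    # Each entry in the listing is wrapped as {"kind": ..., "data": {...}}.
    return [child['data']['title'] for child in data['data']['children']]

for title in top_headlines():
    print title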
Following @jonrsharpe's comment: there is a Python Reddit API client, praw. Use get_top() to get the top headlines:
>>> import praw
>>> r = praw.Reddit(user_agent='my_cool_application')
>>> for item in r.get_top():
...     print item
...
4901 :: I made a Redundant Clock.
4764 :: Elon Musk plans to launch 4,000 satellites to deliver high-speed Inte...
5144 :: Pipeline breach spills up 50,000 gallons of oil into the Yellowstone ...
4603 :: Avalanche Dog In Training
4564 :: TIL it is illegal in many countries to perform surgical procedures on...
...
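Note that get_top() belongs to the older PRAW interface; newer PRAW releases (4+) moved to OAuth-only access and a different call style. A rough equivalent under that newer API would look like the sketch below; the credentials are placeholders you would get by registering a "script" app, and reddit.front.top() is assumed to be the replacement for the old front-page call:

import praw

# PRAW 4+ runs in read-only mode with just an app id/secret and a user agent.
reddit = praw.Reddit(client_id='YOUR_CLIENT_ID',          # placeholder
                     client_secret='YOUR_CLIENT_SECRET',  # placeholder
                     user_agent='my_cool_application')

# Top submissions from the (logged-out) front page.
for submission in reddit.front.top(limit=10):
    print('%d :: %s' % (submission.score, submission.title))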