Reading information from reddit with urllib
I have the following code:
import urllib
import re

def worldnews():
    count = 0
    html = urllib.urlopen("https://www.reddit.com/r/worldnews/").readlines()
    lines = html
    for line in lines:
        if "Paris" or "Putin" in line:
            count = count + 1
            print line
    print "Totaal gevonden: ", count
    print "----------------------"

worldnews()
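How can I find all reddit posts on that page whose title contains Paris or Putin? Is there a way to print the titles of those posts to the console? When I run it now I get a lot of HTML code.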
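The best way to work with HTML in Python is BeautifulSoup. You will need to download it and read the documentation to learn how to do exactly what you are asking. However, here is something to get you started: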
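import urllib
from bs4 import BeautifulSoup

def worldnews():
    count = 0
    # Python 2: urllib.urlopen returns a file-like object that BeautifulSoup can read
    html = urllib.urlopen("https://www.reddit.com/r/worldnews/")
    # The "lxml" parser requires the lxml package to be installed
    soup = BeautifulSoup(html, "lxml")
    # Each post title sits in a <p class="title"> element
    titles = soup.find_all('p', {'class': 'title'})
    for i in titles:
        print(i.text)

worldnews()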
When this is run, it gives output like the following:
Paris attacks ringleader dead - French officials (bbc.com)
Company which raised price of AIDS drug by 5500% reports m quarterly losses. (pinknews.co.uk)
Syria/IraqSyrian man kills judge at ISIS Sharia Court for beheading his brother (en.abna24.com)
Putin Puts Million Bounty on Heads of Metrojet Bombers (fortune.com)
And so on for all the titles on the page. From here you should be able to work out the rest of the code fairly easily. :-)
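As one possible way to write the remaining step, here is a minimal sketch that filters a list of title strings by keyword. It assumes the titles have already been collected as plain strings (for example from `i.text` in the loop above). The function name `filter_titles` and the `keywords` parameter are made up for illustration. Note that each keyword has to be tested separately: the condition `"Paris" or "Putin" in line` is always true, because the non-empty string `"Paris"` is truthy on its own.

```python
def filter_titles(titles, keywords=("Paris", "Putin")):
    """Return the titles that contain at least one keyword (case-sensitive)."""
    matches = []
    for title in titles:
        # Check every keyword against the title individually.
        if any(keyword in title for keyword in keywords):
            matches.append(title)
    return matches


# Sample titles copied from the output above; in the real script they would
# come from the BeautifulSoup result instead of a hard-coded list.
sample_titles = [
    "Paris attacks ringleader dead - French officials (bbc.com)",
    "Syria/IraqSyrian man kills judge at ISIS Sharia Court for beheading his brother (en.abna24.com)",
    "Putin Puts Million Bounty on Heads of Metrojet Bombers (fortune.com)",
]

matches = filter_titles(sample_titles)
for title in matches:
    print(title)
print("Totaal gevonden: %d" % len(matches))
```

The match is case-sensitive, so "paris" in lower case would not be counted. Lower-casing both the title and the keywords before comparing would relax that.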