HTML Specific <h1> Text in Python
I just want to get the page's heading, <h1>This is Title</h1>, in Python.
I tried a few approaches, but none gave the result I wanted.
import requests
from bs4 import BeautifulSoup
response = requests.get("https://www.strawpoll.me/20321563/r")
html_content = response.content
soup = BeautifulSoup(html_content, "html.parser")
for i in soup.get_text("p", {"class": "result-list"}):
    print(i)
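Use lxml for this kind of task. You can also use BeautifulSoup.
from urllib.request import urlopen
import lxml.html
url = "https://www.strawpoll.me/20321563/r"  # the URL from the question
t = lxml.html.parse(urlopen(url))  # parse() also accepts a file-like object
print(t.find(".//title").text)
(From "How can I retrieve the page title of a webpage using Python?" by Peter Hoffmann)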
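The snippet above reads the <title> element, while the question asks for the <h1>. A minimal BeautifulSoup sketch of the same idea, run on an inline sample string rather than the live page; the markup in `sample_html` is an assumption, not the real page source:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page source.
sample_html = """
<html>
  <head><title>This is Title - Example Site</title></head>
  <body><h1>This is Title</h1><p>Some text</p></body>
</html>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# find() returns the first matching tag, or None if there is none.
h1 = soup.find("h1")
h1_text = h1.get_text(strip=True) if h1 is not None else None
print(h1_text)
```

Checking for None before reading the text avoids an AttributeError on pages that have no <h1>.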
I added the suggested code to my own:
title = soup.title
print(title.string[:-24])  # The last 24 characters of the title are always the same, so drop them.
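Slicing off a fixed number of characters gives a wrong result if the constant part of the title ever changes length. A small sketch that removes a known suffix only when it is actually present; `strip_suffix`, `page_title`, and `site_suffix` are hypothetical names and values, since the real suffix is not given here:

```python
def strip_suffix(text, suffix):
    """Return text without suffix if it ends with it, else text unchanged."""
    if suffix and text.endswith(suffix):
        return text[: -len(suffix)]
    return text

# Hypothetical title and suffix; substitute the site's real constant suffix.
page_title = "This is Title - Example Poll Site"
site_suffix = " - Example Poll Site"
print(strip_suffix(page_title, site_suffix))
```

On Python 3.9 and later, `str.removesuffix` does the same job.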
If you still don't get the result you want, try this approach.
from urllib.request import urlopen
from bs4 import BeautifulSoup
my_url = 'https://www.strawpoll.me/20321563/r'
# Download the page; the with-block closes the connection afterwards.
with urlopen(my_url) as client:
    page_html = client.read()
page_soup = BeautifulSoup(page_html, "html.parser")
# Find the <div id="result-list"> container, then every <h1> inside it.
_div = page_soup.find('div', id='result-list')
title = _div.find_all('h1')
print(title)
Output: [<h1>This is Title</h1>]
You can use BeautifulSoup as follows:
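The same lookup (an <h1> nested inside the element with id "result-list") can also be written as a CSS selector. A minimal sketch on inline sample markup; the structure of `sample_html` is an assumption and the real page may differ:

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real page may be structured differently.
sample_html = """
<div id="other"><h1>Not this one</h1></div>
<div id="result-list"><h1>This is Title</h1></div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# "#result-list h1" matches <h1> tags nested inside the element with id="result-list".
h1 = soup.select_one("#result-list h1")
title_text = h1.get_text(strip=True) if h1 is not None else None
print(title_text)
```

Unlike printing the list of tags, this yields only the heading's text.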
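from bs4 import BeautifulSoup
data = "html as text(Source)"  # placeholder: put the page's HTML source here
soup = BeautifulSoup(data, "html.parser")
p = soup.find('h1', attrs={'class': 'titleClass'})  # 'titleClass' is an example class name
if p is not None:
    if p.a is not None:
        p.a.extract()  # remove a nested <a> link so only the heading text remains
    print(p.text.strip())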
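To see what the `extract()` call does, here is a small worked sketch on a hypothetical heading that contains a link. The markup and the class name `titleClass` are illustrative only:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a heading with a nested link to remove.
data = '<h1 class="titleClass"> This is Title <a href="#">edit</a></h1>'

soup = BeautifulSoup(data, "html.parser")
heading = soup.find("h1", class_="titleClass")

# extract() removes the tag from the tree and returns it.
removed = heading.a.extract()
heading_text = heading.text.strip()
print(heading_text)
print(removed.text)
```

Without the `extract()` call, the link text would be included in the heading's text.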