Getting href using Beautiful Soup
I am trying to extract a specific link from this HTML:
<a class="pageNum taLnk" data-offset="10" data-page-number="1"
href="www.blahblahblah.com/bb32123">Page 1 </a>
<a class="pageNum taLnk" data-offset="20" data-page-number="2"
href="www.blahblahblah.com/bb45135">Page 2 </a>
As you can see, the links (href) follow no pattern I can exploit, which means I need to extract the href manually using BeautifulSoup.
I want to get the href for page 2 specifically.
This is my code so far:
from bs4 import BeautifulSoup
import urllib.request  # a bare "import urllib" does not reliably load the request submodule
url = 'https://www.tripadvisor.com/ShowUserReviews-g293917-d539542-r447460956-Duangtawan_Hotel_Chiang_Mai-Chiang_Mai.html#REVIEWS'
page = urllib.request.urlopen(url)
soup = BeautifulSoup(page, 'html.parser')
for link in soup.find_all('a', attrs={'class': 'pageNum taLnk'}):
    print(link)
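As you can see, I have been trying to get the href for page 2 specifically. Is there any way to access it using the extra information in the tag, such as data-page-number="2" or one of the tag's other attributes?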
page_2 = soup.find('a', attrs = {'data-page-number' : '2'})
This gives you page 2 only. If you want the next page regardless of what the current page is, look for the next-page link instead:
next_page = soup.find('a', attrs={'class': 'nav next rndBtn ui_button primary taLnk'})
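Note that soup.find() returns the whole tag, not the URL. The sketch below shows one way to read the href off the matched tag, using the two anchors from the question as inline HTML so it runs without a network request. The helper name href_for_page is hypothetical and only there for illustration.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question; no network access needed.
html = '''
<a class="pageNum taLnk" data-offset="10" data-page-number="1"
href="www.blahblahblah.com/bb32123">Page 1 </a>
<a class="pageNum taLnk" data-offset="20" data-page-number="2"
href="www.blahblahblah.com/bb45135">Page 2 </a>
'''

soup = BeautifulSoup(html, 'html.parser')


def href_for_page(soup, page_number):
    """Hypothetical helper: return the href of the anchor whose
    data-page-number matches page_number, or None if there is none."""
    tag = soup.find('a', attrs={'data-page-number': str(page_number)})
    # find() returns None when nothing matches, so guard before reading.
    return tag.get('href') if tag is not None else None


print(href_for_page(soup, 2))  # www.blahblahblah.com/bb45135
```

Using tag.get('href') rather than tag['href'] avoids a KeyError if a matching anchor happens to have no href attribute.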
Some attributes, like the data-* attributes in HTML 5, have names that
can’t be used as the names of keyword arguments:
data_soup = BeautifulSoup('<div data-foo="value">foo!</div>')
data_soup.find_all(data-foo="value")
# SyntaxError: keyword can't be an expression
You can use these attributes in searches by putting them into a
dictionary and passing the dictionary into find_all() as the attrs
argument:
data_soup.find_all(attrs={"data-foo": "value"})
# [<div data-foo="value">foo!</div>]
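As an alternative to the attrs dictionary (not part of the answer above), Beautiful Soup also accepts CSS attribute selectors through select_one() and select(). A minimal sketch, reusing the data-foo example from the quoted documentation:

```python
from bs4 import BeautifulSoup

data_soup = BeautifulSoup('<div data-foo="value">foo!</div>', 'html.parser')

# A CSS attribute selector handles hyphenated attribute names directly,
# because the name lives inside a string rather than a Python keyword.
match = data_soup.select_one('div[data-foo="value"]')

print(match.get_text())  # foo!
```

The same pattern would apply to the question's markup with a selector such as 'a[data-page-number="2"]'.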