Find Specific Text Within an HTML Tag in Python
I have tried a million different ways to parse the Zestimate, without success.
Here is the HTML tag that contains the Zestimate information:
<span>
<span tabindex="0" role="button">
<span class="sc-bGbJRg iiEDXU ds-dashed-underline">
Zestimate
<sup>®</sup>
</span>
</span>
:
<span>1,425</span>
</span>
Honestly, I thought this would get me close, but I get an empty list:
link = 'https://www.zillow.com/homedetails/1404-Clearwing-Cir-Georgetown-TX-78626/121721750_zpid/'
searched_word = '<span class="sc-bGbJRg iiEDXU ds-dashed-underline">Zestimate<sup>®</sup></span>'
test_page = requests.Session().get(link, headers=req_headers)
test_soup = BeautifulSoup(test_page.content, 'lxml')
results = test_soup('span',string='searched_word')
print(results)[0]
To get the correct HTML from the site, add a User-Agent header to the request.
For example:
import requests
from bs4 import BeautifulSoup

url = 'https://www.zillow.com/homedetails/1404-Clearwing-Cir-Georgetown-TX-78626/121721750_zpid/'
# Send a browser-like User-Agent so the site returns the regular page HTML
headers = {'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:77.0) Gecko/20100101 Firefox/77.0'}
soup = BeautifulSoup(requests.get(url, headers=headers).content, 'html.parser')
# :-soup-contains replaces the deprecated :contains selector (soupsieve 2.1+)
home_value = soup.select_one('h4:-soup-contains("Home value")').find_next('p').get_text(strip=True)
print(home_value)
Prints:
1,425
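As a side note on why the attempt in the question returns an empty list: string='searched_word' passes the literal text searched_word rather than the variable, and the string argument is compared against a tag's text, not against HTML markup. The sketch below is only an illustration under assumptions: it runs against the snippet quoted in the question (not the live page, whose markup may differ), and it matches on the visible word "Zestimate" because class names such as sc-bGbJRg look auto-generated and may change.

```python
import re
from bs4 import BeautifulSoup

# The snippet quoted in the question, used here as offline sample input.
html = """
<span>
  <span tabindex="0" role="button">
    <span class="sc-bGbJRg iiEDXU ds-dashed-underline">
      Zestimate
      <sup>®</sup>
    </span>
  </span>
  :
  <span>1,425</span>
</span>
"""

soup = BeautifulSoup(html, "html.parser")

# Find the first text node containing "Zestimate" (regex search, so the
# surrounding whitespace in the text node does not matter).
label_text = soup.find(string=re.compile("Zestimate"))

# The value sits in the next <span> in document order after that label.
value = label_text.find_next("span").get_text(strip=True)
print(value)
```

On the real page, soup.find may return None if the label is missing, so a check before calling find_next would be a sensible addition.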