How to get text in BeautifulSoup?
I'm trying to use BS to get the string "Carlos Pellegrini 551, C1009ABK Buenos Aires, Argentina", but I'm running into a problem. (The string sits right before the end of the snippet below.)
<span class="
hp_address_subtitle
js-hp_address_subtitle
jq_tooltip
" rel="14" data-source="top_link" data-coords="," data-component="tooltip" data-tooltip-text="
<p>Ubicación <strong>excelente</strong>, ¡puntuada con 9,3/10! <small>(puntaje basado en <strong>Más recomendado</strong> comentarios)</small></p>
<p>Valorado por las personas <strong>después de hospedarse en</strong> el Hotel Panamericano Buenos Aires.</p>
" data-tooltip-animation="false" tabindex="0" data-bbox="-58.4028009366937,-34.6199039159457,-58.3590682554297,-34.5839730827995" data-node_tt_id="location_score_tooltip" data-width="350" title="" aria-describedby="tooltip-1">
Carlos Pellegrini 551, C1009ABK Buenos Aires, Argentina
</span>
I tried the following:
soup.find(attrs={'class':"hotel_address_subtitle"}).get_text()
But I got None as the result.
Please help!
You can do this to find only the text on the page:
soup.find_all(text=True)
Instead of:
soup.find(attrs={'class':"hotel_address_subtitle"}).get_text()
Try:
for element in soup.find_all(text=True):
    print(element)
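Note that find_all(text=True) also returns whitespace-only text nodes. As a minimal sketch, assuming a trimmed-down, hypothetical version of the markup (not the real page), the stripped_strings generator gives the same text nodes with the blanks dropped:

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened markup standing in for the real page.
html_doc = """
<div>
  <span class="hp_address_subtitle">
Carlos Pellegrini 551, C1009ABK Buenos Aires, Argentina
  </span>
</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# stripped_strings yields every text node with surrounding whitespace
# removed and skips nodes that contain only whitespace.
texts = list(soup.stripped_strings)
print(texts)
```

On a full page this list will still contain every visible string, so it is mostly useful for exploring what is there rather than for picking out one value.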
Edit: after seeing your comment, try this:
from bs4 import BeautifulSoup
import requests
page = requests.get("https://www.booking.com/hotel/ar/panamericano-buenos-aires.es-ar.html")
soup = BeautifulSoup(page.content, 'html.parser')
output = soup.find("span", {"class": "hp_address_subtitle"})
print(output.text)
Output:
Carlos Pellegrini 551, C1009ABK Buenos Aires, Argentina
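The difference from the attempt in the question appears to be the class name: the markup uses "hp_address_subtitle", while the question searched for "hotel_address_subtitle", so find() has nothing to match and returns None. A minimal sketch on a shortened, hypothetical copy of the span (attributes other than class are left out here) shows both cases without fetching the page:

```python
from bs4 import BeautifulSoup

# Shortened copy of the span from the question; only the class attribute is kept.
html_doc = (
    '<span class="hp_address_subtitle js-hp_address_subtitle jq_tooltip">\n'
    "Carlos Pellegrini 551, C1009ABK Buenos Aires, Argentina\n"
    "</span>"
)

soup = BeautifulSoup(html_doc, "html.parser")

# The class name from the question does not occur in the markup,
# so find() returns None.
missing = soup.find(attrs={"class": "hotel_address_subtitle"})
print(missing)

# Matching one of the classes that is actually present finds the tag.
tag = soup.find("span", class_="hp_address_subtitle")

# Guard against None before calling get_text(); strip=True trims the
# newlines around the address.
address = tag.get_text(strip=True) if tag is not None else None
print(address)
```

Checking for None before calling get_text() avoids an AttributeError if the site changes its markup or serves a different page to the script.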
You can use "select_one". Try:
from bs4 import BeautifulSoup
html_doc='''<span class="
hp_address_subtitle
js-hp_address_subtitle
jq_tooltip
" rel="14" data-source="top_link" data-coords="," data-component="tooltip" data-tooltip-text="
<p>Ubicación <strong>excelente</strong>, ¡puntuada con 9,3/10! <small>(puntaje basado en <strong>Más recomendado</strong> comentarios)</small></p>
<p>Valorado por las personas <strong>después de hospedarse en</strong> el Hotel Panamericano Buenos Aires.</p>
" data-tooltip-animation="false" tabindex="0" data-bbox="-58.4028009366937,-34.6199039159457,-58.3590682554297,-34.5839730827995" data-node_tt_id="location_score_tooltip" data-width="350" title="" aria-describedby="tooltip-1">
Carlos Pellegrini 551, C1009ABK Buenos Aires, Argentina
</span>'''
soup = BeautifulSoup(html_doc, 'lxml')
result = soup.select_one("span").text
print(result)
The output will be:
Carlos Pellegrini 551, C1009ABK Buenos Aires, Argentina
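Selecting plain "span" works here only because the snippet contains a single span; on a full page it would return the first span of any kind. A sketch that narrows the selector to the class from the question, assuming a hypothetical two-span document and the built-in "html.parser" instead of lxml so no extra install is needed:

```python
from bs4 import BeautifulSoup

# Hypothetical document with an unrelated span ahead of the address span.
html_doc = """
<span class="other">Something else</span>
<span class="hp_address_subtitle js-hp_address_subtitle jq_tooltip">
Carlos Pellegrini 551, C1009ABK Buenos Aires, Argentina
</span>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# A bare "span" selector picks the first span in document order.
first_span = soup.select_one("span").get_text(strip=True)

# "span.hp_address_subtitle" is a CSS class selector; it matches a span
# that has this class among its classes.
node = soup.select_one("span.hp_address_subtitle")
result = node.get_text(strip=True) if node is not None else None

print(first_span)
print(result)
```

Like find(), select_one() returns None when nothing matches, hence the check before get_text().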