Get the Text from the next_sibling - BeautifulSoup 4
I want to scrape the Restaurants from this URL:
for rests in dining_soup.select("div.infos-restos"):
    for rest in rests.select("h3"):
        safe_print(" Rest Name: "+rest.text)
        print(rest.next_sibling.next_sibling.next_sibling.next_sibling.contents)
This outputs:
<div class="descriptif-resto">
<p>
<strong>Type of cuisine</strong>:International</p>
<p>
<strong>Opening hours</strong>:06:00-23:30</p>
<p>The Food Square bar and restaurant offers a varied menu in an elegant and welcoming setting. In fine weather you can also enjoy your meal next to the pool or relax on the garden terrace.</p>
</div>
and
print(rest.next_sibling.next_sibling.next_sibling.next_sibling.text)
always produces empty output.
So my question is: how do I extract the Type of cuisine and Opening hours from the div?
The opening hours and cuisine are in the text of the "descriptif-resto" div:
import requests
from bs4 import BeautifulSoup
r = requests.get("http://www.accorhotels.com/gb/hotel-5548-mercure-niederbronn-hotel/restaurant.shtml")
# name the parser explicitly so results do not depend on which parsers are installed
soup = BeautifulSoup(r.content, "html.parser")
print(soup.find("div",attrs={"class":"descriptif-resto"}).text)
Type of cuisine:Brasserie
Opening hours:12:00 - 14:00 / 19:00 - 22:00
The name is in the first h3 tag, and the type and opening hours are in the first two p tags:
name = soup.find("div", attrs={"class":"infos-restos"}).h3.text
det = soup.find("div",attrs={"class":"descriptif-resto"}).p
hours = det.find_next("p").text
tpe = det.text
print(name)
print(hours)
print(tpe)
LA STUB DU CASINO
Opening hours:12:00 - 14:00 / 19:00 - 22:00
Type of cuisine:Brasserie
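If you want the values without their labels, one option is to split each p on its strong tag. The following is only a minimal sketch: it runs against a shortened copy of the markup shown in the question rather than the live page (which may differ), and `parse_details` is a hypothetical helper name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Sample markup shortened from the question's output; the live page may differ.
SAMPLE_HTML = """
<div class="descriptif-resto">
<p>
<strong>Type of cuisine</strong>:International</p>
<p>
<strong>Opening hours</strong>:06:00-23:30</p>
<p>The Food Square bar and restaurant offers a varied menu.</p>
</div>
"""


def parse_details(div):
    """Return {label: value} for every <p> that contains a <strong> label."""
    details = {}
    for p in div.find_all("p"):
        label = p.find("strong")
        if label is None:
            continue  # free-text description paragraph, no label to split on
        key = label.get_text(strip=True)
        # Full paragraph text minus the label, then drop the leading colon.
        value = p.get_text(strip=True)[len(key):].lstrip(":").strip()
        details[key] = value
    return details


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
info = parse_details(soup.find("div", class_="descriptif-resto"))
print(info.get("Type of cuisine"))
print(info.get("Opening hours"))
```

Using `info.get(...)` returns None for sections that lack one of the two fields, so a bar without a cuisine entry does not raise an error.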
OK, so some sections don't have both opening hours and cuisine, so you will have to fine-tune this, but it gets all the info:
from itertools import chain
all_dets = soup.find_all("div", attrs={"class":"infos-restos"})
# get all names from h3 tags, using chain so we can zip later
names = chain.from_iterable(x.find_all("h3") for x in all_dets)
# get all info to extract cuisine, hours
det = chain.from_iterable(x.find_all("div",attrs={"class":"descriptif-resto"}) for x in all_dets)
# zip appropriate details with each name
zipped = zip(names, det)
for name, det in zipped:
    details = det.p
    name, tpe = name.text, details
    hours = details.find_next("p") if "cuisine" in det.p.text else ""
    if hours: # empty string means we have a bar
        print(name, tpe.text, hours.text)
    else:
        print(name, tpe.text)
    print("-----------------------------")
LA STUB DU CASINO
Type of cuisine:Brasserie
Opening hours:12:00 - 14:00 / 19:00 - 22:00
-----------------------------
RESTAURANT DU CASINO IVORY
Type of cuisine:French
Opening hours:19:00 - 22:00
-----------------------------
BAR DE L'HOTEL LE DOLLY
Opening hours:10:00-01:00
-----------------------------
BAR DES MACHINES A SOUS
Opening hours:10:30-03:00
-----------------------------
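An alternative to zipping two separate lists is to look up the details div from each h3 directly, which also avoids counting next_sibling hops (whitespace between tags counts as a sibling text node). This is a rough sketch, not the method used above: the markup is a hypothetical sample modelled on the structure the question implies, and it assumes each "descriptif-resto" div is a sibling of its h3.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the structure described in the question.
SAMPLE_HTML = """
<div class="infos-restos">
  <h3>LA STUB DU CASINO</h3>
  <div class="descriptif-resto">
    <p><strong>Type of cuisine</strong>:Brasserie</p>
    <p><strong>Opening hours</strong>:12:00 - 14:00 / 19:00 - 22:00</p>
  </div>
  <h3>BAR DE L'HOTEL LE DOLLY</h3>
  <div class="descriptif-resto">
    <p><strong>Opening hours</strong>:10:00-01:00</p>
  </div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
results = []
for block in soup.select("div.infos-restos"):
    for h3 in block.select("h3"):
        # find_next_sibling only matches tags, so whitespace text nodes are skipped
        details = h3.find_next_sibling("div", class_="descriptif-resto")
        if details is None:
            lines = []  # heading with no details div after it
        else:
            lines = [p.get_text(strip=True) for p in details.find_all("p")]
        results.append((h3.get_text(strip=True), lines))

for name, lines in results:
    print(name)
    for line in lines:
        print(line)
    print("-----------------------------")
```

One caveat: if an h3 has no details div of its own but a later h3 in the same block does, find_next_sibling will return that later div, so check the real page structure before relying on this.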