Selecting values of tags within tags
Here is the part of the HTML code I am interested in:
<div class="mreinfwpr" id="mhd">
<p class="mreinfp">Hours of Operation <a href="javascript:void(0);" class="" id="vhall" onclick="houroperate('all')">(View all)</a><a href="javascript:void(0);" class="dn" id="swless" onclick="houroperate('less')">(Show less)</a></p>
<ul id="hroprt" class="alstdul">
<li class="mreinfli">
<span class="mreinflispn1">Today</span><span class="mreinflispn2"><span>11:30 am - 11:30 pm</span>
</span><span class="mreinflispn3">Closed Now</span> </li>
</ul>
<!-- View All Work Timings Vertically -->
<ul class="alstdul dn" id="statHr">
<li class="mreinfli">
<span class="mreinflispn1"> Monday </span><span class="mreinflispn2">11:30 am - 11:30 pm</span>
</li>
<li class="mreinfli">
<span class="mreinflispn1"> Tuesday </span><span class="mreinflispn2">11:30 am - 11:30 pm</span>
</li>
<li class="mreinfli">
<span class="mreinflispn1"> Wednesday </span><span class="mreinflispn2">11:30 am - 11:30 pm</span>
</li>
<li class="mreinfli">
<span class="mreinflispn1"> Thursday </span><span class="mreinflispn2">11:30 am - 11:30 pm</span>
</li>
<li class="mreinfli">
<span class="mreinflispn1"> Friday </span><span class="mreinflispn2">11:30 am - 11:30 pm</span>
</li>
<li class="mreinfli">
<span class="mreinflispn1"> Saturday </span><span class="mreinflispn2">11:30 am - 11:30 pm</span>
</li>
<li class="mreinfli">
<span class="mreinflispn1"> Sunday </span><span class="mreinflispn2">11:30 am - 11:30 pm</span>
</li>
</ul>
</div>
<div class="mreinfwpr">
<p class="mreinfp">Also Listed in</p>
<ul class="alstdul">
<li>
<a onclick="_ct('alsocat', 'dtpg', '17592186044416');" href="http://www.justdial.com/Bangalore/Pubs-<near>-Indira-Nagar-2nd-Stage/ct-1000027567" title="Pubs in Indira-Nagar-2nd-Stage, Bangalore">Pubs</a>
<!-- <li class="spc"></li> -->
<li>
<a onclick="_ct('alsocat', 'dtpg', '17592186044416');" href="http://www.justdial.com/Bangalore/Pizza-Outlets-<near>-Indira-Nagar-2nd-Stage/ct-50105" title="Pizza Outlets in Indira-Nagar-2nd-Stage, Bangalore">Pizza Outlets</a>
<!-- <li class="spc"></li> -->
<li>
<a onclick="_ct('alsocat', 'dtpg', '17592186044416');" href="http://www.justdial.com/Bangalore/Restaurants-<near>-Indira-Nagar-2nd-Stage/ct-304085" title="Restaurants in Indira-Nagar-2nd-Stage, Bangalore">Restaurants</a>
<!-- <li class="spc"></li> -->
<li>
<a onclick="_ct('alsocat', 'dtpg', '17592186044416');" href="http://www.justdial.com/Bangalore/Lounge-Bars-<near>-Indira-Nagar-2nd-Stage/ct-597637" title="Lounge Bars in Indira-Nagar-2nd-Stage, Bangalore">Lounge Bars</a>
<!-- <li class="spc"></li> -->
<li>
<a onclick="_ct('alsocat', 'dtpg', '17592186044416');" href="http://www.justdial.com/Bangalore/Microbrewery-Pubs-<near>-Indira-Nagar-2nd-Stage/ct-1041785821" title="Microbrewery Pubs in Indira-Nagar-2nd-Stage, Bangalore">Microbrewery Pubs</a>
<!-- <li class="spc"></li> -->
<li>
<a onclick="_ct('alsocat', 'dtpg', '17592186044416');" href="http://www.justdial.com/Bangalore/Nightlife-Restaurants-<near>-Indira-Nagar-2nd-Stage/ct-1041746883" title="Nightlife Restaurants in Indira-Nagar-2nd-Stage, Bangalore">Nightlife Restaurants</a>
<!-- <li class="spc"></li> -->
<li>
<a onclick="_ct('alsocat', 'dtpg', '17592186044416');" href="http://www.justdial.com/Bangalore/Foodie-Delight-<near>-Indira-Nagar-2nd-Stage/ct-1041818989" title="Foodie Delight in Indira-Nagar-2nd-Stage, Bangalore">Foodie Delight</a>
<!-- <li class="spc"></li> -->
<!-- <li class="spc"></li> -->
<!-- <li class="spc"></li> -->
<!-- <li class="spc"></li> -->
<li>
<a href="javascript:void(0);" onclick="_ct('morlstdin', 'dtpg');
openDiv('alsp');">more...</a>
</li>
</ul>
</div>
<div class="mreinfwpr">
<p class="mreinfp">Services</p>
<span class="srihd">General</span>
<ul class="alstdul">
<!-- <tr > -->
<li><img class="srimg" src="http://www.justdial.com/public/images/icon/bar.png" width="20" height="20" /><span class="sritxt">Bar </span></li>
<!-- <td class="spc"></td> -->
<li><img class="srimg" src="http://www.justdial.com/public/images/icon/checkmarkNew.png" width="20" height="20" /><span class="sritxt">Outdoor Seating </span></li>
<!-- </tr> -->
<!-- <tr > -->
<li><img class="srimg" src="http://www.justdial.com/public/images/icon/checkmarkNew.png" width="20" height="20" /><span class="sritxt">Alcohol </span></li>
<!-- <td class="spc"></td> -->
<li><img class="srimg" src="http://www.justdial.com/public/images/icon/checkmarkNew.png" width="20" height="20" /><span class="sritxt">AC </span></li>
<!-- </tr> -->
<!-- <tr class="reset" > -->
<li><img class="srimg" src="http://www.justdial.com/public/images/icon/checkmarkNew.png" width="20" height="20" /><span class="sritxt">WiFi </span></li>
<!-- <td class="spc"></td> -->
<li><img class="srimg" src="http://www.justdial.com/public/images/icon/checkmarkNew.png" width="20" height="20" /><span class="sritxt">Dinein </span></li>
<!-- </tr> -->
</ul>
</div>
<div class="mreinfwpr">
<p class="mreinfp">Modes of Payment</p>
<ul class="alstdul">
<li>Cash</td>
<!-- <td class="spc"></td> -->
<li>Master Card</td>
</li>
<li>Visa Card</td>
<!-- <td class="spc"></td> -->
<li>Debit Cards</td>
</li>
<li>Credit Card</td>
<!-- <td class="spc"></td> -->
</div>
<div class="mreinfwpr">
<p class="mreinfp">Year Established</p>
<ul class="alstdul">
<li> 2010</li>
</ul>
</div>
I want the data in the "Also Listed in" category, i.e.:
Also Listed in
Pubs
Pizza Outlets
Restaurants
Lounge Bars
Microbrewery Pubs
Nightlife Restaurants
Foodie Delight
more...
I tried using:
also_listed_in=bSoup.findAll("a", { "onclick" : "_ct('alsocat', 'dtpg', '17592186044416');" })
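This gets me the data I need. The problem is that the onclick attribute of the "a" tags, i.e. onclick = _ct('alsocat', 'dtpg', '17592186044416'), keeps changing across different URLs of the same kind. However, I noticed that the _ct('alsocat', 'dtpg' part of _ct('alsocat', 'dtpg', '17592186044416') is the same for all similar URLs.
Please help me get the required data.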
You can use the part of the onclick text that does not change to get what you want:
from bs4 import BeautifulSoup
import re

soup = BeautifulSoup(html, "html.parser")
# Match only the stable prefix of onclick; the trailing ID is allowed to vary
print(soup.find_all("a", onclick=re.compile(r"_ct\('alsocat', 'dtpg'")))
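A self-contained sketch of the regex approach that returns the link texts rather than the Tag objects. It assumes a trimmed copy of the question's markup; the second link's ID (12345) and the href="#" values are made up, only to show that the trailing _ct() argument can differ between links.

```python
import re

from bs4 import BeautifulSoup

# Trimmed copy of the question's markup. The ID '12345' and href="#" are
# hypothetical placeholders, not values from the real page.
html = """
<div class="mreinfwpr">
<p class="mreinfp">Also Listed in</p>
<ul class="alstdul">
<li><a onclick="_ct('alsocat', 'dtpg', '17592186044416');" href="#">Pubs</a>
<li><a onclick="_ct('alsocat', 'dtpg', '12345');" href="#">Pizza Outlets</a>
<li><a href="javascript:void(0);" onclick="_ct('morlstdin', 'dtpg');
openDiv('alsp');">more...</a></li>
</ul>
</div>
"""


def also_listed_in(markup):
    """Return the texts of links whose onclick contains _ct('alsocat', 'dtpg'."""
    soup = BeautifulSoup(markup, "html.parser")
    # find_all() applies a compiled regex with search(), so only the stable
    # prefix has to match and the changing ID is ignored.
    pattern = re.compile(r"_ct\('alsocat', 'dtpg'")
    return [a.get_text(strip=True) for a in soup.find_all("a", onclick=pattern)]


print(also_listed_in(html))  # ['Pubs', 'Pizza Outlets']
```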
If _ct('alsocat' is unique to these URLs, you can simply use a CSS "starts with" attribute selector:
# Quote the value: "(" and "'" are not allowed in an unquoted CSS attribute value
print(soup.select('a[onclick^="_ct(\'alsocat\'"]'))
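A runnable sketch of the selector approach, again on a trimmed, partly hypothetical copy of the markup. Note that neither the regex nor the selector matches the "more..." link, because its onclick starts with _ct('morlstdin'. If you want the output exactly as listed in the question (heading, categories and "more..."), one option, not part of the answer above, is to anchor on the "Also Listed in" heading and read the <ul> that follows it. The function also_listed_section below sketches that and assumes the heading text stays fixed.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the question's markup; the ID '12345' and href="#" are hypothetical.
html = """
<div class="mreinfwpr">
<p class="mreinfp">Also Listed in</p>
<ul class="alstdul">
<li><a onclick="_ct('alsocat', 'dtpg', '17592186044416');" href="#">Pubs</a>
<li><a onclick="_ct('alsocat', 'dtpg', '12345');" href="#">Pizza Outlets</a>
<li><a href="javascript:void(0);" onclick="_ct('morlstdin', 'dtpg');
openDiv('alsp');">more...</a></li>
</ul>
</div>
"""


def also_listed_in_css(markup):
    """Link texts whose onclick starts with _ct('alsocat' (CSS ^= prefix match)."""
    soup = BeautifulSoup(markup, "html.parser")
    # The value is quoted because it contains "(" and "'".
    links = soup.select('a[onclick^="_ct(\'alsocat\'"]')
    return [a.get_text(strip=True) for a in links]


def also_listed_section(markup):
    """Heading text plus every link text in the <ul> right after the heading."""
    soup = BeautifulSoup(markup, "html.parser")
    heading = soup.find("p", class_="mreinfp", string="Also Listed in")
    if heading is None:
        return []
    items = heading.find_next_sibling("ul").find_all("a")
    return [heading.get_text(strip=True)] + [a.get_text(strip=True) for a in items]


print(also_listed_in_css(html))   # ['Pubs', 'Pizza Outlets']
print(also_listed_section(html))  # ['Also Listed in', 'Pubs', 'Pizza Outlets', 'more...']
```

The structural version does not depend on the onclick text at all, so it keeps working if the _ct() arguments change, but it breaks instead if the heading text or the p/ul layout changes.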