BeautifulSoup not returning span results

Question

I am learning bs4 and trying to scrape the span tag data from this website and put it into a list, but nothing is returned. What am I doing wrong?

import requests
import bs4

root_url = 'http://www.timeanddate.com'
index_url = root_url + '/astronomy/tonga/nukualofa'

response = requests.get(index_url)
soup = bs4.BeautifulSoup(response.text)
spans = soup.find_all('span', attrs={'id':'qfacts'})
for span in spans:
    print span.string

All of the page's span data sits inside this tag:

<div class="five columns" id="qfacts">
    <p><span class="four">Current Time:</span> <span id="smct">16 Mar 2015 at
12:53:50 p.m.</span></p><br>

    <p><span class="four">Sunrise Today:</span> <span class="three">6:43
a.m.</span> <span class="comp sa8" title="Map direction East">↑</span> 93°
East<br>
    <span class="four">Sunset Today:</span> <span class="three">6:56
p.m.</span> <span class="comp sa24" title="Map direction West">↑</span>
268° West</p><br>

    <p><span class="four">Moonrise Today:</span> <span class="three">1:55
a.m.</span> <span class="comp sa10" title="Map direction East">↑</span>
108° East<br>
<span class="four">Moonset Today:</span> <span class="three">3:17
p.m.</span> <span class="comp sa22" title="Map direction West">↑</span>
253° West</p><br>

    <p><span class="four">Daylight Hours:</span> <span title=
"The current day is 12 hours, 13 minutes long which is 1m 13s shorter than yesterday.">
12 hours, 13 minutes (-1m 13s)</span></p>
</div>

Answer 1

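A subtle mistake: you are searching for span tags that have the id "qfacts", when what you actually want is to search for span tags inside the div that has that id. No span on the page carries that id, so find_all returns an empty list.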

Replace

spans = soup.find_all('span', attrs={'id':'qfacts'})

with

div = soup.find('div', attrs={'id': 'qfacts'})  # <-- div not span
spans = div.find_all('span')  # <-- now find the spans inside
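
To check the difference without hitting the live site, here is a minimal sketch that runs both lookups against a trimmed copy of the markup from the question. The variable names `wrong` and `texts`, the explicit "html.parser" argument, and the use of get_text(strip=True) instead of .string are choices made for this example, not part of the original code. It is written with print() so it runs on Python 3 as well.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (two rows only).
HTML = """
<div class="five columns" id="qfacts">
    <p><span class="four">Current Time:</span> <span id="smct">16 Mar 2015 at 12:53:50 p.m.</span></p>
    <p><span class="four">Sunrise Today:</span> <span class="three">6:43 a.m.</span></p>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# No <span> carries id="qfacts", so the original lookup matches nothing.
wrong = soup.find_all("span", attrs={"id": "qfacts"})

# Find the <div> by id first, then collect the spans nested inside it.
div = soup.find("div", attrs={"id": "qfacts"})
texts = [span.get_text(strip=True) for span in div.find_all("span")]

print(wrong)
print(texts)
```

The first print shows an empty list, and the second shows the four span texts from the trimmed markup.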

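If you were looking for divs with some class, you would probably want to iterate over those divs and find all the spans inside each one. But this is an id, so a single find() call is enough.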
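
For the class-based case, a rough sketch could look like the following. The markup and the class name "facts" are hypothetical and invented for illustration; the real page identifies the block by id, not by class.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the class name "facts" is made up for this example.
HTML = """
<div class="facts"><span>Sunrise</span> <span>6:43 a.m.</span></div>
<div class="other"><span>ignored</span></div>
<div class="facts"><span>Sunset</span> <span>6:56 p.m.</span></div>
"""

soup = BeautifulSoup(HTML, "html.parser")

spans_text = []
# Several divs can share a class, so loop over every match.
for div in soup.find_all("div", class_="facts"):
    # Collect the spans nested inside this particular div.
    for span in div.find_all("span"):
        spans_text.append(span.get_text(strip=True))

print(spans_text)
```

Spans inside the div with class "other" are skipped, because only the divs matched by find_all are searched.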

html

python

beautifulsoup

html-parsing

python-2.7