Getting "None" and "'NoneType' object..." errors when using BeautifulSoup4 to get text from a webpage
I'm trying to extract the main headline (currently: "Wenger predicts 'active' January") from a BBC Sport page. Its ID is 'lead-caption', and it sits inside <h2> and <a> tags. I'm using Python.
from bs4 import BeautifulSoup
import urllib2
url = urllib2.urlopen("http://www.bbc.co.uk/sport/football/teams/arsenal")
soup=BeautifulSoup(url.read())
#Things I've tried
headline=soup.find('a', attrs={'id': 'lead-caption'})
print headline
#The above prints None
headline1=soup.find('lead-caption').getText()
print headline1
#The above prints "'NoneType' object has no attribute 'getText'"
tag = soup.a
tag['id'] = 'lead-caption'
type(tag)
print tag.string
#Error: 'NoneType' object does not support item assignment
Any help would be greatly appreciated. Thanks :)
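Your code is almost right. You're searching for the wrong element, which is why you get None: the 'lead-caption' id is on a div, not on the a tag. Find the div first, then the a inside it: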
headline=soup.find('div', attrs={'id': 'lead-caption'})
headline_text=headline.find('a').getText()
print headline_text
Output:
Wenger predicts 'active' January
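Why the first two attempts return None can be reproduced offline. The following is a minimal Python 3 sketch that assumes bs4 is installed. SAMPLE_HTML is hypothetical markup modelled on the structure the answer implies (an a inside an h2 inside div#lead-caption), not a copy of the live BBC page, and lead_headline is a hypothetical helper name. The first positional argument of find() is a tag name, so find('lead-caption') looks for a <lead-caption> element; matching on id= alone works whatever the tag is. For the third attempt, soup.a is shorthand for soup.find('a') and is None when the parsed document has no a tag; item assignment like tag['id'] = ... sets an attribute and never searches, so it could not have located the headline anyway.

```python
from bs4 import BeautifulSoup

# Hypothetical markup (assumption): mirrors the structure implied by the
# answer, not copied from bbc.co.uk.
SAMPLE_HTML = """
<div id="lead-caption">
  <h2><a href="/sport/football/example">Wenger predicts 'active' January</a></h2>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# Attempt 1: the id is on the <div>, so no <a> carries it.
print(soup.find("a", attrs={"id": "lead-caption"}))  # -> None

# Attempt 2: the first argument is a tag *name*, so this searches for a
# <lead-caption> element, which does not exist.
print(soup.find("lead-caption"))  # -> None

# Matching on the id alone finds the element regardless of its tag name.
print(soup.find(id="lead-caption").name)  # -> div

# Attempt 3: soup.a is the first <a> tag, or None if there is none;
# subscript assignment on None is what raised the TypeError.
no_links = BeautifulSoup("<p>no links here</p>", "html.parser")
print(no_links.a)  # -> None


def lead_headline(doc):
    """Hypothetical helper: return the headline text, or None if the
    expected div#lead-caption > a structure is missing."""
    caption = doc.find(id="lead-caption")
    if caption is None:
        return None
    link = caption.find("a")
    if link is None:
        return None
    return link.get_text(strip=True)


print(lead_headline(soup))  # -> Wenger predicts 'active' January
```

Checking for None at each step means a changed page layout produces None rather than an AttributeError like the one in the question.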