BeautifulSoup: get an attribute from a find() result
I'm new to Python, so apologies in advance. I want to print the href of the element I selected with BeautifulSoup's find() method, but it fails and I don't know why. This is what I have:
from bs4 import BeautifulSoup
from requests import session
payload = {
    'btnSubmit': 'Login',
    'username': 'xxx',
    'password': 'xxx'
}
with session() as c:
    c.post('http://www.xxx.xxx/login.php', data=payload)
    request = c.get('http://www.xxx.xxx/xxxx')
    soup = BeautifulSoup(request.content)
    row_int = soup.find('td', attrs={'class': 'rnr-cc rnr-bc rnr-icons'})
    print row_int['href']
But I get this error:
Traceback (most recent call last):
  File "<pyshell#14>", line 1, in <module>
    execfile ('C:\Users\Francesco\Desktop\prova.py')
  File "C:\Users\Francesco\Desktop\prova.py", line 15, in <module>
    print row_int['href']
  File "C:\Python27\lib\site-packages\bs4\element.py", line 905, in __getitem__
    return self.attrs[key]
KeyError: 'href'
The contents of row_int are:
[<a class="rnr-button-img" data-icon="view" href="xxxxxxx" id="viewLink12" name="viewLink12" title="Details"></a>, u' ']
What am I doing wrong?
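find() returned the td element, and a td has no href attribute, hence the KeyError. The href belongs to the a element nested inside it, so you need to get the link (the a element) from the td tag: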
row = soup.find('td', attrs={'class': 'rnr-cc rnr-bc rnr-icons'})
# href=True matches only <a> tags that actually carry an href attribute
a = row.find('a', href=True)
if a:
    print(a['href'])
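To see the difference without logging in to the site, here is a minimal sketch that reproduces the KeyError on a small inline fragment. The HTML is a made-up stand-in modeled on the row_int output in the question, and passing "html.parser" explicitly is my assumption, not something the original code does.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment modeled on the row_int output shown in the question.
html = """
<table><tr>
  <td class="rnr-cc rnr-bc rnr-icons"><a class="rnr-button-img" data-icon="view" href="xxxxxxx" id="viewLink12" name="viewLink12" title="Details"></a> </td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")
td = soup.find("td", attrs={"class": "rnr-cc rnr-bc rnr-icons"})

# The td only carries a class attribute, so td["href"] raises KeyError.
td_attrs = sorted(td.attrs)
try:
    td["href"]
    td_has_href = True
except KeyError:
    td_has_href = False

# The href lives on the <a> nested inside the td.
a = td.find("a", href=True)
link = a["href"] if a is not None else None

print(td_attrs)     # attribute names present on the td
print(td_has_href)  # whether td["href"] succeeded
print(link)         # href taken from the nested <a>
```

The `is not None` check matters because find() returns None when nothing matches, and indexing None would fail with a TypeError instead.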
Or, more concisely, with a CSS selector:
for a in soup.select('td.rnr-cc.rnr-bc.rnr-icons a[href]'):
    print(a['href'])
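As a rough illustration of how the selector behaves, the sketch below runs it against a made-up table (the HTML and the file names in it are hypothetical, not from the real page). It also shows select_one() for the single-result case and Tag.get(), which returns None instead of raising KeyError when an attribute is absent.

```python
from bs4 import BeautifulSoup

# Hypothetical table: one matching link, one <a> without href, one other td.
html = """
<table>
  <tr><td class="rnr-cc rnr-bc rnr-icons"><a href="view.php?id=1" title="Details"></a></td></tr>
  <tr><td class="rnr-cc rnr-bc rnr-icons"><a title="No link here"></a></td></tr>
  <tr><td class="other"><a href="ignored.php"></a></td></tr>
</table>
"""
soup = BeautifulSoup(html, "html.parser")
selector = "td.rnr-cc.rnr-bc.rnr-icons a[href]"

# select() returns a list of all matches; a[href] skips anchors without href.
hrefs = [a["href"] for a in soup.select(selector)]

# select_one() returns the first match, or None if there is none.
first = soup.select_one(selector)
first_href = first.get("href") if first is not None else None

# Tag.get() returns None for a missing attribute instead of raising KeyError.
missing = soup.select_one("td.other").get("href")

print(hrefs)
print(first_href)
print(missing)
```

Using .get() is a reasonable default when you are not sure the attribute exists; indexing with [] is fine once the selector or href=True has already guaranteed it.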