Python scraping (Beautiful Soup) to obtain data from this HTML
<ul>
<li>
<div class="c_logo_box">
<a href="money-transfer-companies/ria-money-transfer/"><img src="http://www.compareremit.com/uploads/ria-logo11.png" style="height:57px;width:147px;" alt="RIA Money Transfer"></a>
<span class="rs"> <span class="txt13">₹</span> 61.24</span>
</div>
</li>
...
I want to extract the name from alt="RIA Money Transfer" and the rate, 61.24, from the span.
So far I have this Python code:
#!/usr/bin/python
import requests
import re
from bs4 import BeautifulSoup
r = requests.get('http://www.compareremit.com')
data = r.text
soup = BeautifulSoup(data)
for rate in soup.find_all('li', re.compile('money')):
    print rate.text
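It gives me nothing. Can someone tell me what I am missing? Also, I cannot work out which element I am supposed to look for in the for loop search. Could you explain in general how to know what to specify as the condition in the for loop in a case like this?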
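There are multiple ways to reach the element. One option is to rely on the a tag whose href contains the ria-money-transfer part, and then get the following span element, which contains the rate: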
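import re
import requests
from bs4 import BeautifulSoup

response = requests.get('http://www.compareremit.com')
# Name the parser explicitly so the result does not depend on which parsers are installed.
soup = BeautifulSoup(response.content, 'html.parser')
# Find the logo box, then the link whose href mentions ria-money-transfer.
link = soup.find('div', class_='c_logo_box').find('a', href=re.compile(r'ria-money-transfer'))
print(link.img.get('alt'))
# The rate is the last space-separated token of the sibling span's text.
rate = link.find_next_sibling('span').text.split(' ')[-1]
print(rate)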
Prints:
RIA Money Transfer
61.24
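The same idea can be tried offline against the snippet from the question, which makes it easier to experiment without hitting the site. This is only a sketch: the `HTML` string is copied from the question (the live page may look different), and `extract_rates` is a hypothetical helper name, not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup

# Sample markup copied from the question; the live page may differ.
HTML = """
<ul>
<li>
<div class="c_logo_box">
<a href="money-transfer-companies/ria-money-transfer/"><img src="http://www.compareremit.com/uploads/ria-logo11.png" style="height:57px;width:147px;" alt="RIA Money Transfer"></a>
<span class="rs"> <span class="txt13">₹</span> 61.24</span>
</div>
</li>
</ul>
"""


def extract_rates(html):
    """Return a list of (name, rate) tuples, one per c_logo_box div."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for box in soup.find_all("div", class_="c_logo_box"):
        img = box.find("img")
        span = box.find("span", class_="rs")
        if img is None or span is None:
            continue  # skip boxes that do not follow the expected layout
        # The span text is " ₹ 61.24"; split on whitespace and keep the last token.
        tokens = span.get_text().split()
        if not tokens:
            continue
        results.append((img.get("alt"), float(tokens[-1])))
    return results


print(extract_rates(HTML))  # [('RIA Money Transfer', 61.24)]
```

Looping over every `c_logo_box` rather than searching for one provider's href means the same function would pick up the other providers on the page, assuming they share this layout.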
Your code is logically incorrect. There are several ways to do this; try this code:
#!/usr/bin/env python3
import requests
from bs4 import BeautifulSoup

r = requests.get('http://www.compareremit.com')
data = r.text
soup = BeautifulSoup(data, 'html.parser')
# Each provider sits in its own <div class="c_logo_box">.
for rate in soup.find_all('div', {"class": "c_logo_box"}):
    print(rate.a.img['alt'])  # provider name
    print(rate.span.text)     # rate text, including the currency symbol
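As for why the loop in the question printed nothing: when the second positional argument of `find_all` is not a dict, Beautiful Soup treats it as a filter on the CSS class. So `soup.find_all('li', re.compile('money'))` looks for `li` elements whose class matches `money`, and the `li` here has no class at all. The small sketch below shows the difference on the snippet from the question (the `HTML` string is an assumption standing in for the live page). In general, inspect the markup first, then filter on a tag and attribute that actually carry the distinguishing value.

```python
import re

from bs4 import BeautifulSoup

# Sample markup copied from the question; the live page may differ.
HTML = """
<ul>
<li>
<div class="c_logo_box">
<a href="money-transfer-companies/ria-money-transfer/"><img src="http://www.compareremit.com/uploads/ria-logo11.png" style="height:57px;width:147px;" alt="RIA Money Transfer"></a>
<span class="rs"> <span class="txt13">₹</span> 61.24</span>
</div>
</li>
</ul>
"""

soup = BeautifulSoup(HTML, "html.parser")

# The regex is applied to the class attribute of <li>, which does not exist.
by_class = soup.find_all("li", re.compile("money"))

# "money" actually appears in the href of the <a> tag, so filter on that.
by_href = soup.find_all("a", href=re.compile("money"))

print(len(by_class))  # 0
print(len(by_href))   # 1
```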