Extracting from HTML with BeautifulSoup
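I'm trying to extract the lotto numbers from https://www.lotto.de/de/ergebnisse/lotto-6aus49/archiv.html (I know there are easier ways, but this one is better for learning).
I tried the following with Python and BeautifulSoup: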
from BeautifulSoup import BeautifulSoup
import urllib2
url="https://www.lotto.de/de/ergebnisse/lotto-6aus49/archiv.html"
page=urllib2.urlopen(url)
soup = BeautifulSoup(page.read())
numbers=soup.findAll('li',{'class':'winning_numbers.boxRow.clearfix'})
for number in numbers:
    print number['li']+","+number.string
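This returns nothing, which is more or less what I expected. I read the tutorial but still don't fully understand the parsing. Could someone give me a hint?
Thanks!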
Since the content is generated dynamically, one of the easier solutions is to use Selenium or a similar tool to load the page the way a browser would (I use PhantomJS as the web driver), like this:
from bs4 import BeautifulSoup  # assuming BeautifulSoup 4 is installed
from selenium import webdriver
url = "https://www.lotto.de/de/ergebnisse/lotto-6aus49/archiv.html"
# I'm using PhantomJS, you may use your own driver...
driver = webdriver.PhantomJS(executable_path='/usr/local/bin/phantomjs')
driver.get(url)
soup = BeautifulSoup(driver.page_source, 'html.parser')
# Simply go through each div of class 'winning_numbers' and grab the number texts,
# without the special number, as in the sample output
for ul in soup.findAll('div', {'class': 'winning_numbers'}):
    n = ','.join(li for li in ul.text.split() if li.isdigit())
    if n:
        print('number: {}'.format(n))
number: 6,25,26,27,28,47
To grab the special number as well:
for ul in soup.findAll('div', {'class': 'winning_numbers'}):
    # keep only the numeric characters of each token; you may apply your own logic here
    n = ','.join(''.join(c for c in li if c.isdigit()) for li in ul.text.split())
    if n:
        print('number: {}'.format(n))
number: 6,25,26,27,28,47,5 # with special number
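The two loops differ only in how they filter the text of each block, so the difference can be tried out without a browser. Below is a minimal sketch; the `sample` string and the helper names `plain_numbers` and `all_numbers` are made up for illustration, and the real page text may look different. Note that it swaps the character-by-character join for `re.findall`, which avoids empty entries when a token contains no digits at all.

```python
import re


def plain_numbers(text):
    """Keep only whitespace-separated tokens that consist entirely of digits."""
    return [tok for tok in text.split() if tok.isdigit()]


def all_numbers(text):
    """Keep every run of digits, even when it sits inside a mixed token."""
    return re.findall(r"\d+", text)


# Hypothetical text of one 'winning_numbers' block (an assumption, not real page data).
sample = "6 25 26 27 28 47 SZ:5"

print("number: {}".format(",".join(plain_numbers(sample))))  # number: 6,25,26,27,28,47
print("number: {}".format(",".join(all_numbers(sample))))    # number: 6,25,26,27,28,47,5
```

With real data you would pass `ul.text` from the loop instead of `sample`.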
Hope this helps.
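A side note on the selector in the question, separate from the dynamic loading: a dotted string such as 'winning_numbers.boxRow.clearfix' is CSS selector syntax, and when it is passed as a class filter it is looked up as one literal class name. The sketch below assumes BeautifulSoup 4 (`bs4`) and uses a hypothetical piece of markup, since the real page structure is not shown in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical markup for illustration only; the real page may be structured differently.
html = """
<ul>
  <li class="winning_numbers boxRow clearfix">6 25 26 27 28 47</li>
  <li class="other">ignored</li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# The dotted string is treated as a single class name, so nothing matches.
dotted = soup.find_all("li", {"class": "winning_numbers.boxRow.clearfix"})

# A single class name matches any element that carries that class.
by_class = soup.find_all("li", class_="winning_numbers")

# The dotted form belongs in a CSS selector.
by_selector = soup.select("li.winning_numbers.boxRow.clearfix")

print(len(dotted), len(by_class), len(by_selector))  # 0 1 1
```

So even on fully rendered HTML, the original filter would likely come back empty, while either of the other two forms would find the element.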