Data Scraping using urllib
I am trying to get some data from this link.
To get the open price from the link above, I am using the code below:
import urllib
from urllib.request import urlopen
symbols=['KEL', 'BYCO']
def keystats():
    try:
        response = urllib.request.urlopen('http://www.scstrade.com/StockScreening/SS_CompanySnapShot.aspx?symbol='+symbol)
        sourcecode = response.readlines()
        sourcecode = str(sourcecode)
        open_price = sourcecode.split('<span id="MainContent_lbl_open" style="font-weight:bold;">')[1].split('</span>')[0]
        print(open_price)
    except:
        print('Error')
for symbol in symbols:
    keystats()
I believe the code above should give me the open price, 8.20. Instead, it prints the string Error. Can anyone tell me what is wrong with the code?
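The problem is the string you are splitting on. If you inspect the HTML, you will see that the markup is actually <span id="MainContent_lbl_open"><b>8.20</b></span>, so that is the text you should search for.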
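To see why the original version prints Error, here is a minimal offline sketch. The `html` string is a hypothetical one-line excerpt built from the markup quoted above, not the full page, and the variable names are only for illustration.

```python
# Hypothetical excerpt of the page, based on the markup quoted above.
html = '<span id="MainContent_lbl_open"><b>8.20</b></span>'

wrong_marker = '<span id="MainContent_lbl_open" style="font-weight:bold;">'
right_marker = '<span id="MainContent_lbl_open"><b>'

# When the marker is absent, split() returns the whole string as a single item,
# so indexing [1] raises IndexError, which the bare except reports as 'Error'.
wrong_parts = html.split(wrong_marker)
right_parts = html.split(right_marker)

# With the matching marker, the price is the text before the closing tags.
open_price = right_parts[1].split('</b></span>')[0]
print(len(wrong_parts), len(right_parts), open_price)
```

Because the bare `except:` hides the IndexError, printing the exception while debugging would probably have pointed at the mismatched marker sooner.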
Code:
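import urllib.request

symbols = ['KEL', 'BYCO']

def keystats(symbol):
    try:
        response = urllib.request.urlopen('http://www.scstrade.com/StockScreening/SS_CompanySnapShot.aspx?symbol=' + symbol)
        # Turn the list of byte lines into one string (its repr) so split() can search it.
        sourcecode = str(response.readlines())
        # The price sits inside <b>...</b> within the span, so split on that exact markup.
        open_price = sourcecode.split('<span id="MainContent_lbl_open"><b>')[1].split('</b></span>')[0]
        print(open_price)
    except Exception:
        # Covers network failures and the IndexError raised when the marker is missing.
        print('Error')

for symbol in symbols:
    keystats(symbol)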
Output:
8.20
21.59
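Splitting on an exact tag string breaks as soon as the page changes its attributes or nesting, which is what happened here. As a hedged alternative, the sketch below looks the span up by its id with the standard library's `html.parser`. The names `SpanTextParser` and `extract_span_text` are hypothetical helpers introduced for illustration, and the sample strings are just the two markup variants discussed above, not the real page.

```python
from html.parser import HTMLParser


class SpanTextParser(HTMLParser):
    """Collect the text inside the first <span> that has a given id."""

    def __init__(self, span_id):
        super().__init__()
        self.span_id = span_id
        self.depth = 0       # > 0 while we are inside the target span
        self.found = False   # True once the target span has been opened
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag != 'span':
            return
        if self.depth:
            self.depth += 1  # a nested span inside the target
        elif not self.found and dict(attrs).get('id') == self.span_id:
            self.found = True
            self.depth = 1

    def handle_endtag(self, tag):
        if tag == 'span' and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_span_text(html, span_id):
    """Return the stripped text of the span, or None if no such span exists."""
    parser = SpanTextParser(span_id)
    parser.feed(html)
    parser.close()
    return ''.join(parser.chunks).strip() if parser.found else None


# Both markup variants from this thread yield the same text.
styled = '<span id="MainContent_lbl_open" style="font-weight:bold;">8.20</span>'
bolded = '<span id="MainContent_lbl_open"><b>8.20</b></span>'
print(extract_span_text(styled, 'MainContent_lbl_open'))
print(extract_span_text(bolded, 'MainContent_lbl_open'))
```

To use it with the code above, one would decode the response first, for example `response.read().decode('utf-8', errors='replace')`, and pass that string in. The page's encoding is an assumption here.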