Unable to print the data retrieved with the regex module (strictly only this module) in Python?
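I am using the 're' module in Python to scrape a web page. The loop runs 4 iterations, and each one returns an empty list ([]), although the output should be the stock price for the requested stock symbol. The regular expression raises no errors, and the regex variable prints correctly. The source code is included below.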
import urllib
import re

symbolslist = ["appl","spy","goog","nflx"]
i=0
while i<len(symbolslist):
    url ="http://in.finance.yahoo.com/q?s=" +symbolslist[i] +"&ql=1"
    htmlfile = urllib.urlopen(url)
    htmltext = htmlfile.read()
    regex ='<span id="yfs_l84_'+symbolslist[i] +'">(.+?)</span>'
    pattern = re.compile(regex)
    print regex
    price = re.findall(pattern,htmltext)
    print "price of ",symbolslist[i],"is",price
    i+=1
There are no syntax or indentation errors, and the output looks like this:
<span id="yfs_l84_appl">(.+?)</span>
price of appl is []
<span id="yfs_l84_spy">(.+?)</span>
price of spy is []
<span id="yfs_l84_goog">(.+?)</span>
price of goog is []
<span id="yfs_l84_nflx">(.+?)</span>
price of nflx is []
The stock values are not printed in the list.
As an alternative, you might find it easier to use the yahoo_finance library, as follows:
from yahoo_finance import Share

for symbol in ["appl", "spy", "goog", "nflx"]:
    yahoo = Share(symbol)
    print 'Price of {} is {}'.format(symbol, yahoo.get_price())
This gives you the following output:
Price of appl is 96.11
Price of spy is 186.63
Price of goog is 682.40
Price of nflx is 87.40
Trying to parse HTML data with regular expressions is never advisable.
Another approach is to extract the information with BeautifulSoup first:
from bs4 import BeautifulSoup
import requests
import re

for symbol in ["appl", "spy", "goog", "nflx"]:
    url = 'http://finance.yahoo.com/q?s={}'.format(symbol)
    r = requests.get(url)
    soup = BeautifulSoup(r.text, "html.parser")
    # Find the first span whose id looks like yfs_<anything>_<symbol>
    data = soup.find('span', attrs= {'id' : re.compile(r'yfs_.*?_{}'.format(symbol.lower()))})
    print 'Price of {} is {}'.format(symbol, data.text)
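To illustrate why a literal regex is fragile while a parser-based lookup is not, here is a minimal offline sketch. It is written for Python 3 and uses only the standard library, so it swaps in html.parser in place of BeautifulSoup. The SAMPLE_HTML string, the SpanTextById class, and both helper functions are hypothetical names made up for this example; the markup is not copied from Yahoo and may not reflect what the real page serves.

```python
import re
from html.parser import HTMLParser

# Hypothetical markup (an assumption, not real Yahoo output): the span has an
# extra attribute before the id, so a literal '<span id="...">' cannot match.
SAMPLE_HTML = '<div><span class="price" id="yfs_l84_spy">186.63</span></div>'


class SpanTextById(HTMLParser):
    """Collect the text of the first <span> whose id matches a pattern."""

    def __init__(self, id_pattern):
        super().__init__()
        self.id_pattern = re.compile(id_pattern)
        self.capturing = False
        self.text = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "span" and self.text is None:
            span_id = dict(attrs).get("id") or ""
            if self.id_pattern.search(span_id):
                self.capturing = True

    def handle_data(self, data):
        if self.capturing:
            self.text = data
            self.capturing = False

    def handle_endtag(self, tag):
        # Stop capturing if the matching span turned out to be empty.
        if tag == "span":
            self.capturing = False


def price_by_literal_regex(html, symbol):
    """Same pattern as in the question: depends on exact attribute layout."""
    regex = '<span id="yfs_l84_' + symbol + '">(.+?)</span>'
    return re.findall(regex, html)


def price_by_parser(html, symbol):
    """Parser-based lookup: attribute order and extra attributes do not matter."""
    parser = SpanTextById(r"yfs_.*?_" + re.escape(symbol.lower()) + "$")
    parser.feed(html)
    return parser.text


if __name__ == "__main__":
    print(price_by_literal_regex(SAMPLE_HTML, "spy"))
    print(price_by_parser(SAMPLE_HTML, "spy"))
```

Note that price_by_parser returns None when no span matches. The BeautifulSoup version behaves similarly: soup.find returns None when nothing is found, so data.text would raise an AttributeError, and it is worth checking for None before printing.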