Unable to print the data retrieved with the regex (strictly only this) module in Python?

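Here I am using Python's 're' module to scrape a web page. The loop runs 4 iterations, and each iteration returns an empty list, [], when the output should be the stock price for the requested stock symbol. There are no errors, and the regex variable prints correctly. The source code is below.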

import urllib
import re

symbolslist = ["appl","spy","goog","nflx"]

i=0
while i<len(symbolslist):
        url ="http://in.finance.yahoo.com/q?s=" +symbolslist[i] +"&ql=1"
        htmlfile = urllib.urlopen(url)
        htmltext = htmlfile.read()
        regex ='<span id="yfs_l84_'+symbolslist[i] +'">(.+?)</span>'
        pattern = re.compile(regex)
        print regex
        price = re.findall(pattern,htmltext)
        print "price of ",symbolslist[i],"is",price
        i+=1
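For readers on Python 3, the same loop can be sketched roughly as follows. This is a hedged port, not the asker's code: the fetch is separated from the parsing so the regex can be tested offline, the page is decoded because urlopen().read() returns bytes in Python 3, and the loosened id pattern (any yfs_lNN_ prefix, case-insensitive symbol) is an assumption about Yahoo's markup, not something confirmed from the page. The helper names extract_price, fetch and main are hypothetical.

```python
import re
import urllib.request

# The question's symbol list, unchanged (Apple's ticker is AAPL, so "appl" may be a typo).
SYMBOLS = ["appl", "spy", "goog", "nflx"]


def extract_price(htmltext, symbol):
    """Return every price-like span text for symbol found in htmltext.

    Assumption: the price sits in <span id="yfs_lNN_<symbol>">, where NN is
    not necessarily 84 and the symbol's case may differ from the URL.
    """
    pattern = re.compile(
        r'<span id="yfs_l\d+_' + re.escape(symbol) + r'">(.+?)</span>',
        re.IGNORECASE,
    )
    return pattern.findall(htmltext)


def fetch(symbol):
    """Download the page the question scrapes (the endpoint may no longer exist)."""
    url = "http://in.finance.yahoo.com/q?s=" + symbol + "&ql=1"
    with urllib.request.urlopen(url) as response:
        # read() returns bytes in Python 3; decode before applying a str pattern.
        return response.read().decode("utf-8", errors="replace")


def main():
    # Performs the network requests; not called on import.
    for symbol in SYMBOLS:
        print("price of", symbol, "is", extract_price(fetch(symbol), symbol))
```

extract_price can be tried on a saved copy of the page without any network access; calling main() performs the actual requests.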

There are no syntax or indentation errors, and the output looks like this:

<span id="yfs_l84_appl">(.+?)</span>
price of  appl is []
<span id="yfs_l84_spy">(.+?)</span>
price of  spy is []
<span id="yfs_l84_goog">(.+?)</span>
price of  goog is []
<span id="yfs_l84_nflx">(.+?)</span>
price of  nflx is []

The stock prices never appear in the lists.

The page being scraped is https://in.finance.yahoo.com/q?s=NFLX&ql=0
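An empty list from re.findall only means the exact literal text of the pattern never occurs in the downloaded HTML. A quick way to see what the page really contains is to print the text around each occurrence of the symbol. The helper show_context below is hypothetical, and the sample markup (with an l10 id prefix instead of l84) is invented for illustration, not copied from Yahoo.

```python
import re


def show_context(htmltext, needle, width=40):
    """Return the text around each case-insensitive occurrence of needle.

    Useful for checking what the downloaded HTML really contains before
    writing a regex against it.
    """
    snippets = []
    for match in re.finditer(re.escape(needle), htmltext, re.IGNORECASE):
        start = max(match.start() - width, 0)
        end = min(match.end() + width, len(htmltext))
        snippets.append(htmltext[start:end])
    return snippets


# Hypothetical downloaded markup: the id prefix is not the one the regex expects.
html = '<div><span id="yfs_l10_nflx">123.45</span></div>'

print(re.findall(r'<span id="yfs_l84_nflx">(.+?)</span>', html))  # []
print(show_context(html, "NFLX"))
```

Running show_context on the real htmltext for each symbol shows whether the span exists at all and, if so, which id and casing it actually uses.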

As an alternative, you may find it easier to use the yahoo_finance library, like this:

from yahoo_finance import Share

for symbol in ["appl", "spy", "goog", "nflx"]:
    yahoo = Share(symbol)
    print('Price of {} is {}'.format(symbol, yahoo.get_price()))

which gives you the following output:

Price of appl is 96.11
Price of spy is 186.63
Price of goog is 682.40
Price of nflx is 87.40

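Trying to parse HTML data with regular expressions is never a wise choice: a pattern that copies the markup literally breaks as soon as an id, an attribute, or the quoting style differs from what you expected.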
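To illustrate that fragility with nothing but the standard library, here is a small sketch using Python 3's html.parser.HTMLParser instead of a regex (a swapped-in technique; neither answer uses it). The class name SpanTextById and the sample markup are hypothetical. The parser receives attributes as name/value pairs, so an extra attribute or single quotes do not affect the lookup, while the literal regex stops matching.

```python
import re
from html.parser import HTMLParser


class SpanTextById(HTMLParser):
    """Collect the text inside every <span> whose id equals span_id.

    Sketch only: a <span> nested inside the target would end capture early.
    """

    def __init__(self, span_id):
        super().__init__()
        self.span_id = span_id
        self.capturing = False
        self.results = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as (name, value) pairs, regardless of attribute order or quote style.
        if tag == "span" and dict(attrs).get("id") == self.span_id:
            self.capturing = True
            self.results.append("")

    def handle_endtag(self, tag):
        if tag == "span":
            self.capturing = False

    def handle_data(self, data):
        if self.capturing:
            self.results[-1] += data


# Hypothetical markup: an extra class attribute and single quotes around values.
html = "<span class='price' id='yfs_l84_nflx'>123.45</span>"

print(re.findall(r'<span id="yfs_l84_nflx">(.+?)</span>', html))  # []

parser = SpanTextById("yfs_l84_nflx")
parser.feed(html)
parser.close()
print(parser.results)  # ['123.45']
```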


Another approach is to first extract the information using BeautifulSoup:

from bs4 import BeautifulSoup
import requests
import re

for symbol in ["appl", "spy", "goog", "nflx"]:
    url = 'http://finance.yahoo.com/q?s={}'.format(symbol)
    r = requests.get(url)
    soup = BeautifulSoup(r.text, "html.parser")

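    data = soup.find('span', attrs={'id': re.compile(r'yfs_.*?_{}'.format(symbol.lower()))})
    if data is None:
        # find() returns None when no span matches, so data.text would raise AttributeError
        print('No price span found for {}'.format(symbol))
    else:
        print('Price of {} is {}'.format(symbol, data.text))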
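One caveat about the id filter in that answer: BeautifulSoup tests a compiled pattern against attribute values with the pattern's search() method, and soup.find returns the first matching element in document order. The loose pattern yfs_.*?_nflx therefore accepts any yfs_ id ending in the symbol, which could include spans for other quote fields. The candidate ids below (and the assumption that only the l-prefixed one holds the last price) are hypothetical; the sketch just compares the loose pattern with a stricter, anchored one.

```python
import re

symbol = "nflx"

# Same pattern as in the answer above.
id_pattern = re.compile(r'yfs_.*?_{}'.format(symbol.lower()))

# Hypothetical span ids; assume only the first is meant to hold the price.
candidate_ids = ["yfs_l84_nflx", "yfs_c63_nflx", "yfs_p43_nflx", "yfs_l84_aapl"]

matching = [i for i in candidate_ids if id_pattern.search(i)]
print(matching)  # ['yfs_l84_nflx', 'yfs_c63_nflx', 'yfs_p43_nflx']

# A stricter, anchored alternative (still an assumption about the id scheme).
strict_pattern = re.compile(r'^yfs_l\d+_{}$'.format(re.escape(symbol)))
strict_matching = [i for i in candidate_ids if strict_pattern.search(i)]
print(strict_matching)  # ['yfs_l84_nflx']
```

Whichever pattern is used, it is worth confirming the real id scheme against the live page before relying on it.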