Unable to print the data retrieved with the regex (strictly only this) module in Python?

Question

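Here I am using Python's 're' module to scrape a web page. The loop runs 4 iterations, and each iteration returns an empty list, [], but the output should be the stock price for the requested stock symbol. The regex variable prints correctly, without errors. The source code is below.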

import urllib
import re

symbolslist = ["appl", "spy", "goog", "nflx"]

i = 0
while i < len(symbolslist):
    # Fetch the quote page for the current symbol (Python 2 urllib API)
    url = "http://in.finance.yahoo.com/q?s=" + symbolslist[i] + "&ql=1"
    htmlfile = urllib.urlopen(url)
    htmltext = htmlfile.read()
    # Capture the text inside <span id="yfs_l84_<symbol>">...</span>
    regex = '<span id="yfs_l84_' + symbolslist[i] + '">(.+?)</span>'
    pattern = re.compile(regex)
    print regex
    price = re.findall(pattern, htmltext)
    print "price of ", symbolslist[i], "is", price
    i += 1

There are no syntax or indentation errors, and the output is as follows:

<span id="yfs_l84_appl">(.+?)</span>
price of  appl is []
<span id="yfs_l84_spy">(.+?)</span>
price of  spy is []
<span id="yfs_l84_goog">(.+?)</span>
price of  goog is []
<span id="yfs_l84_nflx">(.+?)</span>
price of  nflx is []

The stock prices are not printed; every list is empty.

The page being scraped is https://in.finance.yahoo.com/q?s=NFLX&ql=0
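The pattern itself is valid: re.findall simply returns an empty list whenever the downloaded HTML contains no span of exactly that shape, for example if the page's markup differs from what the pattern expects. A minimal offline sketch of the same extraction, assuming markup of the form the question targets; the helper name find_price and both sample HTML strings are hypothetical, not real Yahoo output:

```python
import re


def find_price(html, symbol):
    """Return every string captured for the span whose id is yfs_l84_<symbol>.

    re.escape guards against symbols containing regex metacharacters
    such as '^' or '.'; re.IGNORECASE tolerates 'NFLX' vs 'nflx' in the id.
    """
    pattern = re.compile(
        r'<span id="yfs_l84_' + re.escape(symbol) + r'">(.+?)</span>',
        re.IGNORECASE,
    )
    return pattern.findall(html)


# Hypothetical page fragments (not real Yahoo markup).
matching_html = '<span class="time_rtq_ticker"><span id="yfs_l84_nflx">87.40</span></span>'
other_html = '<div class="price">87.40</div>'

print(find_price(matching_html, "nflx"))  # ['87.40']
print(find_price(matching_html, "NFLX"))  # ['87.40'], thanks to IGNORECASE
print(find_price(other_html, "nflx"))     # [], the pattern never matches
```

Checking 'yfs_l84_' in htmltext inside the original loop is a quick way to see whether the element is present at all in what urllib actually downloaded.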

Answer 1

As an alternative, you might find it easier to use the yahoo_finance library, as follows:

from yahoo_finance import Share

for symbol in ["appl", "spy", "goog", "nflx"]:
    yahoo = Share(symbol)
    print 'Price of {} is {}'.format(symbol, yahoo.get_price())

This gives you the following output:

Price of appl is 96.11
Price of spy is 186.63
Price of goog is 682.40
Price of nflx is 87.40

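Trying to parse HTML with regular expressions is never a good idea: the pattern depends on the exact markup, so a reordered attribute, different quoting, or a nested tag silently makes it stop matching.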
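To illustrate, here is the question's pattern (for the symbol nflx) applied to a few made-up variants of the same element. The HTML snippets are hypothetical, chosen only to show how literal matching behaves:

```python
import re

# The pattern from the question, for the symbol "nflx".
pattern = re.compile(r'<span id="yfs_l84_nflx">(.+?)</span>')

# Hypothetical variants of the same element.
variants = {
    # Exact layout the pattern was written for.
    "exact": '<span id="yfs_l84_nflx">87.40</span>',
    # An extra attribute before id: the literal '<span id=' no longer matches.
    "extra_attr": '<span class="price" id="yfs_l84_nflx">87.40</span>',
    # Single quotes instead of double quotes around the id value.
    "single_quotes": "<span id='yfs_l84_nflx'>87.40</span>",
    # Nested markup: the lazy group captures the inner tags too.
    "nested": '<span id="yfs_l84_nflx"><b>87.40</b></span>',
}

for name, html in sorted(variants.items()):
    print(name, pattern.findall(html))
```

Only the exact variant yields ['87.40']; the extra attribute and the single quotes both give [], and the nested tag yields ['<b>87.40</b>'].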

Another approach is to extract the information with BeautifulSoup first:

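from bs4 import BeautifulSoup
import requests
import re

for symbol in ["appl", "spy", "goog", "nflx"]:
    url = 'http://finance.yahoo.com/q?s={}'.format(symbol)
    r = requests.get(url)
    soup = BeautifulSoup(r.text, "html.parser")

    # Match any span whose id looks like yfs_<something>_<symbol>, e.g. yfs_l84_nflx
    data = soup.find('span', attrs={'id': re.compile(r'yfs_.*?_{}'.format(symbol.lower()))})
    if data is None:
        # find() returns None when no element matches (e.g. the page layout changed)
        print 'No price found for {}'.format(symbol)
    else:
        print 'Price of {} is {}'.format(symbol, data.text)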
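If installing BeautifulSoup is not an option, the same idea (locate the span by its id attribute instead of by its exact text) can be sketched with the standard library's HTML parser. This is a swapped-in technique, not what the answer uses: the class name SpanById and the sample HTML are hypothetical, and it runs offline against a string instead of fetching the page. It imports from html.parser on Python 3 and falls back to HTMLParser on Python 2.7.

```python
import re

try:  # Python 3
    from html.parser import HTMLParser
except ImportError:  # Python 2.7
    from HTMLParser import HTMLParser


class SpanById(HTMLParser):
    """Collect the text of every <span> whose id matches id_regex (hypothetical helper)."""

    def __init__(self, id_regex):
        HTMLParser.__init__(self)  # explicit call works on Python 2 and 3
        self.id_regex = re.compile(id_regex)
        self.depth = 0            # > 0 while inside a matching span
        self.results = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self.depth:
            self.depth += 1       # a span nested inside a matching span
        elif self.id_regex.search(dict(attrs).get("id") or ""):
            self.depth = 1        # search(), like BeautifulSoup's regex filters
            self.results.append("")

    def handle_endtag(self, tag):
        if tag == "span" and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:            # keep text (including inside <b> etc.) of a match
            self.results[-1] += data


# Hypothetical markup, not real Yahoo output.
html = ('<span class="time_rtq_ticker">'
        '<span id="yfs_l84_nflx"><b>87.40</b></span></span>'
        '<span id="volume">1,234</span>')

parser = SpanById(r'yfs_.*?_nflx')  # the same id pattern as the answer
parser.feed(html)
parser.close()
print(parser.results)
```

Because the match is on the id attribute rather than on a literal string, extra attributes, different quoting, and nested tags inside the span do not break it.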

Tags: python, regex, urllib2, beautifulsoup, python-2.7