BeautifulSoup Scraping U.S. News Today Stock <table>

Using Python, I am trying to scrape the table on the U.S. News Money "Stocks Under $10" page and then add each element to a list (so that I can iterate over each stock). Currently, I have this code:

resp = requests.get('https://money.usnews.com/investing/stocks/stocks-under-10')
soup = bs.BeautifulSoup(resp.text, "lxml")
table = soup.find('table', {'class': 'table stock full-row search-content'})
tickers = []
for row in table.findAll('tr')[1:]:
    ticker = str(row.findAll('td')[0].text)
    tickers.append(ticker)

I keep getting this error:

Traceback (most recent call last):
  File "sandp.py", line 98, in <module>
    sandp(0)
  File "sandp.py", line 40, in sandp
    for row in table.findAll('tr')[1:]:
AttributeError: 'NoneType' object has no attribute 'findAll'

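The site is dynamic: the stock list is rendered by JavaScript after the page loads, so the HTML that requests downloads likely does not contain the table. soup.find() then returns None, which is what raises the AttributeError. You can use selenium to render the page in a real browser first: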

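import collections
import re

from bs4 import BeautifulSoup as soup
from selenium import webdriver
from selenium.common.exceptions import WebDriverException
from selenium.webdriver.common.by import By

# Path to your chromedriver binary (newer Selenium 4 releases take it via a Service object instead).
d = webdriver.Chrome('/path/to/chromedriver')
d.get('https://money.usnews.com/investing/stocks/stocks-under-10')
# Keep clicking "Load More" until the link can no longer be found or clicked,
# so that every result is present in the page source.
while True:
    try:
        d.find_element(By.LINK_TEXT, "Load More").click()
    except WebDriverException:
        break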
# One record per search result; 'stats' holds the price figures.
company = collections.namedtuple('company', ['name', 'abbreviation', 'description', 'stats'])
# (tag, attributes) pairs used to pull each field out of a result block.
headers = [['a', {'class':'search-result-link'}], ['a', {'class':'text-muted'}], ['p', {'class':'text-small show-for-medium-up ellipsis'}], ['dl', {'class':'inline-dl'}], ['span', {'class':'stock-trend'}], ['div', {'class':'flex-row'}]]
# Text of every field for every result (None when a tag is missing).
final_data = [[getattr(i.find(a, b), 'text', None) for a, b in headers] for i in soup(d.page_source, 'html.parser').find_all('div', {'class':'search-result flex-row'})]
# Strip the newline/indent runs from the description and tokenize the remaining fields.
new_data = [[i[0], i[1], re.sub(r'\n+\s{2,}', '', i[2]), [re.findall(r'[$\w\.%/]+', field) for field in i[3:]]] for i in final_data]
# Keep only the tokens that contain a digit and label them.
final_results = [i[:3]+[dict(zip(['Price', 'Daily Change', 'Percent Change'], filter(lambda x:re.findall(r'\d', x), i[-1][0])))] for i in new_data]
new_results = [company(*i) for i in final_results]

Output (first company):

company(name=u'Aileron Therapeutics Inc', abbreviation=u'ALRN', description=u'Aileron Therapeutics, Inc. is a clinical stage biopharmaceutical company, which focuses on developing and commercializing stapled peptides. Its ALRN-6924 product targets the tumor suppressor p53 for the treatment of a wide variety of cancers. It also offers the MDMX and MDM2. The company was founded by Gregory L. Verdine, Rosana Kapeller, Huw M. Nash, Joseph A. Yanchik III, and Loren David Walensky in June 2005 and is headquartered in Cambridge, MA.more\n', stats={'Daily Change': u'$0.02', 'Price': u'.04', 'Percent Change': u'0.33%'})
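
The values in stats are strings, so they need converting before you can compare or sort stocks by them. Here is a minimal sketch of one way to do that. parse_stat is a hypothetical helper, not part of the code above, and the sample values are made up to show the expected shapes (a currency amount and a percentage):

```python
import re

def parse_stat(text):
    """Turn a string such as '$0.02' or '0.33%' into a float; None if no number is found."""
    # Drop thousands separators and the currency sign so a leading minus stays attached.
    cleaned = text.replace(',', '').replace('$', '')
    match = re.search(r'-?\d+(?:\.\d+)?', cleaned)
    return float(match.group()) if match else None

# Illustrative values only, in the same shape as the stats dict.
stats = {'Daily Change': '$0.02', 'Percent Change': '0.33%'}
numeric = {key: parse_stat(value) for key, value in stats.items()}
print(numeric)
```

Returning None for unparseable text keeps one malformed row from aborting the whole loop, but you may prefer to raise instead if every row is expected to be well formed.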

Edit:

All abbreviations:

abbrevs = [i.abbreviation for i in new_results]

Output:

[u'ALRN', u'HAIR', u'ONCY', u'EAST', u'CERC', u'ENPH', u'CASI', u'AMBO', u'CWBR', u'TRXC', u'NIHD', u'LGCY', u'MRNS', u'RFIL', u'AUTO', u'NEPT', u'ARQL', u'ITUS', u'SRAX', u'APTO']
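
On the original AttributeError: BeautifulSoup's find() returns None when nothing matches, so it is worth checking the result before calling methods on it. The sketch below shows that guard on two inline HTML strings. extract_tickers, the "stock" class name, and both HTML snippets are assumptions made for illustration, not the real markup of the U.S. News page:

```python
from bs4 import BeautifulSoup

def extract_tickers(html, table_class="stock"):
    """Return the first-column text of each data row, or [] if the table is absent."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", class_=table_class)
    if table is None:
        # Nothing matched, e.g. because the table is only added later by JavaScript.
        return []
    tickers = []
    for row in table.find_all("tr")[1:]:  # skip the header row
        cells = row.find_all("td")
        if cells:
            tickers.append(cells[0].get_text(strip=True))
    return tickers

# Hypothetical markup: one page that contains the table, one that does not.
static_html = ("<table class='table stock'><tr><th>Symbol</th></tr>"
               "<tr><td> ALRN </td></tr><tr><td>HAIR</td></tr></table>")
empty_html = "<div id='app'></div>"

print(extract_tickers(static_html))
print(extract_tickers(empty_html))
```

An empty list from the second call is the signal that the HTML you downloaded does not contain the table, which is when a browser-driven approach like selenium becomes necessary.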