Web Scraping (Requests, Beautiful Soup) - need to get a more specific value/text

Question

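I'm trying to write code that scrapes stock/fund ticker information (the current price). I have a program that currently pulls the value I need. However, it also pulls a bunch of other text from the web page. I'd like to know whether I can use requests and Beautiful Soup to scrape the data more precisely, rather than having to narrow down the output with extra lines of my own code.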

This is the URL I'm using: https://www.marketwatch.com/investing/fund/fspsx

Below is my current code, not including the lines I use to narrow down the output.

import bs4 
import requests


def scrape():
    res = requests.get('https://www.marketwatch.com/investing/fund/fspsx')
    soup = bs4.BeautifulSoup(res.text,'lxml')
    Current_Level = soup.find_all(class_="intraday__price")
    Current_Level = str(Current_Level)
    print(Current_Level)


scrape()

Here is the current output of this code:

[<h3 class="intraday__price">
<sup class="character">$</sup>
<bg-quote channel="/zigman2/quotes/206347152/realtime" class="value" field="Last" format="0,0.00">48.27</bg-quote>
</h3>]

I only want the price value near the end of the output. In this case, it is 48.27.

Answer 1

You can use the CSS selector .intraday__price bg-quote. It selects the bg-quote tag nested inside the element with class intraday__price.

To use a CSS selector, call the .select_one() method instead of .find().

import bs4
import requests


def scrape():
    res = requests.get('https://www.marketwatch.com/investing/fund/fspsx')
    soup = bs4.BeautifulSoup(res.text,'lxml')
    print(soup.select_one(".intraday__price bg-quote").text)

scrape()

Output:

48.27
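
To check the selector without hitting the network, the same call can be run against the HTML fragment from the question. This is only a minimal sketch: SAMPLE_HTML is copied from the question's output, the helper name extract_price is made up for illustration, and html.parser is used here just so that lxml does not have to be installed. Since select_one() returns None when nothing matches, the sketch guards against that case.

```python
import bs4

# HTML fragment copied from the output shown in the question.
SAMPLE_HTML = """
<h3 class="intraday__price">
<sup class="character">$</sup>
<bg-quote channel="/zigman2/quotes/206347152/realtime" class="value" field="Last" format="0,0.00">48.27</bg-quote>
</h3>
"""


def extract_price(html):
    """Return the price text, or None if the selector matches nothing."""
    soup = bs4.BeautifulSoup(html, "html.parser")
    # Descendant selector: a <bg-quote> tag anywhere inside .intraday__price
    node = soup.select_one(".intraday__price bg-quote")
    return node.get_text(strip=True) if node is not None else None


print(extract_price(SAMPLE_HTML))        # 48.27
print(extract_price("<p>no price</p>"))  # None
```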

Answer 2

You're close. You just need to take the extraction one step further. Try:

import bs4
import requests

def scrape():
    res = requests.get('https://www.marketwatch.com/investing/fund/fspsx')
    soup = bs4.BeautifulSoup(res.text,'lxml')
    Current_Level = soup.find(class_="intraday__price") # use 'find' instead of 'find_all'
    if Current_Level is not None:
        quote = Current_Level.find('bg-quote')
        if quote is not None:
            print(quote.text)
            return quote.text

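Use find instead of find_all, since the class appears to occur only once on the page. Then make an additional find() call to narrow the output further, and use .text to extract the desired value.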
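
The difference between the two calls can be seen on the HTML fragment from the question. The following is a minimal sketch, with a trimmed copy of that fragment inlined as a string (SAMPLE_HTML is a name chosen here) and html.parser used so that lxml is not required. find_all() returns a list-like ResultSet, which is why calling str() on it printed the brackets and surrounding tags, while find() returns a single Tag (or None) that can be searched again.

```python
import bs4

# Trimmed copy of the fragment from the question's output.
SAMPLE_HTML = (
    '<h3 class="intraday__price">'
    '<sup class="character">$</sup>'
    '<bg-quote class="value" field="Last" format="0,0.00">48.27</bg-quote>'
    '</h3>'
)

soup = bs4.BeautifulSoup(SAMPLE_HTML, "html.parser")

matches = soup.find_all(class_="intraday__price")  # list-like ResultSet
first = soup.find(class_="intraday__price")        # single Tag, or None

print(len(matches))  # number of matching elements
print(first.name)    # tag name of the first match

# Narrow further with a second find(), then read the text.
quote = first.find("bg-quote")
price_text = quote.text
print(price_text)
```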

If you prefer something more compact and general, with some exception handling:

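def scrape(fund='fspsx'):
    # Returns the price text for the given fund symbol, or None on any failure
    try:
        res = requests.get('https://www.marketwatch.com/investing/fund/' + fund)
        value = bs4.BeautifulSoup(res.text,'lxml').find(class_="intraday__price").find('bg-quote').text
    except Exception:
        # Covers network errors and a missing element (AttributeError on None)
        value = None
    return value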
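
Each version above returns the price as a string. If a number is needed for arithmetic, a conversion step could follow the scrape. The sketch below is an illustration under stated assumptions, not part of the original answer: the helper name to_decimal is made up, and the comma handling assumes that larger values arrive with thousands separators, as the format="0,0.00" attribute in the question's output seems to suggest. It also passes through the None that scrape() returns on failure.

```python
from decimal import Decimal, InvalidOperation


def to_decimal(text):
    """Convert scraped price text such as '48.27' or '1,234.50' to Decimal.

    Returns None for None input or for text that cannot be parsed.
    """
    if text is None:
        return None
    # Assumption: commas are thousands separators and a leading '$' may appear.
    cleaned = text.strip().replace(",", "").lstrip("$")
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


print(to_decimal("48.27"))     # 48.27
print(to_decimal("1,234.50"))  # 1234.50
print(to_decimal(None))        # None
```

Decimal is used rather than float here so that the scraped price keeps its two decimal places exactly; float(cleaned) would work the same way if exactness does not matter.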

Answer 3

This is how I solved the problem:

import bs4 
import requests

def scrape():
    res = requests.get('https://www.marketwatch.com/investing/fund/fspsx')
    soup = bs4.BeautifulSoup(res.text,'lxml')
    Current_Level = soup.find(class_="intraday__price").find(class_='value').text
    print(Current_Level)

scrape()

In my case, it works both during trading hours and after hours.

Tags: python, beautifulsoup, coding-efficiency, web-scraping, python-requests