BeautifulSoup doesn't display the content

Question

I want to scrape spot price data from the MCX India website. When I inspect the element, the visible HTML is as follows:

<div class="contents spotmarketprice">
            <div id="cont-1" style="display: block;">
                <table class="mcx-table mrB20" width="100%" cellspacing="8" id="tblSMP">
                    <thead>
                        <tr>
                            <th class="symbol-head">
                                Commodity
                            </th>
                            <th>
                                Unit
                            </th>
                            <th class="left1">
                                Location
                            </th>
                            <th class="right1">
                                Spot Price (Rs.)
                            </th>
                            <th>
                                Up/Down
                            </th>
                        </tr>
                    </thead>
                    <tbody>
                      <tr>
                          <td class="symbol" style="width:30%;">ALMOND</td>
                          <td style="width:17%;">1 KGS</td>
                          <td align="left" style="width:17%;">DELHI</td>
                          <td align="right" style="width:17%;">558.00</td>  

                          <td align="right" class="padR20" style="width:19%;">=</td>                                         
                      </tr>

The code I wrote is:

# import the required libraries
from bs4 import BeautifulSoup
import requests

# Getting data from website
source = requests.get('http://www.mcxindia.com/market-data/spot-market-price').text

# Parsing the html code of the website
soup = BeautifulSoup(source, 'lxml')

# Navigating to the blocks where required content is present
division_1 = soup.find('div', class_="contents spotmarketprice").div.table

# Displaying the results
print(division_1.tbody)

Output:

<tbody>
   </tbody>

On the website, the content I want is available in ..., but nothing shows up here. Please suggest a solution.

Answer 1

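It seems the data in the table is loaded via JavaScript.

That's why, if you try to get this information with the requests library, the response you get back doesn't contain the table's data: requests simply doesn't execute JS. So the problem here is not BeautifulSoup.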
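One way to confirm this diagnosis before switching tools is to count the rows inside the tbody of table tblSMP in whatever HTML you have. This is a minimal sketch, assuming bs4 is installed; tbody_row_count is a hypothetical helper, and the two HTML strings are made-up stand-ins for what requests receives versus what the browser's inspector shows after the scripts have run.

```python
from bs4 import BeautifulSoup


def tbody_row_count(html: str, table_id: str = "tblSMP") -> int:
    """Count the <tr> rows inside the table's <tbody> (0 if table or tbody is missing)."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None or table.tbody is None:
        return 0
    return len(table.tbody.find_all("tr"))


# Hypothetical: the server sends the table skeleton with an empty <tbody>
served_html = (
    '<table id="tblSMP"><thead><tr><th>Commodity</th></tr></thead>'
    "<tbody>\n</tbody></table>"
)
# Hypothetical: what the inspector shows after JavaScript has filled in the rows
rendered_html = '<table id="tblSMP"><tbody><tr><td>ALMOND</td></tr></tbody></table>'

print(tbody_row_count(served_html))    # 0
print(tbody_row_count(rendered_html))  # 1
```

On the live page you would pass requests.get(url).text instead; a count of 0 there, while the inspector shows rows, suggests the rows are added client-side.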

To scrape JS-driven data, consider using selenium and chromedriver. A solution for this case looks like this:

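# import libraries
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# create a webdriver (raw string, so "\t" in the path is not read as a tab)
chromedriver_path = r'C:\path\to\chromedriver.exe'
driver = webdriver.Chrome(service=Service(chromedriver_path))

try:
    # go to the page and wait until JavaScript has added at least one row
    driver.get('http://www.mcxindia.com/market-data/spot-market-price')
    WebDriverWait(driver, 20).until(
        EC.presence_of_element_located((By.CSS_SELECTOR, '#tblSMP tbody tr'))
    )
    soup = BeautifulSoup(driver.page_source, 'html.parser')

    # fetch mentioned data
    table = soup.find('table', {'id': 'tblSMP'})
    for tr in table.tbody.find_all('tr'):
        row = [td.get_text(strip=True) for td in tr.find_all('td')]
        print(row)
finally:
    # close the webdriver
    driver.quit()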

The output of the above script is:

['ALMOND', '1 KGS', 'DELHI', '558.00', '=']
['ALUMINIUM', '1 KGS', 'THANE', '137.60', '=']
['CARDAMOM', '1 KGS', 'VANDANMEDU', '2,525.00', '=']
['CASTORSEED', '100 KGS', 'DEESA', '3,626.00', '▼']
['CHANA', '100 KGS', 'DELHI', '4,163.00', '▲']
['COPPER', '1 KGS', 'THANE', '388.30', '=']
['COTTON', '1 BALES', 'RAJKOT', '15,790.00', '▲']
['CPO', '10 KGS', 'KANDLA', '630.10', '▼']
['CRUDEOIL', '1 BBL', 'MUMBAI', '2,418.00', '▲']
['GOLD', '10 GRMS', 'AHMEDABAD', '40,989.00', '=']
['GOLDGUINEA', '8 GRMS', 'AHMEDABAD', '32,923.00', '=']
['GOLDM', '10 GRMS', 'AHMEDABAD', '40,989.00', '=']
['GOLDPETAL', '1 GRMS', 'MUMBAI', '4,129.00', '=']
['GUARGUM', '100 KGS', 'JODHPUR', '5,880.00', '=']
['GUARSEED', '100 KGS', 'JODHPUR', '3,660.00', '=']
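The rows printed above are plain strings, so a price such as '2,525.00' still carries a thousands separator and cannot be compared numerically as-is. A minimal sketch for converting one row, assuming every row has exactly the five columns shown; parse_spot_row and its dict keys are hypothetical names, and the sample input is one of the printed rows.

```python
def parse_spot_row(row: list) -> dict:
    """Turn one scraped row of strings into typed fields."""
    commodity, unit, location, price, direction = row
    return {
        "commodity": commodity,
        "unit": unit,
        "location": location,
        # drop the thousands separator before converting: '2,525.00' -> 2525.0
        "price": float(price.replace(",", "")),
        "direction": direction,
    }


parsed = parse_spot_row(['CARDAMOM', '1 KGS', 'VANDANMEDU', '2,525.00', '='])
print(parsed["price"])  # 2525.0
```

Unpacking into five names also makes the function fail loudly if the site ever adds or removes a column.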

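UPD: I should point out that the code above answers the question of how to see this particular table. However, some websites also store the data in 'application/json' or similar tags that can be read with the 'requests' library, because getting at them doesn't require running JS.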

αԋɱҽԃ αмєяιcαη found that the current website does contain such data; please check his answer. In this case, requests really is a better choice than selenium.
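For the common case where the data sits in a script tag of type application/json, BeautifulSoup plus the json module is enough, with no browser involved. This is a sketch under that assumption: json_from_script_tags is a hypothetical helper, and the sample HTML is invented. According to the answer below, the MCX page embeds its records as "Data":[...] somewhere in the page source, which is why that answer uses a regex rather than this tag lookup.

```python
import json

from bs4 import BeautifulSoup


def json_from_script_tags(html: str) -> list:
    """Parse every <script type="application/json"> block found in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for tag in soup.find_all("script", type="application/json"):
        text = tag.string
        if text:  # skip empty script tags
            results.append(json.loads(text))
    return results


# Hypothetical page: empty table, data shipped alongside it as JSON
sample = """
<html><body>
<table id="tblSMP"><tbody></tbody></table>
<script type="application/json">{"Data": [{"EnSymbol": "ALMOND", "TodaysSpotPrice": 558.0}]}</script>
</body></html>
"""

payloads = json_from_script_tags(sample)
print(payloads[0]["Data"][0]["EnSymbol"])  # ALMOND
```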

Answer 2

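import requests
import re
import json
import pandas as pd


# keys to pull out of each record in the embedded JSON
goal = ['EnSymbol', 'Unit', 'Location', 'TodaysSpotPrice']

def main(url):
    r = requests.get(url, timeout=30)
    r.raise_for_status()
    # the page source embeds the table data as a JSON array after "Data":
    found = re.search(r'"Data":(\[.*?\])', r.text)
    if found is None:
        raise ValueError('"Data" array not found in the page source')
    match = json.loads(found.group(1))
    allin = []
    for item in match:
        allin.append([item[x] for x in goal])
    df = pd.DataFrame(allin, columns=goal)
    print(df)


main("https://www.mcxindia.com/market-data/spot-market-price")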

Output:

         EnSymbol     Unit    Location  TodaysSpotPrice
0          ALMOND    1 KGS       DELHI           558.00
1       ALUMINIUM    1 KGS       THANE           137.60
2        CARDAMOM    1 KGS  VANDANMEDU          2525.00
3      CASTORSEED  100 KGS       DEESA          3626.00
4           CHANA  100 KGS       DELHI          4163.00
5          COPPER    1 KGS       THANE           388.30
6          COTTON  1 BALES      RAJKOT         15880.00
7             CPO   10 KGS      KANDLA           635.90
8        CRUDEOIL    1 BBL      MUMBAI          2418.00
9            GOLD  10 GRMS   AHMEDABAD         40989.00
10     GOLDGUINEA   8 GRMS   AHMEDABAD         32923.00
11          GOLDM  10 GRMS   AHMEDABAD         40989.00
12      GOLDPETAL   1 GRMS      MUMBAI          4129.00
13        GUARGUM  100 KGS     JODHPUR          5880.00
14       GUARSEED  100 KGS     JODHPUR          3660.00
15          KAPAS   20 KGS      RAJKOT           927.50
16           LEAD    1 KGS     CHENNAI           141.60
17      MENTHAOIL    1 KGS   CHANDAUSI          1295.10
18     NATURALGAS  1 mmBtu      HAZIRA           138.50
19         NICKEL    1 KGS       THANE           892.00
20         PEPPER  100 KGS       KOCHI         32700.00
21       RAW JUTE  100 KGS     KOLKATA          4999.00
22  RBD PALMOLEIN   10 KGS      KANDLA           700.40
23      REFSOYOIL   10 KGS      INDORE           845.25
24         SILVER    1 KGS   AHMEDABAD         36871.00
25        SILVERM    1 KGS   AHMEDABAD         36871.00
26      SILVERMIC    1 KGS   AHMEDABAD         36871.00
27      SUGARMDEL  100 KGS       DELHI          3380.00
28      SUGARMKOL  100 KGS    KOLHAPUR          3334.00
29      SUGARSKLP  100 KGS    KOLHAPUR          3275.00
30            TIN    1 KGS      MUMBAI          1160.50
31          WHEAT  100 KGS       DELHI          1977.50
32           ZINC    1 KGS       THANE           155.15
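This works because the page source contains the table as a JSON array right after "Data":, and the non-greedy (\[.*?\]) captures it up to the first ]. That first ] is also the weak spot: if a record ever contained a nested array, the capture would be cut short and json.loads would fail. The MCX records shown above may never contain one, but a slightly more robust sketch, using only the standard library, lets json.JSONDecoder.raw_decode find where the array really ends. extract_json_array is a hypothetical helper, and the page string is invented to show the nested-array case.

```python
import json


def extract_json_array(text: str, key: str = '"Data":'):
    """Decode the JSON value that immediately follows `key` in `text`."""
    start = text.find(key)
    if start == -1:
        raise ValueError(f"{key} not found")
    idx = start + len(key)
    # raw_decode does not skip leading whitespace, so step over any after the colon
    while idx < len(text) and text[idx].isspace():
        idx += 1
    # raw_decode parses one complete JSON value and ignores whatever follows it
    value, _end = json.JSONDecoder().raw_decode(text, idx)
    return value


# Invented page source: the first record holds a nested array, which would
# stop the non-greedy regex after "metal"]
page = 'var x = {"Data":[{"EnSymbol":"GOLD","Tags":["bullion","metal"]},{"EnSymbol":"ZINC","Tags":[]}],"Other":1};'

records = extract_json_array(page)
print([r["EnSymbol"] for r in records])  # ['GOLD', 'ZINC']
```

In main() above, the re.search and json.loads lines could be replaced by match = extract_json_array(r.text).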

If you also want the Up/Down change symbol, here is a version that includes it:

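import requests
import re
import json
import pandas as pd


# keys to pull out of each record; 'Change' is turned into an arrow below
goal = ['EnSymbol', 'Unit', 'Location', 'TodaysSpotPrice', 'Change']


def main(url):
    r = requests.get(url, timeout=30)
    r.raise_for_status()
    found = re.search(r'"Data":(\[.*?\])', r.text)
    if found is None:
        raise ValueError('"Data" array not found in the page source')
    match = json.loads(found.group(1))
    allin = []
    for item in match:
        row = [item[x] for x in goal]
        # map the numeric change to the symbol used in the site's Up/Down column
        row[-1] = '▲' if row[-1] > 0 else '▼' if row[-1] < 0 else '='
        allin.append(row)
    df = pd.DataFrame(allin, columns=goal)
    print(df)


main("https://www.mcxindia.com/market-data/spot-market-price")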

Output:

         EnSymbol     Unit    Location  TodaysSpotPrice Change
0          ALMOND    1 KGS       DELHI           558.00      =
1       ALUMINIUM    1 KGS       THANE           137.60      =
2        CARDAMOM    1 KGS  VANDANMEDU          2525.00      =
3      CASTORSEED  100 KGS       DEESA          3626.00      =
4           CHANA  100 KGS       DELHI          4163.00      =
5          COPPER    1 KGS       THANE           388.30      =
6          COTTON  1 BALES      RAJKOT         15880.00      ▲
7             CPO   10 KGS      KANDLA           635.90      ▲
8        CRUDEOIL    1 BBL      MUMBAI          2418.00      ▲
9            GOLD  10 GRMS   AHMEDABAD         40989.00      =
10     GOLDGUINEA   8 GRMS   AHMEDABAD         32923.00      =
11          GOLDM  10 GRMS   AHMEDABAD         40989.00      =
12      GOLDPETAL   1 GRMS      MUMBAI          4129.00      =
13        GUARGUM  100 KGS     JODHPUR          5880.00      =
14       GUARSEED  100 KGS     JODHPUR          3660.00      =
15          KAPAS   20 KGS      RAJKOT           927.50      ▲
16           LEAD    1 KGS     CHENNAI           141.60      =
17      MENTHAOIL    1 KGS   CHANDAUSI          1295.10      =
18     NATURALGAS  1 mmBtu      HAZIRA           138.50      ▲
19         NICKEL    1 KGS       THANE           892.00      =
20         PEPPER  100 KGS       KOCHI         32600.00      ▼
21       RAW JUTE  100 KGS     KOLKATA          4999.00      =
22  RBD PALMOLEIN   10 KGS      KANDLA           700.40      ▼
23      REFSOYOIL   10 KGS      INDORE           845.25      =
24         SILVER    1 KGS   AHMEDABAD         36871.00      =
25        SILVERM    1 KGS   AHMEDABAD         36871.00      =
26      SILVERMIC    1 KGS   AHMEDABAD         36871.00      =
27      SUGARMDEL  100 KGS       DELHI          3380.00      ▼
28      SUGARMKOL  100 KGS    KOLHAPUR          3334.00      ▲
29      SUGARSKLP  100 KGS    KOLHAPUR          3275.00      ▼
30            TIN    1 KGS      MUMBAI          1160.50      ▼
31          WHEAT  100 KGS       DELHI          1977.50      ▲
32           ZINC    1 KGS       THANE           155.15      =
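If you would rather keep the numeric change and add the arrow as an extra column, the same ternary can live in a small function applied with Series.map after the DataFrame is built. A sketch assuming pandas is installed; change_symbol, the Up/Down column name, and the records below (placeholder symbols, made-up numbers) are all hypothetical.

```python
import pandas as pd


def change_symbol(change: float) -> str:
    """Map a numeric price change to the arrow used in the site's Up/Down column."""
    if change > 0:
        return '▲'
    if change < 0:
        return '▼'
    return '='


# Hypothetical records shaped like the entries of the "Data" array
records = [
    {'EnSymbol': 'COMMODITY_A', 'TodaysSpotPrice': 100.0, 'Change': 2.5},
    {'EnSymbol': 'COMMODITY_B', 'TodaysSpotPrice': 200.0, 'Change': -1.0},
    {'EnSymbol': 'COMMODITY_C', 'TodaysSpotPrice': 300.0, 'Change': 0.0},
]

df = pd.DataFrame(records)
df['Up/Down'] = df['Change'].map(change_symbol)
print(df['Up/Down'].tolist())  # ['▲', '▼', '=']
```

As with the inline version, a missing (NaN) change fails both comparisons and falls through to '='.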

