Extract "Table1" after selection button using BeautifulSoup?

I am trying to download the table from "https://www.bsgroup.com.hk/BrightSmart/MarginRatio/StockMarginRatioEnquiry.aspx?Lang=eng" after selecting "HK Stock" and pressing the "Show All" button. I checked the Network tab in Chrome's Inspect tools: no request for new data is sent to the server, so I suspect the data is already in the original page. After pressing "Show All", I can see it appears in "Table1". I tried the code below, but it returns nothing. Please advise:

url="https://www.bsgroup.com.hk/BrightSmart/MarginRatio/StockMarginRatioEnquiry.aspx?Lang=eng"
result = requests.get(url)
result.raise_for_status()
result.encoding = "utf-8"

src = result.content
soup = BeautifulSoup(src, 'lxml')
table = soup.findAll("Table1")
output_rows = []
for table_row in table.findAll('tr'):
    columns = table_row.findAll('td')
    output_row = []
    for column in columns:
        output_row.append(column.text)
    output_rows.append(output_row)

print(output_rows)

There are two problems with the attempt above: soup.findAll("Table1") searches for tags named "Table1" (searching by id would be soup.find(id="Table1")), and the result grid does not appear in the initial GET response anyway; the server only renders it after the form is posted back. To get the data, you have to send a POST request with the correct parameters, including the hidden ASP.NET form fields collected from the initial page.

For example:

import requests
from bs4 import BeautifulSoup

url = 'https://www.bsgroup.com.hk/BrightSmart/MarginRatio/StockMarginRatioEnquiry.aspx?Lang=eng'

with requests.Session() as s:
    # GET the page once to pick up the form's hidden ASP.NET state (e.g. __VIEWSTATE).
    soup = BeautifulSoup(s.get(url).text, 'html.parser')

    # Build the POST payload from every named <input>, keeping existing values.
    data = {i['name']: i['value'] if 'value' in i.attrs else '' for i in soup.select('input[name]')}
    # Drop the per-stock "Find" button and set the exchange field to HKEX.
    del data['StockMarginRatioGrid$btnFind']
    data['StockMarginRatioGrid$txtExchange'] = 'HKEX'

    # POST the form back; the response now contains the result grid.
    soup = BeautifulSoup(s.post(url, data=data).text, 'html.parser')

    # Print each row, centring every cell in a 21-character column.
    for tr in soup.select('#StockMarginRatioGrid_gridResult tr'):
        print(''.join('{:^21}'.format(td.text) for td in tr.select('td')))

This prints:

 Stock Code              Name          Stock Margin Ratio      Deposit Ratio                              Stock Code              Name          Stock Margin Ratio      Deposit Ratio    
      1               CKHHOLDINGS              85%                  15%                                        2               CLPHOLDINGS              85%                  15%         
      3               HK&CHINAGAS              85%                  15%                                        4              WHARFHOLDINGS             82%                  18%         
      5              HSBCHOLDINGS              85%                  15%                                        6               POWERASSETS              85%                  15%         
      8                  PCCW                  75%                  25%                                       10              HANGLUNGGROUP             75%                  25%         
     11              HANGSENGBANK              85%                  15%                                       12              HENDERSONLAND             85%                  15%         
     14                HYSANDEV                75%                  25%                                       16                 SHKPPT                 85%                  15%         
     17               NEWWORLDDEV              85%                  15%                                       18              ORIENTALPRESS             20%                  80%         
     19              SWIREPACIFICA             85%                  15%                                       20                WHEELOCK                82%                  18%         
     23               BANKOFEASIA              75%                  25%                                       25             CHEVALIERINT'L             40%                  60%         

... and so on.
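
The grid lays each row out as two stocks side by side (eight cells: two sets of Stock Code / Name / Stock Margin Ratio / Deposit Ratio). If you want one record per stock, you can split each row into two halves. This is only a sketch, assuming the soup object from the POST response above and the same grid id; the header row is skipped by checking that the first cell is a numeric stock code:

records = []
for tr in soup.select('#StockMarginRatioGrid_gridResult tr'):
    cells = [td.text.strip() for td in tr.select('td')]
    # Each data row holds two stocks: cells 0-3 and cells 4-7.
    for half in (cells[:4], cells[4:8]):
        if len(half) == 4 and half[0].isdigit():   # skip the header row / incomplete halves
            records.append({
                'code': half[0],
                'name': half[1],
                'margin_ratio': half[2],
                'deposit_ratio': half[3],
            })

print(records[:2])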

EDIT: To write the output to a CSV file, you can use this example:

import csv
import requests
from bs4 import BeautifulSoup

url = 'https://www.bsgroup.com.hk/BrightSmart/MarginRatio/StockMarginRatioEnquiry.aspx?Lang=eng'

# newline='' lets the csv module handle line endings itself.
with requests.Session() as s, open('output.csv', 'w', newline='') as f_out:
    writer = csv.writer(f_out)

    # Same two-step flow as above: GET the form state, then POST it back.
    soup = BeautifulSoup(s.get(url).text, 'html.parser')

    data = {i['name']: i['value'] if 'value' in i.attrs else '' for i in soup.select('input[name]')}
    del data['StockMarginRatioGrid$btnFind']
    data['StockMarginRatioGrid$txtExchange'] = 'HKEX'

    soup = BeautifulSoup(s.post(url, data=data).text, 'html.parser')

    # Write every grid row to the CSV, stripping whitespace from each cell.
    for tr in soup.select('#StockMarginRatioGrid_gridResult tr'):
        writer.writerow([td.text.strip() for td in tr.select('td')])
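
If pandas is available, you do not have to pull the cells out by hand: pandas.read_html (which needs lxml or html5lib installed) can parse the grid directly from the POST response. A sketch under the assumption that the grid keeps the id StockMarginRatioGrid_gridResult:

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://www.bsgroup.com.hk/BrightSmart/MarginRatio/StockMarginRatioEnquiry.aspx?Lang=eng'

with requests.Session() as s:
    soup = BeautifulSoup(s.get(url).text, 'html.parser')

    data = {i['name']: i['value'] if 'value' in i.attrs else '' for i in soup.select('input[name]')}
    del data['StockMarginRatioGrid$btnFind']
    data['StockMarginRatioGrid$txtExchange'] = 'HKEX'

    html = s.post(url, data=data).text

# read_html returns a list of DataFrames; attrs narrows it to the result grid,
# and header=0 treats the first grid row (the column titles) as the header.
df = pd.read_html(html, attrs={'id': 'StockMarginRatioGrid_gridResult'}, header=0)[0]
df.to_csv('output.csv', index=False)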