Scrape pages with "load more" button
I'm trying to scrape the stock tickers for my country, but I'm stuck on the "load more" button on the site in question.
Website: https://br.tradingview.com/markets/stocks-brazilia/market-movers-all-stocks/
My code:
from bs4 import BeautifulSoup
from urllib.request import Request, urlopen
req = Request('https://br.tradingview.com/markets/stocks-brazilia/market-movers-all-stocks/', headers = {'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req).read()
bs = BeautifulSoup(webpage, 'lxml')
table = bs.find('table')
table_rows = table.find_all('tr')
tickers = [x.div.a.text for x in table_rows[1:]]  # skip the header row
print(tickers)
# ['AALR3', 'ABEV3', 'AERI3',...]
print(len(tickers))
# 150
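I want to scrape all of the data, but the "load more" button is stopping me.
Is it possible to do this with BeautifulSoup, or do I have to resort to Selenium?
When I try Inspect Element > Network > click "load more",
I can't find any trace of a request that I could implement in my code. Can someone explain?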
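You should send a POST request to the backend API. Open your browser's developer tools, go to the Network tab and filter by Fetch/XHR, then click "load more" and watch for the "scan" request. You can replicate that request in Python and get all the data you want by editing the POST body, like this: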
import requests
import pandas as pd

rows_to_scrape = 1000

# Same body the page sends to the "scan" endpoint when you click "load more"
payload = {"filter":[{"left":"name","operation":"nempty"},
                     {"left":"type","operation":"equal","right":"stock"},
                     {"left":"subtype","operation":"equal","right":"common"},
                     {"left":"typespecs","operation":"has_none_of","right":"odd"}],
           "options":{"lang":"pt"},"markets":["brazil"],
           "symbols":{"query":{"types":[]},"tickers":[]},"columns":
           ["logoid","name","close","change","change_abs","Recommend.All","volume","Value.Traded","market_cap_basic","price_earnings_ttm","earnings_per_share_basic_ttm","number_of_employees","sector","description","type","subtype","update_mode","pricescale","minmov","fractional","minmove2","currency","fundamental_currency_code"],
           "sort":{"sortBy":"name","sortOrder":"asc"},
           "range": [0,rows_to_scrape]} #change this to get more/less data
headers = {'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36'}
url = 'https://scanner.tradingview.com/brazil/scan'

# json= serializes the payload and sets the JSON content type
resp = requests.post(url, headers=headers, json=payload, timeout=30)
resp.raise_for_status()  # fail loudly on a 4xx/5xx instead of parsing an error page

# each item in 'data' carries the symbol ('s') and its list of values ('d')
output = [x['d'] for x in resp.json()['data']]
print(len(output))

# assumes the values in 'd' come back in the same order as payload['columns']
df = pd.DataFrame(output, columns=payload['columns'])
df.to_csv('tradingview_br.csv', index=False)
print('Saved to tradingview_br.csv')
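The "load more" button appears to work by asking the same "scan" endpoint for the next slice of rows, so instead of one large "range" you can page through the results the way the button does. The following is a minimal sketch under a few assumptions: that "range" is a half-open [start, end) window, that a page shorter than page_size means nothing is left, and that 150 (the row count seen in the question) is a reasonable page size. fetch_page and fetch_all are hypothetical helper names, and the sketch uses the standard library's urllib (as in the question) instead of requests so it has no third-party dependency.

```python
import json
from urllib.request import Request, urlopen

SCAN_URL = "https://scanner.tradingview.com/brazil/scan"


def fetch_page(payload, start, end):
    """POST the scan query once, asking only for rows in [start, end) (assumed semantics)."""
    # dict(payload, range=...) copies the payload and overrides only its "range" field
    body = json.dumps(dict(payload, range=[start, end])).encode("utf-8")
    req = Request(
        SCAN_URL,
        data=body,  # supplying data makes urllib send a POST
        headers={"User-Agent": "Mozilla/5.0", "Content-Type": "application/json"},
    )
    with urlopen(req, timeout=30) as resp:
        # treat a missing or null "data" field as an empty page
        return json.load(resp).get("data") or []


def fetch_all(payload, page_size=150, fetch=fetch_page):
    """Imitate repeated "load more" clicks by requesting consecutive ranges."""
    rows, start = [], 0
    while True:
        page = fetch(payload, start, start + page_size)
        rows.extend(page)
        if len(page) < page_size:  # a short or empty page: nothing left to load
            return rows
        start += page_size
```

With the payload from the answer, fetch_all(payload) would return the raw rows, and its own "range" value is simply overridden page by page. Passing the fetcher in as a parameter is a design choice that lets the paging loop be exercised with a fake function, without touching the network.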
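It should be easy to figure out what each data point is; unfortunately, the response itself carries no headers. The values in each row's "d" list appear to follow the order of the "columns" list in the payload, so those names can serve as the headers.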
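Attaching names to the headerless rows can be sketched as below. This assumes each item in the response's "data" list looks like {"s": symbol, "d": [values...]} with one value per requested column, in request order; label_rows is a hypothetical helper, and the length check exists so that a broken assumption fails loudly instead of silently mislabelling values.

```python
def label_rows(data, columns):
    """Turn scanner rows like {"s": ..., "d": [...]} into dicts keyed by column name."""
    labelled = []
    for row in data:
        values = row["d"]
        # Assumption: "d" holds exactly one value per requested column, in the same order
        if len(values) != len(columns):
            raise ValueError(f"expected {len(columns)} values, got {len(values)}")
        record = {"symbol": row.get("s")}  # "s" is assumed to be the row's symbol
        record.update(zip(columns, values))
        labelled.append(record)
    return labelled
```

Passing the "data" list from the scan response together with payload["columns"] to label_rows gives a list of dicts that pd.DataFrame can turn into a table with named columns.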