Error: Requests and lxml libraries return empty brackets in web scraping
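I'm having a problem with web scraping in Python using the Requests and lxml libraries.
I need to capture the information highlighted in yellow from this website (http://www.b3.com.br/pt_br/market-data-e-indices/indices/indices-amplos/indice-ibovespa-ibovespa-composicao-da-carteira.htm). However, my code returns: []
Can someone please help me?
The code is below: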
from lxml import html
import requests
page = requests.get('http://www.b3.com.br/pt_br/market-data-e-indices/indices/indices-amplos/indice-ibovespa-ibovespa-composicao-da-carteira.htm')
tree = html.fromstring(page.content)
cod = tree.xpath('//*[@id="divContainerIframeB3"]/div/div[1]/form/div[2]/div/table/tbody/tr[1]/td[1]')
print('The code is : ', cod)
Screenshot of the returned output:
Browser inspector:
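The data is loaded via JavaScript from an external source, so it is not present in the HTML that Requests downloads, and the XPath matches nothing. You can use this script to load the JSON data directly: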
import json
import base64
import requests

# The endpoint takes its query parameters as a base64-encoded string in the URL path
api_url = "https://sistemaswebb3-listados.b3.com.br/indexProxy/indexCall/GetPortfolioDay/{encoded_string}"

page = 1
index = "IBOV"

s = {
    "language": "pt-br",
    "pageNumber": page,
    "pageSize": 20,
    "index": index,
    "segment": "1",
}

# Encode the parameter dict as base64 text
encoded_string = base64.b64encode(str(s).encode("utf-8")).decode("utf-8")

data = requests.get(
    api_url.format(encoded_string=encoded_string),
    verify=False,  # disables TLS certificate verification
).json()

# uncomment this to get all data:
# print(json.dumps(data, indent=4))

# Print code, asset name and theoretical quantity for each row
for result in data["results"]:
    print(
        "{:<8} {:<15} {:15}".format(
            result["cod"], result["asset"], result["theoricalQty"]
        )
    )
Prints:
ABEV3    AMBEV S/A       4.355.174.839
ASAI3    ASSAI           157.635.935
AZUL4    AZUL            327.283.207
BTOW3    B2W DIGITAL     201.549.295
B3SA3    B3              1.930.877.944
BBSE3    BBSEGURIDADE    671.584.841
BRML3    BR MALLS PAR    843.728.684
BBDC3    BRADESCO        1.261.986.269
BBDC4    BRADESCO        4.687.814.597
BRAP4    BRADESPAR       222.075.664
BBAS3    BRASIL          1.283.197.221
BRKM5    BRASKEM         264.640.575
BRFS3    BRF SA          811.759.800
BPAC11   BTGP BANCO      263.871.572
CRFB3    CARREFOUR BR    391.758.726
CCRO3    CCR SA          1.115.695.556
CMIG4    CEMIG           969.723.092
HGTX3    CIA HERING      126.186.408
CIEL3    CIELO           1.112.196.638
COGN3    COGNA ON        1.847.994.874
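The script requests only page 1 with a page size of 20, which is why 20 rows are printed. A minimal sketch of walking through further pages could look like the following. The helper names `encode_params` and `iter_results`, the `fetch_page` callable, and the `max_pages` guard are hypothetical, and the sketch assumes the endpoint returns an empty `results` list once the page number runs past the last page, which has not been verified here.

```python
import base64


def encode_params(page_number, index="IBOV", page_size=20):
    """Build the base64 path segment for one page, mirroring the script above."""
    params = {
        "language": "pt-br",
        "pageNumber": page_number,
        "pageSize": page_size,
        "index": index,
        "segment": "1",
    }
    return base64.b64encode(str(params).encode("utf-8")).decode("utf-8")


def iter_results(fetch_page, max_pages=50):
    """Yield rows page by page until a page has no results.

    fetch_page is a hypothetical callable: it takes the encoded string
    and returns the parsed JSON response as a dict.
    max_pages is a safety cap so the loop always terminates.
    """
    for page_number in range(1, max_pages + 1):
        data = fetch_page(encode_params(page_number))
        results = data.get("results") or []
        if not results:  # assumption: an empty page means there is no more data
            break
        yield from results
```

With Requests, `fetch_page` could be something like `lambda enc: requests.get(api_url.format(encoded_string=enc)).json()`, reusing `api_url` from the script. Passing the fetcher in as an argument keeps the paging logic separate from the network call, so it can be tried against canned responses first.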
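The quantities in the output are printed with dots as thousands separators (for example 4.355.174.839), so they need converting before any arithmetic. A small sketch, assuming `theoricalQty` arrives as a string with dot grouping and no decimal part; the helper name `parse_quantity` is hypothetical and the two sample rows are copied from the output shown in this answer:

```python
def parse_quantity(text):
    """Convert a dot-grouped quantity such as '4.355.174.839' to an int."""
    cleaned = text.strip().replace(".", "")
    if not cleaned.isdigit():
        # Fail loudly rather than guess at an unexpected format
        raise ValueError(f"unexpected quantity format: {text!r}")
    return int(cleaned)


# Two rows copied from the printed output
rows = [
    {"cod": "ABEV3", "theoricalQty": "4.355.174.839"},
    {"cod": "ASAI3", "theoricalQty": "157.635.935"},
]

total = sum(parse_quantity(row["theoricalQty"]) for row in rows)
print(total)
```

If the API ever returns values with a decimal comma, this helper raises `ValueError` instead of silently producing a wrong number.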