How to get the last table from this site?

Question

I am trying to use Python to get the last table from this site. My actual attempt is below.

The table is titled "Dados Colocação, nos Termos do Anexo VII da Instrução CVM nº 400, de 2003".

import requests

lin_cvm_oferta = 'http://web.cvm.gov.br/app/esforcosrestritos/#/enviarFormularioEncerramento?type=dmlldw%3D%3D&ofertaId=MTE3NDE%3D&state=eyJhbm8iOiJNakF4T0E9PSIsInZhbG9yIjoiTVRVPSIsImNvbXVuaWNhZG8iOiJNUT09Iiwic2l0dWFjYW8iOiJNZz09In0%3D'
html = requests.get(lin_cvm_oferta).text
print(html)

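When I print html, it does not contain any of the data.

I already got the first part of the table as JSON, because my friend @JackFleeting helped me with another question. PS: I know that there is a similar solution here, but I don't want to use Selenium.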

Answer 1

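This question is different from your previous one: that page loads its data with a POST request, not a GET. You have to use the browser's developer tools (Network tab, XHR filter) to extract the URL and the payload, then send the POST like this: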

import requests
import json

url = 'http://web.cvm.gov.br/app/esforcosrestritos/comunicado/getUltimoComunicado'

payload = {"id":931,"dataInclusao":"2016-05-20T09:26:00Z", "dataInicio":"2016-05-18T00:00:00Z","dataEnceramento":"2016-07-05T00:00:00Z", "numeroEmissao":1,"quantidadeSerie":140,"valorMobiliario":{"id":11,
    "dataInclusao":"2015-12-01T00:00:00Z",
    "descricao":"CERTIFICADOS DE RECEBÍVEIS IMOBILIÁRIOS - CRI",
    "relacionadoFundoInvestimento":False,"situacao":"ATIVO"},
    "tipoEspecie":{"id":3,"descricao":"Sem Preferência"},
    "tipoClasse":{"id":4,"descricao":"Não Aplicável"},
    "tipoOferta":{"id":1,"descricao":"Primária"},"tipoForma":{"id":3,"descricao":"Nominativa e Escritural"},"ofertante":{"id":1860,"nomeResponsavel":"RB CAPITAL COMPANHIA DE SECURITIZAÇÃO","cnpj":2773542000122,"paginaWeb":"http://www.rbcapital.com/","tipoSocietario":{"id":4,"descricao":"Sociedade Anônima de Capital Aberto"}},"emissor":{"id":1859,"nomeResponsavel":"RB CAPITAL COMPANHIA DE SECURITIZAÇÃO","cnpj":2773542000122,"paginaWeb":"http://www.rbcapital.com/","tipoSocietario":{"id":4,"descricao":"Sociedade Anônima de Capital Aberto"}},"lider":{"id":931,"nrPfPj":17298092000130,"dataRegistro":"1998-10-15T00:00:00Z","codigoTipoPessoa":"PJ","codigoTipoParticipante":12},"instituicoesIntermediarias":[{"id":1089,"nrPfPj":59588111000103,"dataRegistro":"1991-08-12T00:00:00Z","codigoTipoPessoa":"PJ","codigoTipoParticipante":12,"denominacaoSocial":"BANCO VOTORANTIM SA"},{"id":1090,"nrPfPj":90400888000142,"dataRegistro":"1990-12-20T00:00:00Z","codigoTipoPessoa":"PJ","codigoTipoParticipante":12,"denominacaoSocial":"BANCO SANTANDER (BRASIL) S.A."}],
               "valorPrecoUnitario":"1.000,00","inativo":False,
               "qtdValoresMobiliarios":0,"valorTotalOferta":0,"variasSeries":True}


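headers = {'content-type': 'application/json'}

# Serialize the dict to a JSON string and send it as the POST body
resp = requests.post(url, data=json.dumps(payload), headers=headers)
# Parse the JSON response into Python objects
data = json.loads(resp.content)
print(data)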
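
As for why the plain GET in the question comes back without the data: everything after the # in that address is a URL fragment, which an HTTP client keeps to itself and never sends to the server. The small sketch below only splits the address to show what would actually be requested; it makes no network call, and the URL is the one from the question with the state parameter left off for brevity. My assumption is that the server answers with the application shell and the browser then fills in the tables through separate requests such as the POST above.

```python
from urllib.parse import urlsplit

# Address from the question (state parameter omitted for brevity)
page_url = (
    "http://web.cvm.gov.br/app/esforcosrestritos/"
    "#/enviarFormularioEncerramento?type=dmlldw%3D%3D&ofertaId=MTE3NDE%3D"
)

parts = urlsplit(page_url)

# What an HTTP client actually asks the server for: the URL without its fragment
requested = parts._replace(fragment="").geturl()
print("requested:", requested)

# The part after '#', which stays on the client side
print("fragment: ", parts.fragment)

# The '?type=...' text sits inside the fragment, so the real query string is empty
print("query:    ", repr(parts.query))
```

So the ofertaId in the address never reaches the server with a simple requests.get, which is why the returned HTML has no table in it.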

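Note that when you paste the payload into Python you have to change the booleans to True and False (capitalized, as I did above), even though the site's own POST request uses lowercase true and false.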
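
A quick way to see the boolean conversion at work, using only the standard json module. The dictionary here is a cut-down illustration with three of the payload's keys, not the full payload, and the copied string is a made-up stand-in for text taken from the browser.

```python
import json

# Three boolean fields taken from the payload, written the Python way
payload_fragment = {
    "relacionadoFundoInvestimento": False,
    "inativo": False,
    "variasSeries": True,
}

# json.dumps writes them back out as lowercase JSON literals
body = json.dumps(payload_fragment)
print(body)

# Going the other way: text copied from the browser can be parsed directly,
# which avoids editing true/false by hand
copied = '{"inativo": false, "variasSeries": true}'
parsed = json.loads(copied)
print(parsed)
```

If the payload is long, pasting it as a string and calling json.loads on it may be less error-prone than rewriting each boolean manually.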

python

json

web-scraping

ng-view