How to extract page_source from a webpage
I am trying to get data from a government web page, but when I fetch the page source it does not contain the data displayed in the browser.
from selenium import webdriver
from selenium.webdriver.support.ui import Select
page = 'http://web.cvm.gov.br/app/esforcosrestritos/#/consultarOferta'
driver = webdriver.Chrome()
driver.get(page)
## Click on "Encerrada"
driver.find_element_by_xpath('//*[@id="content"]/div[4]/div[2]/div/div /div[4]/div[2]/label[3]/input').click()
## Select year
year = Select(driver.find_element_by_xpath('//*[@id="content"]/div[4]/div[2]/div/div/div[4]/div[1]/div/select'))
year.select_by_visible_text('2017')
## Click on "Pesquisar"
driver.find_element_by_xpath('//*[@id="content"]/div[4]/div[3]/div/a[1]/span').click()
## Click on "DEBENTURES SIMPLES" inside "Ofertas Encerradas"
driver.find_element_by_css_selector('#content > div.container.ng-scope > div:nth-child(4) > div:nth-child(2) > div > table > tbody > tr:nth-child(15) > td.col-lg-2.text-left.ng-binding').click()
## Click on 1st result
driver.find_element_by_css_selector('#content > div.container.ng-scope > div:nth-child(4) > div > div > table > tbody > tr.text-center > td.text-left.ng-binding').click()
##Page Source
html = driver.page_source
In this example, for the first field, "CNPJ", instead of getting the value '04.031.960/00001-70', I get this:
<input type="text" class="form-control ng-pristine ng-untouched ng-valid ng-valid-maxlength" data-ng-cnpj="" data-ng-model="$responsavel.ofertante.cnpj" data-ng-change="getNomeResponsavelPorCnpj($responsavel.ofertante)" data-ng-disabled="mesmosDadosEmissor || $responsavel.disabled" maxlength="18" disabled="disabled">
Also, if I hover over the value in the browser, I cannot select it.
Is there a way to get data from this type of page?
Once you click() on the first result, you need to induce WebDriverWait for the page heading to become visible, and then you can extract the page_source as follows:
Code block:
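from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='ng-binding ng-scope'][contains(.,'RIO DE ENCERRAMENTO DE OFERTA P')]")))
##Page Source
print(driver.page_source)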
Console output:
<!DOCTYPE html><html xmlns="http://www.w3.org/1999/xhtml" lang="pt_br" data-ng-app="app" class="ng-scope"><head><style type="text/css">@charset "UTF-8";[ng\:cloak],[ng-cloak],[data-ng-cloak],[x-ng-cloak],.ng-cloak,.x-ng-cloak,.ng-hide:not(.ng-hide-animate){display:none !important;}ng\:form{display:block;}</style>
<meta charset="UTF-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<meta http-equiv="CACHE-CONTROL" content="NO-CACHE" />
<meta http-equiv="EXPIRES" content="Mon, 22 Jul 2002 11:12:01 GMT" />
<title>Sistema Ofertas com Esforços Restritos</title>
<link rel="shortcut icon" href="resources/img/favicon.ico" />
<link rel="stylesheet" href="resources/css/open-sans.css" />
<link rel="stylesheet" href="resources/css/bootstrap/css/bootstrap.min.css" />
<link rel="stylesheet" href="resources/css/bootstrap/css/bootstrap-theme.min.css" />
<link rel="stylesheet" href="resources/js/bootstrap-datepicker/datepicker.css" />
<link rel="stylesheet" href="resources/js/ngTable/ng-table.min.css" />
<link rel="stylesheet" href="resources/css/cvm.css" />
</head>
<body class="modal-open" style="padding-right: 17px;">
<div id="fullContent">
<div id="content" data-ng-controller="AutenticarUsuarioController" class="ng-scope">
<!-- INICIO MENU BRASIL -->
<div class="nav-brasil">
<div class="navbar navbar-default">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#brasil">
<img src="resources/img/brazil-flag_05.png" />
</button>
</div>
<!-- Collect the nav links, forms, and other content for toggling -->
<div class="collapse navbar-collapse" id="brasil">
<ul class="nav navbar-nav">
<li><a class="icon-brasil" href="http://www.brasil.gov.br/" target="_blank">BRASIL</a></li>
<li><a href="http://www.acessoainformacao.gov.br/sistema/" target="_blank">Acesso à informação</a></li>
</ul>
<ul class="nav navbar-nav navbar-right">
<li class="first-li"><a href="http://brasil.gov.br/barra#participe" target="_blank">Participe</a></li>
<li><a href="http://www.servicos.gov.br/" target="_blank">Serviços</a></li>
<li><a href="http://www.planalto.gov.br/legislacao" target="_blank">Legislação</a></li>
<li><a href="http://brasil.gov.br/barra#orgaos-atuacao-canais" target="_blank">Canais</a></li>
</ul>
</div><!-- /.navbar-collapse -->
</div>
</div>
</div>
<!-- FIM MENU BRASIL -->
<!-- INICIO CABEÇALHO -->
<div id="header">
<div class="container">
<div class="row">
<div class="col-lg-4">
<h5>CVM - Comissão de Valores Mobiliários</h5>
</div>
<div class="text-right" data-ng-init="initContraste()">
<a class="h6" href="javascript:void(0)" data-ng-click="altoContraste()">ALTO CONTRASTE</a>
</div>
</div>
<a class="h2" href="javascript:void(0)" data-ng-click="abrirPaginaPrincipal()">Sistemas de Ofertas Públicas com Esforços Restritos</a>
<div class="row">
<div class="col-lg-3">
<h5>GOVERNO FEDERAL</h5>
</div>
<!-- ngIf: temUsuario() -->
</div>
</div>
</div>
<!-- FIM CABEÇALHO -->
<!-- INICIO MENU PRINCIPAL -->
<!-- INICIO MENU PRINCIPAL -->
<div class="nav-principal">
<div class="navbar navbar-default">
<!-- ngIf: temUsuario() -->
</div>
</div>
<!-- FIM MENU PRINCIPAL -->
<!-- INICIO CONTEÚDO -->
<!-- ngView: --><div data-ng-view="" class="container ng-scope">
<div data-ng-init="init()" class="ng-scope">
<div class="row row-title">
<div class="right-title">
<!-- ngIf: acao.isAcaoVisualizar() && permissaoAlteracao -->
<!-- ngIf: acao.isAcaoVisualizar() && permissaoAlteracao -->
<a class="btn btn-link" href="ajuda/Envio_Formulario_Encerramento.pdf" target="_blank">
<img src="resources/img/ajuda.png" />
<span class="ng-binding">Ajuda</span>
</a>
</div>
<!-- ngIf: acao.isAcaoIncluir() -->
<!-- ngIf: acao.isAcaoAlterar() -->
<!-- ngIf: acao.isAcaoVisualizar() --><div data-ng-if="acao.isAcaoVisualizar()" class="ng-binding ng-scope">VISUALIZAR FORMULÁRIO DE ENCERRAMENTO DE OFERTA PÚBLICA COM ESFORÇOS RESTRITOS</div><!-- end ngIf: acao.isAcaoVisualizar() -->
</div>
<div style="min-height: 1200px">
<div class="row row-required ng-binding">* Campos Obrigatórios</div>
<!-- ngIf: acao.isAcaoAlterar() && !usuarioGestor -->
<div data-ng-responsavel="$responsavel"></div>
<div data-ng-oferta="$oferta"></div>
<div data-ng-intermediario="$intermediario"></div>
<div data-ng-colocacao="$colocacao"></div>
</div>
<div class="row row-center">
<div class="col-center">
<a class="btn btn-default" role="button" href="javascript:void(0)" data-ng-click="voltar()">
<img src="resources/img/arrow-left.png" />
<span class="ng-binding">Voltar</span>
</a>
<!-- ngIf: acao.isAcaoIncluir() -->
<!-- ngIf: acao.isAcaoAlterar() -->
</div>
</div>
</div></div>
<!-- FIM CONTEÚDO -->
</div>
<!-- INICIO RODAPÉ -->
<div id="footer">
<div class="container footer-container">
<div class="row">
<div class="col-lg-8">
<a href="http://www.acessoainformacao.gov.br/sistema/" target="_blank">
<img src="resources/img/logo-acesso_25.png" />
</a>
</div>
<div class="col-lg-2 text-right cvm-footer-description">
<h6>CVM - Comissão de</h6><h6>Valores Mobiliários</h6>
</div>
<a href="http://www.brasil.gov.br/"><span class="logo-brasil-footer"></span></a>
</div>
</div>
<div class="version-sistem">
<div class="container">
</div>
</div>
</div>
<!-- FIM RODAPÉ -->
</div>
<!-- DEPENDÊNCIAS JAVA SCRIPT -->
<script type="text/javascript" src="resources/js/jquery/jquery-2.1.3.min.js"></script>
<script type="text/javascript" src="resources/js/base64/jquery.base64.min.js"></script>
<script type="text/javascript" src="resources/js/jquery/jquery.maskedinput.min.js"></script>
<script type="text/javascript" src="resources/js/jquery/jquery.maskmoney.min.js"></script>
<script type="text/javascript" src="resources/js/jquery/jquery.cookie.js"></script>
<script type="text/javascript" src="resources/css/bootstrap/js/bootstrap.min.js"></script>
<script type="text/javascript" src="resources/js/bootstrap-datepicker/bootstrap-datepicker.js"></script>
<script type="text/javascript" src="resources/js/bootstrap-datepicker/bootstrap-datepicker.pt-BR.js"></script>
<script type="text/javascript" src="resources/js/angular/angular.min.js"></script>
<script type="text/javascript" src="resources/js/angular/angular-route.min.js"></script>
<script type="text/javascript" src="resources/js/angular/angular-locale_pt-br.js"></script>
<script type="text/javascript" src="resources/js/ngTable/ng-table.min.js"></script>
<script type="text/javascript" src="application/directives/directives.js"></script>
<script type="text/javascript" src="application/message/message.js"></script>
<script type="text/javascript" src="application/message/i18n.js"></script>
<script type="text/javascript" src="application/security/security.js"></script>
<script type="text/javascript" src="application/app.js"></script>
<script type="text/javascript" src="application/controllers/AutenticarUsuarioController.js"></script>
<script type="text/javascript" src="application/controllers/ConfigurarValoresMobiliariosController.js"></script>
<script type="text/javascript" src="application/controllers/EnviarFormularioInicialController.js"></script>
<script type="text/javascript" src="application/controllers/EnviarFormularioParcialController.js"></script>
<script type="text/javascript" src="application/controllers/EnviarFormularioEncerramentoController.js"></script>
<script type="text/javascript" src="application/controllers/EnviarComunicadoDispensaMicroEmpresaController.js"></script>
<script type="text/javascript" src="application/controllers/EnviarFormularioDispensaLoteUnicoController.js"></script>
<script type="text/javascript" src="application/controllers/GerenciarEnvioFormulariosController.js"></script>
<script type="text/javascript" src="application/controllers/ConsultarOfertaController.js"></script>
<div class="message" ng-messages=""></div><div class="loader modal in" aria-hidden="false" style="display: block; padding-right: 17px;"><div class="modal-backdrop in" style="height: 672px;"></div><div class="modal-dialog"> <div class="modal-content"><div class="modal-header" style="text-align: center"><h5 class="modal-title">Aguarde</h5></div><div class="modal-body"><div class="row row-mg-1 row-center"><img src="resources/img/ajax-loader.gif" /></div></div></div></div></div></body></html>
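In the console output, the last `div` still carries `class="loader modal in"` with `display: block`, and the `data-ng-responsavel`, `data-ng-oferta`, `data-ng-intermediario` and `data-ng-colocacao` containers are empty. That suggests the source was captured while the "Aguarde" loader was still on screen. The following is only a minimal sketch for checking a captured source for that condition; it assumes the loader markup looks like the one in the output, and the names `LoaderFinder` and `loader_is_visible` are made up for illustration.

```python
from html.parser import HTMLParser


class LoaderFinder(HTMLParser):
    """Records whether an element with class 'loader' is shown as a block."""

    def __init__(self):
        super().__init__()
        self.loader_visible = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        classes = (attr.get("class") or "").split()
        # Normalise "display: block" and "display:block" to the same form.
        style = (attr.get("style") or "").replace(" ", "")
        if "loader" in classes and "display:block" in style:
            self.loader_visible = True


def loader_is_visible(html):
    """Return True if the captured HTML still shows the loader modal."""
    finder = LoaderFinder()
    finder.feed(html)
    return finder.loader_visible


# Sample taken from the loader element in the console output.
sample = ('<div class="loader modal in" aria-hidden="false" '
          'style="display: block; padding-right: 17px;"></div>')
print(loader_is_visible(sample))
```

If a check like this returns True for `driver.page_source`, the form fields have probably not been rendered yet, so an additional wait may be needed before reading the source.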
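I finally solved this by getting the information from the browser log. The data does not appear directly in the HTML source, but it is in the POST used in the process.
This is the final working code: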
from selenium import webdriver
from selenium.webdriver.support.ui import Select
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
import json
import pandas as pd
page = 'http://web.cvm.gov.br/app/esforcosrestritos/#/consultarOferta'
d = DesiredCapabilities.CHROME
d['loggingPrefs'] = { 'performance':'ALL' }
driver = webdriver.Chrome(desired_capabilities=d)
driver.get(page)
## Click on "Encerrada"
driver.find_element_by_xpath('//*[@id="content"]/div[4]/div[2]/div/div /div[4]/div[2]/label[3]/input').click()
## Select year
year = Select(driver.find_element_by_xpath('//*[@id="content"]/div[4]/div[2]/div/div/div[4]/div[1]/div/select'))
year.select_by_visible_text('2017')
## Click on "Pesquisar"
driver.find_element_by_xpath('//*[@id="content"]/div[4]/div[3]/div/a[1]/span').click()
## Click on "DEBENTURES SIMPLES" inside "Ofertas Encerradas"
driver.find_element_by_css_selector('#content > div.container.ng-scope > div:nth-child(4) > div:nth-child(2) > div > table > tbody > tr:nth-child(15) > td.col-lg-2.text-left.ng-binding').click()
## Click on 1st result
driver.find_element_by_css_selector('#content > div.container.ng-scope > div:nth-child(4) > div > div > table > tbody > tr.text-center > td.text-left.ng-binding').click()
## Selenium browser log
performance_log = driver.get_log('performance')
## Find log with allocation information
for j in range(len(performance_log)):
    if performance_log[j]['message'].find('Clubes de Investimento') != -1:
        break
allocation = performance_log[j]['message']
## Filter allocation data (strip the JSON escape backslashes)
allocation = allocation.replace('\\', '')
allocation = allocation[allocation.find('{"colocacoes":['):]
## Put data into a Pandas DataFrame
allocation_table = pd.DataFrame(columns = ['tipoInvestidor', 'numeroInvestidores', 'quantidadeValorMobiliario'])
slice_allocation = '{"tipoInvestidor":{"id":'
slice_alternative= '{"numeroInvestidores":'
for i in range(1,11):
    beginning = allocation.find(slice_allocation+str(i)) if allocation.find(slice_allocation+str(i))!=-1 else allocation.find(slice_alternative)
    end = allocation.find(slice_allocation+str(i+1)) if allocation.find(slice_allocation+str(i+1))!=-1 else allocation.find(slice_alternative)
    allocation_investor = allocation[beginning:end-1]
    allocation = allocation[end:]
    allocation_investor = json.loads(allocation_investor)
    allocation_investor['tipoInvestidor'] = allocation_investor['tipoInvestidor']['descricao']
    ## pd.concat works across pandas versions (DataFrame.append was removed in pandas 2.0)
    allocation_table = pd.concat([allocation_table, pd.DataFrame([allocation_investor])], ignore_index = True)
allocation_table.fillna(0, inplace = True)
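As a possible alternative to slicing the log message by hand: each entry returned by `driver.get_log('performance')` holds a JSON string in `message`, so it can be decoded with `json.loads` instead. The sketch below is not the code used above. It assumes that the payload is the `postData` of a `Network.requestWillBeSent` event and that `colocacoes` is a top-level key of that payload. The helper names and the sample numbers are made up for illustration.

```python
import json


def find_post_payload(performance_log, key="colocacoes"):
    """Return the first decoded POST body containing `key`, or None."""
    for entry in performance_log:
        # Each performance log entry wraps a DevTools event as a JSON string.
        event = json.loads(entry["message"])["message"]
        if event.get("method") != "Network.requestWillBeSent":
            continue
        post_data = event.get("params", {}).get("request", {}).get("postData")
        if not post_data:
            continue
        try:
            payload = json.loads(post_data)
        except ValueError:
            continue  # POST body was not JSON
        if isinstance(payload, dict) and key in payload:
            return payload
    return None


def allocation_rows(payload):
    """Flatten the 'colocacoes' list into plain dicts, one per investor type."""
    rows = []
    for item in payload["colocacoes"]:
        investor = item.get("tipoInvestidor") or {}
        rows.append({
            "tipoInvestidor": investor.get("descricao"),
            "numeroInvestidores": item.get("numeroInvestidores", 0),
            "quantidadeValorMobiliario": item.get("quantidadeValorMobiliario", 0),
        })
    return rows


# Made-up sample log; the real entries come from driver.get_log('performance').
sample_body = json.dumps({"colocacoes": [
    {"tipoInvestidor": {"id": 1, "descricao": "Clubes de Investimento"},
     "numeroInvestidores": 2, "quantidadeValorMobiliario": 100},
    {"numeroInvestidores": 5},
]})
sample_log = [
    {"message": json.dumps({"message": {
        "method": "Network.responseReceived", "params": {}}})},
    {"message": json.dumps({"message": {
        "method": "Network.requestWillBeSent",
        "params": {"request": {"method": "POST", "postData": sample_body}}}})},
]
rows = allocation_rows(find_post_payload(sample_log))
print(rows)
```

The resulting list of dicts can be passed straight to `pd.DataFrame(rows)` if a table is needed.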