Web scraping in R by first navigating through a JavaScript module

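I have looked through various questions and answers, but unfortunately none of the ones I found deals with a case similar to mine. In a typical question, the JavaScript table is built directly when the website loads. In my case, however, I first have to navigate through a JavaScript module and select several criteria before I get the result I want.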

This is my situation: I have to scrape the exchange rates of various currencies from the website www.globocambio.co. To do that, I have to (1) navigate to “I WANT COLOMBIAN PESO”, (2) select the currency (e.g., “Chilean Peso”), and (3) select the collection destination (e.g., “El Dorado International Airport”). Only then is the respective exchange rate loaded. See this screenshot for illustration. I marked the three selection steps in red; the data point I want to scrape for the different currencies is marked in green.

I am not very familiar with JavaScript, but I tried to understand what is going on. This is what I found:

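  1. Using Chrome DevTools, I inspected the network activity while the exchange rate was loading. There is an XHR named “GetPrice” that calls the URL https://reservations.globocambio.co/DesktopModules/GlobalExchange/API/Widget/GetPrice and requests the price with the following form data: ISOAOrigen=CLP&cantidadOrigen=9000&ISOADestino=COP&cantidadDestino=0&centerId=27&operationType=OperationTypesBuying
  2. I understand that the form data contains the information I initially selected by hand:
    1. operationType=OperationTypesBuying: this is the “I WANT COLOMBIAN PESO” option
    2. ISOAOrigen=CLP: this is “Chilean Peso”
    3. centerId=27: this is “El Dorado International Airport”
  3. The server responds to my request with the following: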

    {"MonedaOrigen":{"ISOA":"CLP","Nombre":null,"Margen":0.1630000000,"Tramo":0.0,"Fixing":2.9000000000},"CantidadOrigen":9000.00,"MonedaDestino":{"ISOA":"COP","Nombre":null,"Margen":0.0,"Tramo":0.0,"Fixing":0.0},"CantidadDestino":21845.70,"TipoCambio":2.42730000000000000000,"MargenOrigen":0.0,"TramoOrigen":0.0,"FixingOrigen":0.0,"MargenDestino":0.0,"TramoDestino":0.0,"FixingDestino":0.0,"IdCentro":"27","Comision":null,"ComisionTramoSuperior":null,"ComisionAplicada":{"CodigoMoneda":null,"CodigoTipoMoneda":0,"ComisionFija":0.0,"ComisionVariable":0.0,"TramoInicio":0.0,"TramoFin":null,"Orden":0}}

  4. From this response, "TipoCambio":2.42730000000000000000 is written to the website through this line of HTML: <span id="spTipoCambioCompra">2.427300</span>

  5. This means that "TipoCambio" is the value I am looking for.
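To make steps 3 to 5 concrete, here is a small Python sketch (Python because the working solution at the end of this post is in Python) that parses the GetPrice response, pulls out "TipoCambio", and formats it the way the page displays it. It uses only the standard library and a trimmed copy of the response shown above; nothing here touches the network. It also checks that "CantidadDestino" equals "CantidadOrigen" times "TipoCambio" (9000 × 2.4273 = 21845.70), which supports reading "TipoCambio" as the exchange rate.

```python
import json
from decimal import Decimal

# GetPrice response body as captured in DevTools, trimmed to the fields used
# here (the real body contains more keys, e.g. "ComisionAplicada").
RAW_RESPONSE = """{
  "MonedaOrigen": {"ISOA": "CLP", "Margen": 0.1630000000, "Fixing": 2.9000000000},
  "CantidadOrigen": 9000.00,
  "MonedaDestino": {"ISOA": "COP", "Margen": 0.0, "Fixing": 0.0},
  "CantidadDestino": 21845.70,
  "TipoCambio": 2.42730000000000000000,
  "IdCentro": "27"
}"""

# parse_float=Decimal keeps every digit the server sent instead of
# rounding it to a binary float.
data = json.loads(RAW_RESPONSE, parse_float=Decimal)
rate = data["TipoCambio"]

# Consistency check: destination amount = origin amount x exchange rate.
assert data["CantidadOrigen"] * rate == data["CantidadDestino"]

# The page shows the rate with six decimals in <span id="spTipoCambioCompra">.
print(f"{rate:.6f}")  # 2.427300
```

Using `Decimal` makes the consistency check exact; with plain floats the product would have to be compared with a tolerance instead.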

So I have to communicate with the server from R somehow, using the form data as input variables. Can anyone tell me how to do that? I understand that I have to combine the URL https://reservations.globocambio.co/DesktopModules/GlobalExchange/API/Widget/GetPrice with the form data “ISOAOrigen=CLP&cantidadOrigen=9000&ISOADestino=COP&cantidadDestino=0&centerId=27&operationType=OperationTypesBuying”, but I don't know how that works.
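Combining a URL with form data usually means sending an HTTP POST whose body is the urlencoded form string. The sketch below builds such a request in Python with the standard library but does not send it. It rests on unverified assumptions: that GetPrice accepts a plain POST with an `application/x-www-form-urlencoded` body (which is what the "Form Data" panel in DevTools suggests) and that the server needs no cookies or extra headers. `build_get_price_request` is a hypothetical helper name. In R, `httr::POST(url, body = list(...), encode = "form")` should play the same role.

```python
from urllib.parse import urlencode
from urllib.request import Request

GET_PRICE_URL = ("https://reservations.globocambio.co/DesktopModules/"
                 "GlobalExchange/API/Widget/GetPrice")


def build_get_price_request(iso_origin: str, amount: int = 9000,
                            center_id: int = 27) -> Request:
    """Build (but do not send) a GetPrice request for one currency.

    Assumption: the endpoint takes a POST with a urlencoded form body.
    """
    form = {
        "ISOAOrigen": iso_origin,                 # e.g. "CLP" = Chilean Peso
        "cantidadOrigen": amount,
        "ISOADestino": "COP",
        "cantidadDestino": 0,
        "centerId": center_id,                    # 27 = El Dorado International Airport
        "operationType": "OperationTypesBuying",  # "I WANT COLOMBIAN PESO"
    }
    body = urlencode(form).encode("ascii")        # dict order is preserved
    return Request(GET_PRICE_URL, data=body, method="POST",
                   headers={"Content-Type": "application/x-www-form-urlencoded"})


req = build_get_price_request("CLP")
print(req.get_method(), req.data.decode())
# Sending it requires network access, e.g. with urllib.request.urlopen(req);
# the server may reject the request if it expects cookies or other headers.
```

Looping over several currency codes with this helper would give one request per currency. Which ISO codes the widget actually accepts besides CLP is not known from the post.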

Any help would be greatly appreciated!

Update:

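I still don't know how to solve the problem above, so I tried to approach it in small steps. Using RSelenium, I am currently trying to figure out how to click the option “I WANT COLOMBIAN PESO”. My idea was to use the following code: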

library(RSelenium)
remDr <- RSelenium::remoteDriver(remoteServerAddr = "localhost",
                                 port = 4445L,
                                 browserName = "chrome")
remDr$open()
remDr$navigate("https://www.globocambio.co/en/home")
webElem <- remDr$findElement("id", "tabCompra") #What is wrong here?
webElem$clickElement() # Click on "I WANT COLOMBIAN PESO"

But I get the following error message after executing webElem <- remDr$findElement("id", "tabCompra"):

Selenium message:no such element: Unable to locate element: {"method":"css selector","selector":"#tabCompra"} (Session info: chrome=81.0.4044.113) For documentation on this error, please visit: https://www.seleniumhq.org/exceptions/no_such_element.html ... Error: Summary: NoSuchElement Detail: An element could not be located on the page using the given search parameters. class: org.openqa.selenium.NoSuchElementException Further Details: run errorDetails method

What am I doing wrong?

I solved my problem using Selenium in Python:

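from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys  # needed for Keys.ENTER etc.
from selenium.webdriver.firefox.service import Service
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Firefox(service=Service('/your_path/geckodriver'))
wait = WebDriverWait(driver, 10)

driver.get("https://www.globocambio.co/en/")
# The widget lives inside the iframe "iframeWidget"; its elements cannot be
# found until the driver switches into it (presumably why tabCompra was not found)
wait.until(EC.frame_to_be_available_and_switch_to_it("iframeWidget"))

# (1) Click "I WANT COLOMBIAN PESO"
elem = wait.until(EC.element_to_be_clickable((By.ID, 'tabCompra')))
elem.click()

# (2) Type the currency into the dropdown's input and pick it
elem = driver.find_element(By.ID, 'inputddlMonedaOrigenCompra')
elem.click()
elem.clear()
elem.send_keys("Chilean Peso")
elem.send_keys(Keys.ENTER)
elem.send_keys(Keys.ARROW_DOWN)
elem.send_keys(Keys.RETURN)

# Read the block that shows the exchange rate
elem = driver.find_element(By.ID, 'info-change-compra')
print(elem.text)

driver.quit()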