RSelenium and rvest - only getting data for selected check boxes

I can reach the web page with the following:

Data/code:

library(RSelenium)
library(rvest)
library(tidyverse)

rD <- rsDriver(browser="firefox", port=4536L)
remDr <- rD[["client"]]

zona_url_to_get = "https://www.fotocasa.es/es/comprar/viviendas/barcelona-capital/eixample/l"

remDr$navigate(zona_url_to_get)


# accept cookies 
remDr$findElement(using = "xpath",'/html/body/div[1]/div[4]/div/div/div/footer/div/button[2]')$clickElement()
#click on Distrito
remDr$findElement(using = "xpath", '/html/body/div[1]/div[2]/div[1]/div[3]/div/div[1]/div')$clickElement()


html_zona_full_page = remDr$getPageSource()[[1]] %>% 
  read_html()

This opens the page, accepts the cookies, clicks the drop-down menu, and reads the HTML from the page.

I can then use the following:

Zonas_Names = html_zona_full_page %>% 
  html_nodes('.re-GeographicSearchNext-checkboxItem')

which gives me:

{xml_nodeset (16)}
 [1] <a class="re-GeographicSearchNext-checkboxItem re-GeographicSearchNext-checkboxItem--has-separator" title="Ciutat Vella" href="/es/c ...
 [2] <a class="re-GeographicSearchNext-checkboxItem is-checked re-GeographicSearchNext-checkboxItem--has-separator" title="Eixample" href ...
 [3] <a class="re-GeographicSearchNext-checkboxItem is-checked" title="Dreta de l'Eixample" href="/es/comprar/viviendas/barcelona-capital ...
 [4] <a class="re-GeographicSearchNext-checkboxItem is-checked" title="Fort Pienc" href="/es/comprar/viviendas/barcelona-capital/fort-pie ...
 [5] <a class="re-GeographicSearchNext-checkboxItem is-checked" title="La Nova Esquerra de l'Eixample" href="/es/comprar/viviendas/barcel ...
 [6] <a class="re-GeographicSearchNext-checkboxItem is-checked" title="L'Antiga Esquerra de l'Eixample" href="/es/comprar/viviendas/barce ...
 [7] <a class="re-GeographicSearchNext-checkboxItem is-checked" title="Sagrada Família" href="/es/comprar/viviendas/barcelona-capital/sag ...
 [8] <a class="re-GeographicSearchNext-checkboxItem is-checked" title="Sant Antoni" href="/es/comprar/viviendas/barcelona-capital/sant-an ...
 [9] <a class="re-GeographicSearchNext-checkboxItem re-GeographicSearchNext-checkboxItem--has-separator" title="Gràcia" href="/es/comprar ...
[10] <a class="re-GeographicSearchNext-checkboxItem re-GeographicSearchNext-checkboxItem--has-separator" title="Horta - Guinardó" href="/ ...
[11] <a class="re-GeographicSearchNext-checkboxItem re-GeographicSearchNext-checkboxItem--has-separator" title="Les Corts" href="/es/comp ...
[12] <a class="re-GeographicSearchNext-checkboxItem re-GeographicSearchNext-checkboxItem--has-separator" title="Nou Barris" href="/es/com ...
[13] <a class="re-GeographicSearchNext-checkboxItem re-GeographicSearchNext-checkboxItem--has-separator" title="Sant Andreu" href="/es/co ...
[14] <a class="re-GeographicSearchNext-checkboxItem re-GeographicSearchNext-checkboxItem--has-separator" title="Sant Martí" href="/es/com ...
[15] <a class="re-GeographicSearchNext-checkboxItem re-GeographicSearchNext-checkboxItem--has-separator" title="Sants - Montjuïc" href="/ ...
[16] <a class="re-GeographicSearchNext-checkboxItem re-GeographicSearchNext-checkboxItem--has-separator" title="Sarrià - Sant Gervasi" hr 

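However, I am not interested in all of this information, only in the items that are selected (ticked) on the web page. They correspond to the following: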

<a class="re-GeographicSearchNext-checkboxItem is-checked" title="Dreta de l'Eixample"...
<a class="re-GeographicSearchNext-checkboxItem is-checked" title="Fort Pienc"...
<a class="re-GeographicSearchNext-checkboxItem is-checked" title="La Nova Esquerra de l'Eixample"...

... etc.

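My question is: how can I keep only the checked items in the list?

I thought the following might work, since it includes the is-checked part, but it returns an xml_nodeset of length 0: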

> html_zona_full_page %>% 
+   html_nodes('.re-GeographicSearchNext-checkboxItem is-checked')
{xml_nodeset (0)}

I can run:

html_zona_full_page %>% 
  html_nodes('.re-GeographicSearchNext-checkboxItem') %>% 
  html_nodes('.re-GeographicSearchNext-checkboxItem-literal')

This gives me:

{xml_nodeset (16)}
 [1] <span class="re-GeographicSearchNext-checkboxItem-literal">Ciutat Vella</span>
 [2] <span class="re-GeographicSearchNext-checkboxItem-literal">Eixample</span>
 [3] <span class="re-GeographicSearchNext-checkboxItem-literal">Dreta de l'Eixample</span>
 [4] <span class="re-GeographicSearchNext-checkboxItem-literal">Fort Pienc</span>
 [5] <span class="re-GeographicSearchNext-checkboxItem-literal">La Nova Esquerra de l'Eixample</span>
 [6] <span class="re-GeographicSearchNext-checkboxItem-literal">L'Antiga Esquerra de l'Eixample</span>
 [7] <span class="re-GeographicSearchNext-checkboxItem-literal">Sagrada Família</span>
 [8] <span class="re-GeographicSearchNext-checkboxItem-literal">Sant Antoni</span>
 [9] <span class="re-GeographicSearchNext-checkboxItem-literal">Gràcia</span>
[10] <span class="re-GeographicSearchNext-checkboxItem-literal">Horta - Guinardó</span>
[11] <span class="re-GeographicSearchNext-checkboxItem-literal">Les Corts</span>
[12] <span class="re-GeographicSearchNext-checkboxItem-literal">Nou Barris</span>
[13] <span class="re-GeographicSearchNext-checkboxItem-literal">Sant Andreu</span>
[14] <span class="re-GeographicSearchNext-checkboxItem-literal">Sant Martí</span>
[15] <span class="re-GeographicSearchNext-checkboxItem-literal">Sants - Montjuïc</span>
[16] <span class="re-GeographicSearchNext-checkboxItem-literal">Sarrià - Sant Gervasi</span>

But I am not interested in Ciutat Vella, Gràcia, Horta ... Sarrià - Sant Gervasi, because they are not ticked on the web page.

In the end, I am only interested in:

c("Dreta de l'Eixample", "Fort Pienc", "La Nova Esquerra de l'Eixample", "L'Antiga Esquerra de l'Eixample", "Sagrada Família", "Sant Antoni")

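We can chain the two class names with a . and no space, which selects elements that carry both classes. With a space in between, is-checked is read as the name of a descendant element rather than as a class, which is why the attempt in the question matched nothing.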

Zonas_Names = html_zona_full_page %>% 
  html_nodes('.re-GeographicSearchNext-checkboxItem.is-checked')

-output

> Zonas_Names
{xml_nodeset (7)}
[1] <a class="re-GeographicSearchNext-checkboxItem is-checked re-GeographicSearchNext-checkboxItem--has-separator" title="Eixample" href="/es/comprar/viviendas/barcelona-capi ...
[2] <a class="re-GeographicSearchNext-checkboxItem is-checked" title="Dreta de l'Eixample" href="/es/comprar/viviendas/barcelona-capital/dreta-de-l-eixample/l"><div class="su ...
[3] <a class="re-GeographicSearchNext-checkboxItem is-checked" title="Fort Pienc" href="/es/comprar/viviendas/barcelona-capital/fort-pienc/l"><div class="sui-MoleculeCheckbox ...
[4] <a class="re-GeographicSearchNext-checkboxItem is-checked" title="La Nova Esquerra de l'Eixample" href="/es/comprar/viviendas/barcelona-capital/la-nova-esquerra-de-l-eixa ...
[5] <a class="re-GeographicSearchNext-checkboxItem is-checked" title="L'Antiga Esquerra de l'Eixample" href="/es/comprar/viviendas/barcelona-capital/l-antiga-esquerra-de-l-ei ...
[6] <a class="re-GeographicSearchNext-checkboxItem is-checked" title="Sagrada Família" href="/es/comprar/viviendas/barcelona-capital/sagrada-familia/l"><div class="sui-Molecu ...
[7] <a class="re-GeographicSearchNext-checkboxItem is-checked" title="Sant Antoni" href="/es/comprar/viviendas/barcelona-capital/sant-antoni/l"><div class="sui-MoleculeCheckb ...

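These correspond to the checked items. Note that the parent district Eixample is matched as well, since it also carries the is-checked class.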
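As a rough, self-contained sketch of the difference between the two selectors, the snippet below runs them against a small hand-written fragment that imitates the dropdown markup. The fragment is hypothetical (it is not the live page), and the variable names are made up for illustration. It also pulls the names out of the title attribute. Dropping the parent district with :not(...) assumes, based only on the output shown in the question, that the --has-separator class marks district-level entries; that may stop being true if the site changes its markup.

```r
library(rvest)

# Hypothetical fragment imitating the dropdown markup (not the live page)
page <- read_html('<html><body>
  <a class="re-GeographicSearchNext-checkboxItem re-GeographicSearchNext-checkboxItem--has-separator" title="Ciutat Vella"></a>
  <a class="re-GeographicSearchNext-checkboxItem is-checked re-GeographicSearchNext-checkboxItem--has-separator" title="Eixample"></a>
  <a class="re-GeographicSearchNext-checkboxItem is-checked" title="Fort Pienc"></a>
  <a class="re-GeographicSearchNext-checkboxItem is-checked" title="Sant Antoni"></a>
</body></html>')

# Space = descendant combinator: looks for an <is-checked> element inside the item
with_space <- page %>%
  html_nodes('.re-GeographicSearchNext-checkboxItem is-checked')

# No space = both classes must be present on the same element
checked_titles <- page %>%
  html_nodes('.re-GeographicSearchNext-checkboxItem.is-checked') %>%
  html_attr('title')

# Assumption: district-level entries carry the --has-separator class,
# so excluding it leaves only the checked neighbourhoods
neighbourhood_titles <- page %>%
  html_nodes('.re-GeographicSearchNext-checkboxItem.is-checked:not(.re-GeographicSearchNext-checkboxItem--has-separator)') %>%
  html_attr('title')

length(with_space)      # 0
checked_titles          # "Eixample" "Fort Pienc" "Sant Antoni"
neighbourhood_titles    # "Fort Pienc" "Sant Antoni"
```

On the real page the same selectors would be applied to html_zona_full_page instead of page. If the class-based exclusion turns out to be unreliable, filtering the character vector by name (for example with setdiff(checked_titles, "Eixample")) is a simpler fallback.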