R: How to web scrape data from StockTwits with RSelenium?

I want to get some information from messages posted on the StockTwits platform. You can see an example message here: https://stocktwits.com/Kndihopefull/message/433815546
I would like to read the following information: the number of replies, reshares, and likes.

I think the RSelenium package can do this, but my approach is not really getting anywhere. Could someone help me?

library(RSelenium)

url<- "https://stocktwits.com/Kndihopefull/message/433815546"

# RSelenium with Firefox
rD <- RSelenium::remoteDriver(browser="firefox", port=4546L)
remDr <- rD[["client"]]
remDr$navigate(url)
Sys.sleep(4)

# get the page source
web <- remDr$getPageSource()
web <- xml2::read_html(web[[1]])

As a result, I would like a list (or a data set) like this:

$Reply
[1] 1

$Reshare
[1] 1

$Like
[1] 7

Thank you very much!

To get the desired information, we can do the following:

library(rvest)
library(dplyr)
library(RSelenium)
# launch the browser
driver <- rsDriver(browser = "firefox")
url <- "https://stocktwits.com/ArcherUS/message/434172145"

remDr <- driver$client
remDr$navigate(url)


#First we shall get the tags

remDr$getPageSource()[[1]] %>% 
  read_html() %>% html_nodes('.st_3kvJrBm') %>% 
  html_attr('title') 
[1] "Reply"   "Reshare" "Like"    "Share"   "Search" 

#then the number associated with it
remDr$getPageSource()[[1]] %>% 
  read_html() %>% html_nodes('.st_3kvJrBm') %>% 
  html_text()
[1] ""  ""  "2" ""  "" 

The last two items, Share and Search, will be empty.
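
To turn this into the list asked for in the question, the two vectors can be paired up and the empty strings converted to zeros. A minimal sketch, assuming the .st_3kvJrBm class and its title attributes are still what the page uses:

nodes  <- remDr$getPageSource()[[1]] %>% 
  read_html() %>% html_nodes('.st_3kvJrBm')
titles <- nodes %>% html_attr('title')
counts <- nodes %>% html_text()

# keep only Reply / Reshare / Like and treat empty strings as 0
keep <- titles %in% c("Reply", "Reshare", "Like")
vals <- suppressWarnings(as.numeric(counts[keep]))
vals[is.na(vals)] <- 0

result <- setNames(as.list(vals), titles[keep])
result

For the example message above, this should give Reply = 0, Reshare = 0 and Like = 2.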

A faster approach is to use rvest directly:

library(rvest)
url = "https://stocktwits.com/ArcherUS/message/434172145"

url %>% 
  read_html() %>% html_nodes('.st_3kvJrBm') %>% 
  html_attr('title') 

url %>% 
  read_html() %>% html_nodes('.st_3kvJrBm') %>% 
  html_text()
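
The same pairing logic can be wrapped in a small helper that returns the list for any message URL. The function name get_stocktwits_counts is just illustrative; it again assumes the class name is stable and that the counts appear in the static HTML rather than being rendered by JavaScript (if they are not, use the RSelenium route above instead):

library(rvest)

get_stocktwits_counts <- function(url) {
  nodes  <- url %>% read_html() %>% html_nodes('.st_3kvJrBm')
  titles <- nodes %>% html_attr('title')
  counts <- nodes %>% html_text()
  # keep only Reply / Reshare / Like and treat empty strings as 0
  keep <- titles %in% c("Reply", "Reshare", "Like")
  vals <- suppressWarnings(as.numeric(counts[keep]))
  vals[is.na(vals)] <- 0
  setNames(as.list(vals), titles[keep])
}

get_stocktwits_counts("https://stocktwits.com/ArcherUS/message/434172145")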