Web scraping search results in R

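I'm new to web scraping, and I'm trying to scrape some data produced by a search feature within a website. I'm using rvest to extract the information, but I'm not getting any results. This is the website: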

https://www.encompassinsurance.com/agency-locator.aspx#PostalCode=30350&City=&StateProvCd=&Latitude=&Longitude=

This is what I'm running:

library(rvest)

URL <- 'https://www.encompassinsurance.com/agency-locator.aspx#PostalCode=21403&City=&StateProvCd=&Latitude=&Longitude='

# Download and parse the page, then select the agency-name nodes
webpage <- read_html(URL)
name_html <- html_nodes(webpage, '.locator_result_name')
name_data <- html_text(name_html)

When I run this code, the response I get is: character(0)

I expect the response to be the name of each company returned by the postal code search (e.g. "Townley-Kenton Insurance Agency", "Bradford Turner Insurance Group LLC").

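I know there is some JavaScript on this page and I may be missing an important piece, but given my limited knowledge of HTML, CSS, and JavaScript, I'm not sure how to apply V8 or PhantomJS to get this done.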

Any help is appreciated.

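The page does indeed fetch its data dynamically with JavaScript (via an XHR GET request). However, you can send this request directly from R using the httr package. It returns a JSON string, which is easy to parse with jsonlite.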
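It may help to see why read_html() came back empty. Everything after the # in the page URL is a fragment, which stays in the browser and is not sent to the server, so rvest only ever receives the static page without search results. The following is a small sketch that takes the two URLs apart with httr's parse_url(); the variable names are only for illustration.

```r
library(httr)

# The page URL from the question and the endpoint the page queries via XHR
page_url <- "https://www.encompassinsurance.com/agency-locator.aspx#PostalCode=21403&City=&StateProvCd=&Latitude=&Longitude="
api_url  <- "https://alr.encompassinsurance.com/?PostalCode=30350&City=&StateProvCd=&Latitude=&Longitude="

page_parts <- parse_url(page_url)
api_parts  <- parse_url(api_url)

# On the page URL the search terms sit in the fragment (client-side only)
page_parts$fragment

# ...and there is no query string at all for the server to act on
is.null(page_parts$query)

# On the endpoint URL the postal code is a real query parameter
api_parts$query$PostalCode
```

So the fix is not to render the JavaScript, but to call the endpoint that the JavaScript calls.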

Almost all of the information you want to scrape is in the data frame info$OfficeInfo:

library(httr)
library(jsonlite)

res <- content(GET(paste0("https://alr.encompassinsurance.com/",
                          "?PostalCode=30350&City=&StateProvCd=",
                          "&Latitude=&Longitude=")), "text")
info <- fromJSON(res)

info$OfficeInfo$Name
#>  [1] "Townley-Kenton Insurance Agency"                          
#>  [2] "Bradford Turner Insurance Group LLC"                      
#>  [3] "Arthur J Gallagher Risk Management Services, Inc."        
#>  [4] "Lanigan Insurance Group Inc"                              
#>  [5] "Haven Insurance Group"                                    
#>  [6] "The Leavitt Insurance Group of Atlanta, Incorporated"     
#>  [7] "Findley Insurance Agency Inc"                             
#>  [8] "Grimes Insurance Agency Inc"                              
#>  [9] "Larry L Talbert Ins Agency DBA Talbert Insurance Services"
#> [10] "The Alliance Group, Inc."                                 
#> [11] "Concierge Insurance Group LLC"                            
#> [12] "Sutter McLellan & Gilbreath Inc"                          
#> [13] "The Wichalonis Insurance Agency"                          
#> [14] "The Beck Agency"                                          
#> [15] "USI Insurance Services LLC"                               
#> [16] "The Insurance Store"                                      
#> [17] "Southern Insurance Associates of Dunwoody"                
#> [18] "D.C.J.D. Corporation DBA The Markey Insurance Group"      
#> [19] "DM Services, Incorporated"                                
#> [20] "Southern Insurance Advisors"                              
#> [21] "Metro Brokers Insurance Services"                         
#> [22] "1 Source Insurance, LLC"                                  
#> [23] "The Bates Agency II, LLC"                                 
#> [24] "Risk & Insurance Consultants Inc"                         
#> [25] "Integrity Insurance & Financial Services Inc"             
#> [26] "HN Insurance Services Inc"                                
#> [27] "Norton Metro LLC"                                         
#> [28] "The Nsure Network LLC"                                    
#> [29] "Henssler Norton Insurance LLC"                            
#> [30] "Brown & Brown Insurance of Georgia"                       
#> [31] "America Insurance Brokers, Inc. DBA AIB"                  
#> [32] "Clear View Insurance Agency"                              
#> [33] "Relation Insurance Services"                              
#> [34] "Partners Risk Services LLC"                               
#> [35] "PointeNorth Insurance Group LLC"                          
#> [36] "Advanced Insurors Inc"                                    
#> [37] "Mcever & Tribble, Inc."                                   
#> [38] "The Bethea Insurance Group, LLC"                          
#> [39] "Watchko - Young Ins Agcy Inc"                             
#> [40] "Sterling Seacrest Partners Inc"                           
#> [41] "Little & Smith, Incorporated"                             
#> [42] "LMG Insurance Services Inc"                               
#> [43] "Granite Risk Advisors LLC"                                
#> [44] "Mountain Lakes Insurance, LLC"                            
#> [45] "Hutchinson Traylor Insurance"                             
#> [46] "Edgewood Partners Insurance Center"                       
#> [47] "ADC Agency"                                               
#> [48] "MLG Insurance & Financial Services"                       
#> [49] "Burnette Insurance Agency"                                
#> [50] "Campbell and Company Enterprise, Incorporated"
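
To run the same lookup for other postal codes, the request can be wrapped in a couple of small helpers. This is only a minimal sketch, not a documented API for the site: the function names build_locator_url() and extract_office_names() are made up for illustration, it assumes the endpoint keeps accepting the same five query parameters used above, and sample_json is a hand-written stand-in shaped like the real response so the parsing step can be tried offline.

```r
library(httr)
library(jsonlite)

# Hypothetical helper: build the locator request URL for one postal code.
# Assumes the endpoint accepts the same query parameters as the call above.
build_locator_url <- function(postal_code) {
  modify_url(
    "https://alr.encompassinsurance.com/",
    query = list(
      PostalCode  = as.character(postal_code),  # callers should pass text so leading zeros survive
      City        = "",
      StateProvCd = "",
      Latitude    = "",
      Longitude   = ""
    )
  )
}

# Hypothetical helper: pull the agency names out of the JSON text.
extract_office_names <- function(json_text) {
  info <- fromJSON(json_text)
  info$OfficeInfo$Name
}

# Hand-written stand-in for a response, so parsing can be checked offline
sample_json <- '{"OfficeInfo": [{"Name": "Townley-Kenton Insurance Agency"},
                                {"Name": "Bradford Turner Insurance Group LLC"}]}'
sample_names <- extract_office_names(sample_json)

locator_url <- build_locator_url("21403")

# Live request (needs network access):
# res <- content(GET(locator_url), "text", encoding = "UTF-8")
# extract_office_names(res)
```

Letting modify_url() assemble the query string means the values are URL-encoded for you, which is a bit safer than pasting the pieces together by hand.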

Created on 2020-08-19 by the reprex package (v0.3.0)