How do I build a search query that gets me data through Google search?

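I am working on a project where I need to extract data about specific parks in Florida. My question in this post is how to program R to run a search query through Google and get each park's area. When I type "area of wekiva springs state park in hectares" into Google search, I get an actual value, "2,833 hectares", at the top of the page. I now have a list of 52 parks: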

structure(list(`unique(df$ParkName)` = structure(c(14L, 47L, 
39L, 12L, 9L, 20L, 5L, 10L, 25L, 28L, 36L, 30L, 31L, 43L, 4L, 
35L, 44L, 48L, 51L, 6L, 21L, 32L, 38L, 42L, 1L, 41L, 27L, 45L, 
46L, 50L, 18L, 37L, 24L, 26L, 13L, 52L, 15L, 2L, 17L, 11L, 22L, 
34L, 49L, 16L, 40L, 7L, 8L, 29L, 33L, 3L, 23L, 19L), .Label = c("Alafia River State Park", 
"Amelia Island State Park", "Big Cypress National Park", "Big Talbot Island State Park", 
"Bill Baggs Cape Florida State Park", "Blue Spring State Park", 
"Caladesi Island State Park", "Cayo Costa State Park", "Collier-Seminole State Park", 
"Curry Hammock State Park", "Dade Battlefield Historic State Park", 
"De Leon Springs State Park", "Delanor-Wiggins Pass State Park", 
"Fakahatchee Strand Preserve State Park", "Faver-Dykes State Park", 
"Fort Cooper State Park", "Fort George Island Cultural State Park", 
"Fort Pierce Inlet State Park/Avalon State Park", "Fort Zachary Taylor Historic State Park", 
"Highlands Hammock State Park", "Hillsborough River State Park", 
"Honeymoon Island State Park", "Hugh Taylor Birch State Park", 
"John D. MacArthur Beach State Park", "John Pennekamp Coral Reef State Park/Key Largo Hammocks", 
"John U. Lloyd Beach State Park", "Jonathan Dickinson State Park", 
"Key Largo Hammocks", "Koreshan State Historic Site", "Lake Griffin State Park", 
"Lake Kissimmee State Park", "Lake Manatee State Park", "Lake Wales Ridge Geopark", 
"Little Manatee River State Park", "Little Talbot Island State Park", 
"Long Key State Park", "Lovers Key State Park", "Myakka River State Park", 
"Ocala National Forest", "Oleta River State Park", "Oscar Scherer State Park", 
"Paynes Creek Historic State Park", "Paynes Prairie Preserve State Park", 
"Pumpkin Hill Creek Preserve State Park", "Savannas Preserve State Park", 
"Seabranch Preserve State Park", "Sebastian Inlet State Park", 
"Talbot Islands State Parks", "Terra Ceia Preserve State Park", 
"Tosohatchee Wildlife Management Area", "Washington Oaks Gardens State Park", 
"Werner-Boyce Salt Springs State Park"), class = "factor")), .Names = "unique(df$ParkName)", row.names = c(NA, 
-52L), class = "data.frame")

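I could type each park name into the Google search bar by hand, but I would really like to figure out how to build a search query for this so that I can apply it to future projects. The problem is that I am a bit overwhelmed when it comes to building anything this complex. I have only recently started learning about things like "APIs".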

Any help would be appreciated.

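You can do this by web scraping with the rvest package. The outcome depends heavily on each query, because not every query returns a value at the top of the page.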

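library(rvest)

# Parks to look up; each name must be a single-line string
parks <- data.frame(name = c("wekiva springs state park",
                             "cayo costa state park"),
                    stringsAsFactors = FALSE)
parks$area <- NA_character_

url <- "http://www.google.com"

# Open a session and take the search form from the Google home page.
# Note: rvest >= 1.0.0 renames html_session(), set_values() and submit_form()
# to session(), html_form_set() and session_submit().
s <- html_session(url)
search <- html_form(s)[[1]]

for (i in seq_len(nrow(parks))) {
  # Build the query and fill it into the form's "q" field
  query <- paste("area of", parks$name[i], "in hectares")
  a <- set_values(search, q = query)

  # Submit the form and read the text of the results container
  session <- submit_form(s, a)
  s1 <- html_nodes(session, "#res")
  result <- html_text(s1)

  # Keep the text up to the end of the first run of letters and drop the rest;
  # leave NA when the page has no "#res" node
  if (length(result) > 0) {
    parks$area[i] <- gsub("([A-Za-z]+).*", "\\1", result[1])
  }
}

parks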

                       name     area
1 wekiva springs state park 2.833 ha
2     cayo costa state park 1.014 ha
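Since not every query returns a value at the top of the page, it may help to check the scraped text for something that actually looks like an area before storing it. The following is a minimal sketch in base R, not part of the original answer: `extract_area()` is a hypothetical helper, and the pattern assumes the value appears as digits followed by "ha" or "hectares". It returns the match as text, because whether "." or "," is the thousands separator depends on the locale Google answers in.

```r
# Hypothetical helper (an assumption, not an rvest function): return the first
# "<digits> ha" / "<digits> hectares" fragment of each string, or NA if absent.
extract_area <- function(text) {
  # A digit, then further digits/dots/commas, optional whitespace, then the unit.
  # No word boundary after the unit, so run-together text like "ha" + next word
  # still matches (at the cost of occasional false positives).
  pattern <- "[0-9][0-9.,]*\\s*(hectares|ha)"
  hit <- regexpr(pattern, text, ignore.case = TRUE, perl = TRUE)

  out <- rep(NA_character_, length(text))
  found <- !is.na(hit) & hit != -1
  out[found] <- regmatches(text, hit)
  out
}

# Made-up sample strings for illustration only
samples <- c("2,833 ha",
             "Wekiwa Springs State Park - Wikipedia",
             "Area: 1,014 hectares (approx.)")
print(extract_area(samples))
```

With these sample strings the second element comes back as `NA`, which makes it easy to see afterwards which parks still need a manual lookup.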

To learn more about rvest, here's a good starting point.
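To reuse the query-building step for the full list of 52 parks, or in a later project, the `paste()` call from the loop can be pulled into a small vectorised function. This is a sketch under assumptions: `build_queries()` and `build_search_urls()` are hypothetical names, and the `https://www.google.com/search?q=` URL form is assumed here rather than taken from the answer above, which submits the search form instead.

```r
# Hypothetical helpers for illustration; neither is part of rvest.

# Turn park names (character or factor) into one query string per park.
build_queries <- function(park_names, unit = "hectares") {
  paste("area of", tolower(as.character(park_names)), "in", unit)
}

# Percent-encode each query and append it to an assumed search URL.
build_search_urls <- function(queries,
                              base = "https://www.google.com/search") {
  encoded <- vapply(queries, utils::URLencode, character(1),
                    reserved = TRUE, USE.NAMES = FALSE)
  paste0(base, "?q=", encoded)
}

park_names <- factor(c("Wekiva Springs State Park",
                       "Fort Pierce Inlet State Park/Avalon State Park"))
queries <- build_queries(park_names)
urls <- build_search_urls(queries)
print(queries)
print(urls)
```

Setting `reserved = TRUE` matters for names such as "Fort Pierce Inlet State Park/Avalon State Park", because the "/" would otherwise be left unencoded in the query string.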