Using RSelenium to webscrape multiple item locations on a map

I am trying to webscrape a map so that I can download all the locations in the Street Lighting layer. I'm using RSelenium to get the data:

library(tidyverse)
library(rvest)
library(RSelenium)

# Open a browser
rD <- rsDriver(browser = "firefox", port = 4545L, verbose = FALSE)
remDr <- rD[["client"]]
# Navigate to site
remDr$navigate("https://gis2.westberks.gov.uk/webapps/OnlineMap/")

At this point I open the Street Lighting layer in the browser (under Highways) and then select a single street light on the map. If I then run:

h <- read_html(remDr$getPageSource()[[1]]) %>% html_nodes(".attrTable") %>% html_table()

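I get the data for that one street light. However, I want the data for all the street lights shown on the map, and I don't know how to do that. Is it possible to programmatically select all the lights on the map before running remDr$getPageSource()?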

I have looked at this post, but it doesn't quite solve the problem: Issue scraping website with reactive blocks

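The map displays the layer as a single image, so you can't get the locations from the page itself.
You would need some computer vision to detect the circles in the image.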

When you click on the map, the page sends the coordinates to the server,
which sends back JSON data that is then displayed in a popup window.

It sends something like this (with values for xmin, xmax, ymin, ymax):

https://gis2.westberks.gov.uk/arcgis/rest/services/maps/Wbc_Highways/MapServer/11/query?f=json&returnGeometry=true&spatialRel=esriSpatialRelIntersects&geometry=%7B%22xmin%22%3A442520.89976831735%2C%22ymin%22%3A178788.66371417744%2C%22xmax%22%3A443314.65135582053%2C%22ymax%22%3A179582.41530168062%2C%22spatialReference%22%3A%7B%22wkid%22%3A27700%2C%22latestWkid%22%3A27700%7D%7D&geometryType=esriGeometryEnvelope&inSR=27700&outFields=OBJECTID%2CItem_Type%2CItem_Identity_Code%2CLocation_Description%2CAssigned_Street%2CLocality%2CTown%2CType%2CBracket_Type%2CLantern_Type%2CLamp_Type%2CBallast_Type%2CControl_Type%2CSign_Lantern_Type%2CSign_Bracket_Type%2CSign_Post_Type%2CBollard_Base_Type%2CBollard_Shell_Type%2CColumn_Manufacturer%2CMaterial_Type%2CLamp_Wattage%2CLantern_Manufacturer%2CNumber_of_Lamps%2CSwitching_Regime_Code%2CSwitching_Regime%2CLamp_Type2%2CEasting%2CNorthing&outSR=27700

(You can click the link to see the JSON data.)
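The long URL above is just the layer's `query` endpoint with the envelope JSON-encoded into the `geometry` parameter. A minimal Python sketch of building such a URL for any bounding box with the standard library; `build_query_url` is a hypothetical helper, and `outFields=*` (all fields, which the ArcGIS REST API accepts) replaces the explicit field list, untested against this particular server:

```python
import json
from urllib.parse import urlencode

# Layer 11 (Street Lighting) query endpoint, copied from the URL above.
QUERY_URL = ("https://gis2.westberks.gov.uk/arcgis/rest/services/"
             "maps/Wbc_Highways/MapServer/11/query")


def build_query_url(xmin, ymin, xmax, ymax, out_fields="*", wkid=27700):
    """Return a layer query URL for one envelope (hypothetical helper)."""
    # The envelope is sent as compact JSON inside the `geometry` parameter.
    geometry = json.dumps(
        {"xmin": xmin, "ymin": ymin, "xmax": xmax, "ymax": ymax,
         "spatialReference": {"wkid": wkid}},
        separators=(",", ":"),
    )
    params = {
        "f": "json",
        "geometry": geometry,
        "geometryType": "esriGeometryEnvelope",
        "spatialRel": "esriSpatialRelIntersects",
        "inSR": wkid,
        "outSR": wkid,
        # "*" asks for every field; assumption: this server allows it.
        "outFields": out_fields,
        "returnGeometry": "true",
    }
    # urlencode percent-encodes braces, quotes and commas (%7B, %22, %2C),
    # which is how they appear in the URL above.
    return QUERY_URL + "?" + urlencode(params)


print(build_query_url(442520.9, 178788.7, 443314.7, 179582.4))
```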

Perhaps if you send the same request with xmin, xmax, ymin, ymax covering a larger area, you will get all the values.
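ArcGIS map services usually cap how many features a single query returns (the layer's `maxRecordCount`) and set `exceededTransferLimit` to true in the JSON when a response was truncated, so one very large envelope may not return every light. One workaround, sketched here as an assumption rather than a tested recipe, is to split a large bounding box into smaller tiles and query each one; `tile_envelope` is a hypothetical helper and the grid size is up to you:

```python
def tile_envelope(xmin, ymin, xmax, ymax, nx, ny):
    """Split one bounding box into an nx-by-ny grid of (xmin, ymin, xmax, ymax) tiles."""
    width = (xmax - xmin) / nx
    height = (ymax - ymin) / ny
    tiles = []
    for i in range(nx):
        for j in range(ny):
            tiles.append((
                xmin + i * width,
                ymin + j * height,
                # Use the exact outer edge for the last column/row so
                # floating-point rounding cannot leave a gap.
                xmax if i == nx - 1 else xmin + (i + 1) * width,
                ymax if j == ny - 1 else ymin + (j + 1) * height,
            ))
    return tiles
```

Neighbouring tiles share their edges, so with `esriSpatialRelIntersects` a light lying exactly on a boundary can be returned by two tiles; de-duplicating on `OBJECTID` afterwards handles that.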


Edit:

I have no experience with R, but I can show an example in Python.

It doesn't need Selenium (in either Python or R).

import requests

# full url with parameters
#url = 'https://gis2.westberks.gov.uk/arcgis/rest/services/maps/Wbc_Highways/MapServer/11/query?f=json&returnGeometry=true&spatialRel=esriSpatialRelIntersects&geometry=%7B%22xmin%22%3A442520.89976831735%2C%22ymin%22%3A178788.66371417744%2C%22xmax%22%3A443314.65135582053%2C%22ymax%22%3A179582.41530168062%2C%22spatialReference%22%3A%7B%22wkid%22%3A27700%2C%22latestWkid%22%3A27700%7D%7D&geometryType=esriGeometryEnvelope&inSR=27700&outFields=OBJECTID%2CItem_Type%2CItem_Identity_Code%2CLocation_Description%2CAssigned_Street%2CLocality%2CTown%2CType%2CBracket_Type%2CLantern_Type%2CLamp_Type%2CBallast_Type%2CControl_Type%2CSign_Lantern_Type%2CSign_Bracket_Type%2CSign_Post_Type%2CBollard_Base_Type%2CBollard_Shell_Type%2CColumn_Manufacturer%2CMaterial_Type%2CLamp_Wattage%2CLantern_Manufacturer%2CNumber_of_Lamps%2CSwitching_Regime_Code%2CSwitching_Regime%2CLamp_Type2%2CEasting%2CNorthing&outSR=27700'

# only parameters
params = {
    'f': ['json'],
    'geometry': [
         '{"xmin":442520.89976831735,"ymin":178788.66371417744,"xmax":443314.65135582053,"ymax":179582.41530168062,"spatialReference":{"wkid":27700,"latestWkid":27700}}'
    ],
    'geometryType': ['esriGeometryEnvelope'],
    'inSR': ['27700'],
    'outFields': ['OBJECTID,Item_Type,Item_Identity_Code,Location_Description,Assigned_Street,Locality,Town,Type,Bracket_Type,Lantern_Type,Lamp_Type,Ballast_Type,Control_Type,Sign_Lantern_Type,Sign_Bracket_Type,Sign_Post_Type,Bollard_Base_Type,Bollard_Shell_Type,Column_Manufacturer,Material_Type,Lamp_Wattage,Lantern_Manufacturer,Number_of_Lamps,Switching_Regime_Code,Switching_Regime,Lamp_Type2,Easting,Northing'],
    'outSR': ['27700'],
    'returnGeometry': ['true'],
    'spatialRel': ['esriSpatialRelIntersects']
}

# url without parameters
url = 'https://gis2.westberks.gov.uk/arcgis/rest/services/maps/Wbc_Highways/MapServer/11/query'

response = requests.get(url, params=params)
#print(response.url)
#print(response.status_code)

data = response.json()

for item in data['features']:
    print('Locality:', item['attributes']['Locality'].strip())
    print('Town    :', item['attributes']['Town'].strip())
    print('Street  :', item['attributes']['Assigned_Street'].strip())
    print('Geometry:', item['geometry'])
    print('---')

Result:

Locality: BRIGHTWALTON
Town    : NEWBURY
Street  : SAXONS ACRE
Geometry: {'x': 442763, 'y': 179193}
---
Locality: BRIGHTWALTON
Town    : NEWBURY
Street  : ASH CLOSE
Geometry: {'x': 442782, 'y': 179248}
---
Locality: BRIGHTWALTON
Town    : NEWBURY
Street  : SAXONS ACRE
Geometry: {'x': 442770, 'y': 179214}
---
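The code above calls `.strip()` because the attribute strings come back padded with spaces; a field with no value may come back as JSON `null`, which `.strip()` cannot handle. A small Python sketch that flattens the features into plain rows, strips the padding only from strings, and writes CSV with the standard library; the helper names and chosen fields are illustrative, and the sample is a trimmed copy of the first result above:

```python
import csv
import io


def features_to_rows(data, fields=("Locality", "Town", "Assigned_Street")):
    """Flatten an ArcGIS query response into a list of plain dicts."""
    rows = []
    for feature in data.get("features", []):
        attrs = feature.get("attributes", {})
        row = {}
        for name in fields:
            value = attrs.get(name)
            # Strip padding from strings; leave None (JSON null) and numbers alone.
            row[name] = value.strip() if isinstance(value, str) else value
        geometry = feature.get("geometry") or {}
        row["x"] = geometry.get("x")
        row["y"] = geometry.get("y")
        rows.append(row)
    return rows


def rows_to_csv(rows):
    """Serialise the rows to CSV text, taking the header from the first row."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Trimmed copy of the first feature shown above, with padded text fields.
sample = {"features": [{
    "attributes": {"Locality": "BRIGHTWALTON   ", "Town": "NEWBURY   ",
                   "Assigned_Street": "SAXONS ACRE   "},
    "geometry": {"x": 442763, "y": 179193},
}]}
print(rows_to_csv(features_to_rows(sample)))
```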

Edit:

Version in R:

> install.packages("jsonlite")

> library(jsonlite)

> URL = 'https://gis2.westberks.gov.uk/arcgis/rest/services/maps/Wbc_Highways/MapServer/11/query?f=json&returnGeometry=true&spatialRel=esriSpatialRelIntersects&geometry=%7B%22xmin%22%3A442520.89976831735%2C%22ymin%22%3A178788.66371417744%2C%22xmax%22%3A443314.65135582053%2C%22ymax%22%3A179582.41530168062%2C%22spatialReference%22%3A%7B%22wkid%22%3A27700%2C%22latestWkid%22%3A27700%7D%7D&geometryType=esriGeometryEnvelope&inSR=27700&outFields=OBJECTID%2CItem_Type%2CItem_Identity_Code%2CLocation_Description%2CAssigned_Street%2CLocality%2CTown%2CType%2CBracket_Type%2CLantern_Type%2CLamp_Type%2CBallast_Type%2CControl_Type%2CSign_Lantern_Type%2CSign_Bracket_Type%2CSign_Post_Type%2CBollard_Base_Type%2CBollard_Shell_Type%2CColumn_Manufacturer%2CMaterial_Type%2CLamp_Wattage%2CLantern_Manufacturer%2CNumber_of_Lamps%2CSwitching_Regime_Code%2CSwitching_Regime%2CLamp_Type2%2CEasting%2CNorthing&outSR=27700'

> data <- fromJSON(URL)

> library(magrittr)  # to use `%>%`
> data <- URL %>% fromJSON

> data$features$attributes$Locality

[1] "BRIGHTWALTON                                                                                        "
[2] "BRIGHTWALTON                                                                                        "
[3] "BRIGHTWALTON                                                                                        "

> data$features$attributes$Town

[1] "NEWBURY                                                                                             "
[2] "NEWBURY                                                                                             "
[3] "NEWBURY                                                                                             "

> data$features$attributes$Assigned_Street

[1] "SAXONS ACRE                                                                                         "
[2] "ASH CLOSE                                                                                           "
[3] "SAXONS ACRE                                                                                         "

> data$features$geometry

       x      y
1 442763 179193
2 442782 179248
3 442770 179214

> library(stringr)

> data$features$attributes$Locality %>% str_trim

[1] "BRIGHTWALTON" "BRIGHTWALTON" "BRIGHTWALTON"

Edit:

Similar to the Python version:

> library(magrittr)  # to use `%>%`
> library(httr)
> library(jsonlite)

> URL = 'https://gis2.westberks.gov.uk/arcgis/rest/services/maps/Wbc_Highways/MapServer/11/query'

> query = list(
    f = list('json'),
    geometry = list(
         '{"xmin":442520.89976831735,"ymin":178788.66371417744,"xmax":443314.65135582053,"ymax":179582.41530168062,"spatialReference":{"wkid":27700,"latestWkid":27700}}'
    ),
    geometryType = list('esriGeometryEnvelope'),
    inSR = list('27700'),
    outFields = list('OBJECTID,Item_Type,Item_Identity_Code,Location_Description,Assigned_Street,Locality,Town,Type,Bracket_Type,Lantern_Type,Lamp_Type,Ballast_Type,Control_Type,Sign_Lantern_Type,Sign_Bracket_Type,Sign_Post_Type,Bollard_Base_Type,Bollard_Shell_Type,Column_Manufacturer,Material_Type,Lamp_Wattage,Lantern_Manufacturer,Number_of_Lamps,Switching_Regime_Code,Switching_Regime,Lamp_Type2,Easting,Northing'),
    outSR = list('27700'),
    returnGeometry = list('true'),
    spatialRel = list('esriSpatialRelIntersects')
  )

> response <- GET(URL, query=query)
> data <- response %>% content(as = "text", encoding = "UTF-8") %>% fromJSON

> data <- GET(URL, query=query) %>% content(as = "text", encoding = "UTF-8") %>% fromJSON

> items <- data$features
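To cover more than one envelope, the same request can be repeated for several bounding boxes and the results merged. A minimal Python sketch, assuming `OBJECTID` is requested in `outFields` (it is in the query above) and that the server reports truncation through the ArcGIS `exceededTransferLimit` flag; `collect_features` and `fetch_json` are hypothetical helpers, untested against this server, and `fetch` can be swapped out for testing:

```python
import json
import urllib.request


def fetch_json(url):
    """Download one query URL and parse its JSON body (needs network access)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))


def collect_features(urls, fetch=fetch_json):
    """Run every query URL and merge the features, keeping one copy per OBJECTID."""
    seen = {}
    truncated = []
    for url in urls:
        data = fetch(url)
        if data.get("exceededTransferLimit"):
            # The server hit its record limit: split this envelope further.
            truncated.append(url)
        for feature in data.get("features", []):
            oid = feature["attributes"]["OBJECTID"]
            # Lights on a shared tile edge can appear twice; keep the first.
            seen.setdefault(oid, feature)
    return list(seen.values()), truncated
```

Any URL returned in `truncated` is a sign that its envelope still holds more lights than one response allows.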