Using RSelenium to webscrape multiple item locations on a map
I'm trying to scrape a web map so that I can download all the locations in the Street Lighting layer. I'm using RSelenium to get the data:
library(tidyverse)
library(rvest)
library(RSelenium)
# Open a browser
rD <- rsDriver(browser = "firefox", port = 4545L, verbose = FALSE)
remDr <- rD[["client"]]
# Navigate to site
remDr$navigate("https://gis2.westberks.gov.uk/webapps/OnlineMap/")
At this point I open the Street Lighting layer (under Highways) in the browser and select one street light on the map. If I then run:
h <- read_html(remDr$getPageSource()[[1]]) %>% html_nodes(".attrTable") %>% html_table()
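I get the data for that street light. However, I want the data for all the street lights shown on the map, and I don't know how to do that. Is it possible to programmatically select all the lights on the map before running remDr$getPageSource()?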
I have looked at this post, but it doesn't quite solve the problem: Issue scraping website with reactive blocks
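The map draws the layer as a single image, so you can't get the locations from the page itself. You would need some computer vision to detect the circles in the image.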
When you click on the map, it sends the coordinates to the server, which sends back JSON data that the page shows in a popup window.
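It sends a request like the one in the code below, with an envelope given by xmin, ymin, xmax, ymax (you can open that URL in a browser to see the JSON data).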
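To see exactly which area the map asks about, the `geometry` parameter of that URL can be percent-decoded and parsed as JSON. A minimal sketch using only the standard library; the encoded string is copied from the query URL in the code below, and `decode_envelope` is a hypothetical helper name.

```python
import json
from urllib.parse import unquote_plus

# The `geometry` value copied from the query URL the map sends on a click.
ENCODED_GEOMETRY = (
    "%7B%22xmin%22%3A442520.89976831735%2C%22ymin%22%3A178788.66371417744"
    "%2C%22xmax%22%3A443314.65135582053%2C%22ymax%22%3A179582.41530168062"
    "%2C%22spatialReference%22%3A%7B%22wkid%22%3A27700%2C%22latestWkid%22%3A27700%7D%7D"
)


def decode_envelope(encoded: str) -> dict:
    """Hypothetical helper: percent-decode the parameter and parse its JSON."""
    return json.loads(unquote_plus(encoded))


envelope = decode_envelope(ENCODED_GEOMETRY)
width = envelope["xmax"] - envelope["xmin"]
height = envelope["ymax"] - envelope["ymin"]

# wkid 27700 is British National Grid, so the coordinates are in metres.
print(envelope["spatialReference"]["wkid"])
print(round(width, 2), round(height, 2))  # roughly 794 m on each side
```

So a single click only asks for a box of well under a square kilometre, which is why only a few lights come back.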
Perhaps if you send the same request with xmin, xmax, ymin, ymax covering a larger area, you will get all the values.
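One way to cover a larger area is to split its bounding box into a grid of smaller envelopes and send one query per tile, because ArcGIS services cap how many features a single response returns (the service's `maxRecordCount`). A minimal sketch; the bounding box and tile size below are placeholder values, not the real extent of the layer, and `tile_envelopes` is a hypothetical helper.

```python
import json
import math


def tile_envelopes(xmin, ymin, xmax, ymax, tile_size, wkid=27700):
    """Yield ArcGIS envelope JSON strings that together cover the bounding box.

    Hypothetical helper: tiles on the right/top edge are clipped to the box.
    """
    nx = math.ceil((xmax - xmin) / tile_size)
    ny = math.ceil((ymax - ymin) / tile_size)
    for i in range(nx):
        for j in range(ny):
            env = {
                "xmin": xmin + i * tile_size,
                "ymin": ymin + j * tile_size,
                "xmax": min(xmin + (i + 1) * tile_size, xmax),
                "ymax": min(ymin + (j + 1) * tile_size, ymax),
                "spatialReference": {"wkid": wkid},
            }
            # Compact separators keep the string short once URL-encoded.
            yield json.dumps(env, separators=(",", ":"))


# Placeholder box in British National Grid metres, split into 1 km tiles.
tiles = list(tile_envelopes(440000, 175000, 442500, 177000, tile_size=1000))
print(len(tiles))  # 6 (3 columns x 2 rows)
```

Each string can be sent as the `geometry` parameter in place of the fixed envelope. With `esriSpatialRelIntersects`, a light lying exactly on a shared tile edge may be returned by two tiles, so de-duplicating on `OBJECTID` afterwards is a sensible precaution.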
EDIT:

I have no experience with R, but I can show an example in Python. It doesn't need Selenium (in either Python or R).
import requests
# full url with parameters
#url = 'https://gis2.westberks.gov.uk/arcgis/rest/services/maps/Wbc_Highways/MapServer/11/query?f=json&returnGeometry=true&spatialRel=esriSpatialRelIntersects&geometry=%7B%22xmin%22%3A442520.89976831735%2C%22ymin%22%3A178788.66371417744%2C%22xmax%22%3A443314.65135582053%2C%22ymax%22%3A179582.41530168062%2C%22spatialReference%22%3A%7B%22wkid%22%3A27700%2C%22latestWkid%22%3A27700%7D%7D&geometryType=esriGeometryEnvelope&inSR=27700&outFields=OBJECTID%2CItem_Type%2CItem_Identity_Code%2CLocation_Description%2CAssigned_Street%2CLocality%2CTown%2CType%2CBracket_Type%2CLantern_Type%2CLamp_Type%2CBallast_Type%2CControl_Type%2CSign_Lantern_Type%2CSign_Bracket_Type%2CSign_Post_Type%2CBollard_Base_Type%2CBollard_Shell_Type%2CColumn_Manufacturer%2CMaterial_Type%2CLamp_Wattage%2CLantern_Manufacturer%2CNumber_of_Lamps%2CSwitching_Regime_Code%2CSwitching_Regime%2CLamp_Type2%2CEasting%2CNorthing&outSR=27700'
# only parameters
params = {
    'f': ['json'],
    'geometry': [
        '{"xmin":442520.89976831735,"ymin":178788.66371417744,"xmax":443314.65135582053,"ymax":179582.41530168062,"spatialReference":{"wkid":27700,"latestWkid":27700}}'
    ],
    'geometryType': ['esriGeometryEnvelope'],
    'inSR': ['27700'],
    'outFields': ['OBJECTID,Item_Type,Item_Identity_Code,Location_Description,Assigned_Street,Locality,Town,Type,Bracket_Type,Lantern_Type,Lamp_Type,Ballast_Type,Control_Type,Sign_Lantern_Type,Sign_Bracket_Type,Sign_Post_Type,Bollard_Base_Type,Bollard_Shell_Type,Column_Manufacturer,Material_Type,Lamp_Wattage,Lantern_Manufacturer,Number_of_Lamps,Switching_Regime_Code,Switching_Regime,Lamp_Type2,Easting,Northing'],
    'outSR': ['27700'],
    'returnGeometry': ['true'],
    'spatialRel': ['esriSpatialRelIntersects']
}
# url without parameters
url = 'https://gis2.westberks.gov.uk/arcgis/rest/services/maps/Wbc_Highways/MapServer/11/query'
response = requests.get(url, params=params)
#print(response.url)
#print(response.status_code)
data = response.json()
for item in data['features']:
    print('Locality:', item['attributes']['Locality'].strip())
    print('Town :', item['attributes']['Town'].strip())
    print('Street :', item['attributes']['Assigned_Street'].strip())
    print('Geometry:', item['geometry'])
    print('---')
Result:
Locality: BRIGHTWALTON
Town : NEWBURY
Street : SAXONS ACRE
Geometry: {'x': 442763, 'y': 179193}
---
Locality: BRIGHTWALTON
Town : NEWBURY
Street : ASH CLOSE
Geometry: {'x': 442782, 'y': 179248}
---
Locality: BRIGHTWALTON
Town : NEWBURY
Street : SAXONS ACRE
Geometry: {'x': 442770, 'y': 179214}
---
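To save the results instead of printing them, the features can be flattened into rows, with the space padding stripped, and written as CSV. A sketch assuming the response shape used above (`features`, each with `attributes` and `geometry`); the sample dictionary reuses the three results above with the trailing padding visible in the R output, and `features_to_csv` is a hypothetical name.

```python
import csv
import io

FIELDS = ["Locality", "Town", "Assigned_Street"]


def clean(value):
    """Text attributes come back padded with spaces, so strip them."""
    return value.strip() if isinstance(value, str) else value


def features_to_csv(data, fields=FIELDS):
    """Hypothetical helper: turn an ArcGIS query response into CSV text."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(fields + ["x", "y"])
    for item in data["features"]:
        attrs = item["attributes"]
        row = [clean(attrs.get(f)) for f in fields]
        writer.writerow(row + [item["geometry"]["x"], item["geometry"]["y"]])
    return out.getvalue()


# Stand-in for response.json(), built from the three results shown above.
sample = {
    "features": [
        {"attributes": {"Locality": "BRIGHTWALTON ", "Town": "NEWBURY ",
                        "Assigned_Street": "SAXONS ACRE "},
         "geometry": {"x": 442763, "y": 179193}},
        {"attributes": {"Locality": "BRIGHTWALTON ", "Town": "NEWBURY ",
                        "Assigned_Street": "ASH CLOSE "},
         "geometry": {"x": 442782, "y": 179248}},
        {"attributes": {"Locality": "BRIGHTWALTON ", "Town": "NEWBURY ",
                        "Assigned_Street": "SAXONS ACRE "},
         "geometry": {"x": 442770, "y": 179214}},
    ]
}

csv_text = features_to_csv(sample)
print(csv_text)
```

In the real script you would pass `response.json()` instead of `sample`, and add any other names from `outFields` to `FIELDS`.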
EDIT:

Version in R:
> install.packages("jsonlite")
> library(jsonlite)
> URL = 'https://gis2.westberks.gov.uk/arcgis/rest/services/maps/Wbc_Highways/MapServer/11/query?f=json&returnGeometry=true&spatialRel=esriSpatialRelIntersects&geometry=%7B%22xmin%22%3A442520.89976831735%2C%22ymin%22%3A178788.66371417744%2C%22xmax%22%3A443314.65135582053%2C%22ymax%22%3A179582.41530168062%2C%22spatialReference%22%3A%7B%22wkid%22%3A27700%2C%22latestWkid%22%3A27700%7D%7D&geometryType=esriGeometryEnvelope&inSR=27700&outFields=OBJECTID%2CItem_Type%2CItem_Identity_Code%2CLocation_Description%2CAssigned_Street%2CLocality%2CTown%2CType%2CBracket_Type%2CLantern_Type%2CLamp_Type%2CBallast_Type%2CControl_Type%2CSign_Lantern_Type%2CSign_Bracket_Type%2CSign_Post_Type%2CBollard_Base_Type%2CBollard_Shell_Type%2CColumn_Manufacturer%2CMaterial_Type%2CLamp_Wattage%2CLantern_Manufacturer%2CNumber_of_Lamps%2CSwitching_Regime_Code%2CSwitching_Regime%2CLamp_Type2%2CEasting%2CNorthing&outSR=27700'
> data <- fromJSON(URL)
> library(magrittr) # to use `%>%`
> data <- URL %>% fromJSON
> data$features$attributes$Locality
[1] "BRIGHTWALTON "
[2] "BRIGHTWALTON "
[3] "BRIGHTWALTON "
> data$features$attributes$Town
[1] "NEWBURY "
[2] "NEWBURY "
[3] "NEWBURY "
> data$features$attributes$Assigned_Street
[1] "SAXONS ACRE "
[2] "ASH CLOSE "
[3] "SAXONS ACRE "
> data$features$geometry
x y
1 442763 179193
2 442782 179248
3 442770 179214
> library(stringr)
> data$features$attributes$Locality %>% str_trim
[1] "BRIGHTWALTON" "BRIGHTWALTON" "BRIGHTWALTON"
EDIT:

Similar to the Python version:
> library(magrittr) # to use `%>%`
> library(httr)
> library(jsonlite)
> URL = 'https://gis2.westberks.gov.uk/arcgis/rest/services/maps/Wbc_Highways/MapServer/11/query'
> query = list(
f = list('json'),
geometry = list(
'{"xmin":442520.89976831735,"ymin":178788.66371417744,"xmax":443314.65135582053,"ymax":179582.41530168062,"spatialReference":{"wkid":27700,"latestWkid":27700}}'
),
geometryType = list('esriGeometryEnvelope'),
inSR = list('27700'),
outFields = list('OBJECTID,Item_Type,Item_Identity_Code,Location_Description,Assigned_Street,Locality,Town,Type,Bracket_Type,Lantern_Type,Lamp_Type,Ballast_Type,Control_Type,Sign_Lantern_Type,Sign_Bracket_Type,Sign_Post_Type,Bollard_Base_Type,Bollard_Shell_Type,Column_Manufacturer,Material_Type,Lamp_Wattage,Lantern_Manufacturer,Number_of_Lamps,Switching_Regime_Code,Switching_Regime,Lamp_Type2,Easting,Northing'),
outSR = list('27700'),
returnGeometry = list('true'),
spatialRel = list('esriSpatialRelIntersects')
)
> response <- GET(URL, query=query)
> data <- response %>% content(as = "text", encoding = "UTF-8") %>% fromJSON
> data <- GET(URL, query=query) %>% content(as = "text", encoding = "UTF-8") %>% fromJSON
> items <- data$features
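If one query still matches more features than the server returns at once, ArcGIS query endpoints that support pagination accept `resultOffset` and `resultRecordCount` and set `exceededTransferLimit` to true in the JSON when more results remain. Whether this particular server supports pagination is an assumption to check in the layer's `?f=json` description. The sketch below keeps the HTTP call injectable so the loop can be tried offline; `fetch_all`, `fake_fetch` and `RECORDS` are hypothetical. With `requests`, the fetch function could be `lambda offset, count: requests.get(url, params={**params, 'resultOffset': offset, 'resultRecordCount': count}).json()`.

```python
from typing import Callable, Dict, List


def fetch_all(fetch_page: Callable[[int, int], Dict], page_size: int = 1000) -> List[Dict]:
    """Collect features page by page, de-duplicated on OBJECTID.

    fetch_page(offset, count) must return the parsed JSON of one query
    sent with resultOffset=offset and resultRecordCount=count.
    """
    features, seen, offset = [], set(), 0
    while True:
        page = fetch_page(offset, page_size)
        batch = page.get("features", [])
        for item in batch:
            oid = item["attributes"]["OBJECTID"]
            if oid not in seen:  # skip features already collected
                seen.add(oid)
                features.append(item)
        # Stop when the server no longer reports that records remain.
        if not page.get("exceededTransferLimit") or not batch:
            return features
        offset += len(batch)


# Offline stand-in for the server: 5 hypothetical records served 2 at a time.
RECORDS = [{"attributes": {"OBJECTID": i}} for i in range(1, 6)]


def fake_fetch(offset, count):
    chunk = RECORDS[offset:offset + count]
    return {"features": chunk, "exceededTransferLimit": offset + count < len(RECORDS)}


result = fetch_all(fake_fetch, page_size=2)
print(len(result))  # 5
```

De-duplicating on `OBJECTID` works because that field is already in `outFields`; the same `seen` set can also be shared across the tiled queries from a larger area.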