Can't grab coordinates from ArcGIS iframe in a webpage using requests
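I've created a script that uses the requests module to fetch the coordinates (in this case -119.412 49.023) from a map embedded in a webpage. When I run the script below, I get nothing. I know I could get that portion using selenium, but I would like to do it with the requests module. I looked through dev tools for any clue as to how to fetch it, but had no luck.
This is where the coordinates are located.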
import requests
from bs4 import BeautifulSoup

link = 'https://www.rdos.bc.ca/development-services/planning/current-applications-decisions/electoral-area-a/a2018207-zone/'

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36'
    res = s.get(link)
    soup = BeautifulSoup(res.text, "lxml")
    print(soup.select_one("[data-dojo-attach-point='coordinateInfo']"))
How can I scrape the coordinates from that site using requests?
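You can use requests-html, which automatically downloads Chromium the first time it renders a page.
https://pypi.org/project/requests-html/
It does not fetch the contents of the <iframe src="{}"> element, though, so we .search() for the iframe link, .render() that page separately, and then wait for coordinateInfo to load.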
import asyncio
from bs4 import BeautifulSoup
import requests_html

link = 'https://www.rdos.bc.ca/development-services/planning/current-applications-decisions/electoral-area-a/a2018207-zone/'

async def get_content(page):
    # Poll the rendered page until the coordinate widget exists and has finished loading
    content = await page.content()
    while 'coordinateInfo' not in content or 'loading...' in content:
        await asyncio.sleep(1)
        content = await page.content()
    await page.close()
    return content

with requests_html.HTMLSession() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/99.0.4844.51 Safari/537.36'
    res = s.get(link)
    # The src attribute is HTML-escaped in the page source, so decode &amp; back to &
    iframe_link = res.html.search('iframe src="{}"')[0].replace('&amp;', '&')
    iframe_res = s.get(iframe_link)
    # keep_page=True keeps the browser page open so it can be polled afterwards
    iframe_res.html.render(keep_page=True)
    content = s.loop.run_until_complete(get_content(iframe_res.html.page))
    soup = BeautifulSoup(content, "lxml")
    print(soup.select_one("[data-dojo-attach-point='coordinateInfo']"))
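The iframe lookup in the script above relies on a template match plus a manual string replacement. As a hedged alternative, here is a minimal sketch that pulls the first iframe src out of raw HTML with the standard library's HTMLParser, which unescapes entities in attribute values on its own. The names IframeSrcParser and first_iframe_src and the sample HTML and URLs are made up for illustration; they are not taken from the target site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeSrcParser(HTMLParser):
    """Collect the src attribute of every <iframe> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser has already unescaped entities such as &amp; in attrs
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def first_iframe_src(html_text, base_url):
    """Return the absolute URL of the first iframe, or None if there is none."""
    parser = IframeSrcParser()
    parser.feed(html_text)
    if not parser.sources:
        return None
    # urljoin resolves relative src values against the page URL
    return urljoin(base_url, parser.sources[0])


# Invented sample markup, standing in for res.text from the real page
sample_html = (
    '<div><iframe width="100%" '
    'src="https://maps.example.com/app/index.html?id=abc&amp;extent=1,2,3">'
    '</iframe></div>'
)
print(first_iframe_src(sample_html, "https://www.example.com/page/"))
```

In the script above, the result of first_iframe_src(res.text, link) could stand in for iframe_link, assuming the map is the first iframe on the page.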
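* The coordinate values depend entirely on JavaScript, and the requests module cannot render JavaScript.
* To see the coordinate values, the page has to be scrolled down by executing JavaScript.
* The coordinate values sit inside an iframe.
* So to get the coordinate values, you need a browser automation tool such as selenium.
* I use selenium 4: pip install selenium webdriver-manager
* Do not maximize the browser window (e.g. with driver.maximize_window()). If you do, the widget shows a prompt to move the mouse to see the coordinates instead of the values. At the default window size the coordinates are visible once the selenium script finishes.
Script: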
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

option = webdriver.ChromeOptions()
# Keep Chrome open after the script ends
option.add_experimental_option("detach", True)
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=option)
driver.get('https://www.rdos.bc.ca/development-services/planning/current-applications-decisions/electoral-area-a/a2018207-zone/')
wait = WebDriverWait(driver, 30)
# Execute JavaScript to scroll down so the map comes into view
driver.execute_script("arguments[0].scrollIntoView();", wait.until(EC.visibility_of_element_located((By.XPATH, '//*[@class="bb-textmedia__content"]'))))
# Load the iframe's src URL directly instead of switching into the frame
driver.get(wait.until(EC.visibility_of_element_located((By.XPATH, '(//iframe)[1]'))).get_attribute('src'))
# Wait for the coordinate widget, then drop the trailing unit label
coordinates = wait.until(EC.visibility_of_element_located((By.XPATH, '//*[@class="coordinate-info jimu-float-leading jimu-align-leading"]'))).text.replace('Degrees', '').strip()
print(coordinates)
Output:
-119.554 49.229
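If the values are needed as numbers rather than as one string, the widget text can be split into two floats. The sketch below assumes the text looks like the output above, optionally followed by the word "Degrees", and that the first number is the longitude and the second the latitude. That ordering is inferred from the sample values, not confirmed by the site. parse_coordinates is a hypothetical helper name.

```python
import re


def parse_coordinates(text):
    """Extract (longitude, latitude) floats from text like '-119.554 49.229 Degrees'.

    The longitude-first ordering is an assumption based on the sample output.
    """
    numbers = re.findall(r"-?\d+(?:\.\d+)?", text)
    if len(numbers) < 2:
        raise ValueError(f"expected two numbers in {text!r}")
    return float(numbers[0]), float(numbers[1])


lon, lat = parse_coordinates("-119.554 49.229 Degrees")
print(lon, lat)
```

Raising ValueError on unexpected text makes it obvious when the widget was still showing a placeholder instead of real values.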