Getting image URL embedded in <script> tag with Python requests

Question

I am trying to use Python requests to get the URL of an image on this website. Specifically, I want the URL of the image whose file name starts with PPI_Z_005...

To get it, I tried fetching the HTML with Python requests.

import requests

weburl = "https://smn.conagua.gob.mx/tools/GUI/visor_radares_v2/radares/cabos/cabos_ppi.php"
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36'
           '(KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36'}
response = requests.get(weburl, verify=False, headers=headers)

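The problem is that the response has no explicit reference to the file name I am looking for. I guess it is somehow rendered by JavaScript and inserted in a <script> tag. In fact, when I inspect the page source with my browser's developer tools, it contains the following: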

  <script>
    [...]
    imagen_eco(/* Radar */ 'cabos', /* Nombre imagen */ "PPI_Z_005_300_20220206141529.png", /* Producto */ 'ppi', /* Limites */ [[25.589004,-112.910417],[20.147021,-106.944245]]);
    [...]
  </script>

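I guess this tag is somehow responsible for inserting the image into the rendered web page... but how?

Is it possible to parse this website and get this file name using requests alone?

Note: I know this can be done with selenium. I am looking for a selenium-free solution.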

Answer 1

Here is a solution that uses requests and regular expressions to find the data you are looking for.

import requests
import re


weburl = (
    "https://smn.conagua.gob.mx/tools/GUI/visor_radares_v2/radares/cabos/cabos_ppi.php"
)
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36"
    "(KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36"
}
response = requests.get(weburl, verify=False, headers=headers)

source = response.content.decode("utf-8")

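# Find the imagen_eco(...) call; the parentheses are escaped so they
# match literally instead of opening regex groups.
imagen_eco = re.search(r"imagen_eco\((.*?)\);", source)
if not imagen_eco:
    raise SystemExit("Not found")

# Pull the .png file name out of the matched call.
image_name = re.search(r"([\w-]+)\.png", imagen_eco.group(0))
if not image_name:
    raise SystemExit("Not found")
print(image_name.group(0))
# Build the full image URL from the file name.
print(
    f"https://smn.conagua.gob.mx/tools/GUI/visor_radares_v2/ecos/cabos/ppi/{image_name.group(0)}"
)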
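
If you want to check the parsing step without hitting the site, the same idea can be sketched as a small function that runs on a saved copy of the script text. This is only a sketch: the name extract_image_url and the SAMPLE string are hypothetical (the sample is copied from the snippet in the question), and building the URL from the radar and product arguments is an assumption based on the single URL shown above.

```python
import re

# Base path taken from the URL above. Assumption: the next two path
# segments are the first and third arguments of imagen_eco().
BASE_URL = "https://smn.conagua.gob.mx/tools/GUI/visor_radares_v2/ecos"

# Matches imagen_eco(<radar>, <image name>, <product>, ...
# Each argument may be preceded by a /* ... */ comment and may use
# either single or double quotes.
CALL_RE = re.compile(
    r"""imagen_eco\(\s*
        (?:/\*.*?\*/\s*)?["'](?P<radar>[^"']+)["']\s*,\s*
        (?:/\*.*?\*/\s*)?["'](?P<image>[^"']+\.png)["']\s*,\s*
        (?:/\*.*?\*/\s*)?["'](?P<product>[^"']+)["']
    """,
    re.VERBOSE | re.DOTALL,
)


def extract_image_url(source):
    """Return the image URL found in an imagen_eco(...) call, or None."""
    match = CALL_RE.search(source)
    if match is None:
        return None
    return "{}/{}/{}/{}".format(
        BASE_URL, match["radar"], match["product"], match["image"]
    )


# Hypothetical sample, copied from the snippet shown in the question.
SAMPLE = """
<script>
  imagen_eco(/* Radar */ 'cabos', /* Nombre imagen */ "PPI_Z_005_300_20220206141529.png", /* Producto */ 'ppi', /* Limites */ [[25.589004,-112.910417],[20.147021,-106.944245]]);
</script>
"""

if __name__ == "__main__":
    print(extract_image_url(SAMPLE))
```

With a live page you would pass the decoded response body instead of SAMPLE. Returning None rather than exiting leaves it to the caller to decide what to do when the call is not found.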
