Retrieve a zip file from a website in an automated script

Question

For reference: http://wogcc.state.wy.us/urecordsMenu.cfm?Skip=%27Y%27&oops=ID14447

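I am trying to retrieve a zip file from a site that does not expose a dedicated URL for the file. I have been getting along fine with Python mechanize and Beautiful Soup, but I have run into a problem near the end of the process.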

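After selecting the option I want in the table (via mechanize/bs4), I try to get my browser to "submit" the form and retrieve my zip file. However, the "submit" button is just a gif image with an

onclick="javascript:submit()"

call. When you click that button manually in a browser, it redirects you to a generic ".....testdwn.cfm?RequestTimeout=2000" page (and downloads your zip file at the same time), no matter which option you selected before clicking the gif image. So my problem is that there is no dedicated zip URL.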

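From what I have read online over the past few days, Python/mechanize cannot execute JavaScript in any way, so it seems I am SOL on this path. If mechanize could somehow click that button, everything would be fine.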
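A note on the mechanism described above: an onclick handler that only calls submit() normally does nothing more than send the enclosing form, so in principle the same request can be built without running any JavaScript. The following is a minimal sketch of that idea using only the standard library. It is untested against the actual site; the names FormParser, build_submit_request, SAMPLE_HTML, the field names county and mode, and the example.com URLs are all hypothetical stand-ins, and the real server may additionally require cookies or other state.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin, urlsplit
from urllib.request import Request


class FormParser(HTMLParser):
    """Collect the first form's action/method, hidden inputs, and <select> options.

    Sketch only: it handles explicit value="..." attributes and does not
    distinguish between multiple forms on one page.
    """

    def __init__(self):
        super().__init__()
        self.action = None
        self.method = "get"
        self.hidden = {}      # hidden input name -> value
        self.selects = {}     # select name -> list of option values
        self._current = None  # name of the <select> currently being parsed

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and self.action is None:
            self.action = attrs.get("action") or ""
            self.method = (attrs.get("method") or "get").lower()
        elif tag == "input" and (attrs.get("type") or "").lower() == "hidden":
            if attrs.get("name"):
                self.hidden[attrs["name"]] = attrs.get("value") or ""
        elif tag == "select":
            self._current = attrs.get("name")
            if self._current:
                self.selects[self._current] = []
        elif tag == "option" and self._current:
            self.selects[self._current].append(attrs.get("value") or "")

    def handle_endtag(self, tag):
        if tag == "select":
            self._current = None


def build_submit_request(page_url, html, choices):
    """Build (but do not send) the request a browser would send on submit()."""
    parser = FormParser()
    parser.feed(html)
    for name, value in choices.items():
        if value not in parser.selects.get(name, []):
            raise ValueError(f"{value!r} is not an option of select {name!r}")
    payload = urlencode({**parser.hidden, **choices})
    target = urljoin(page_url, parser.action or "")
    if parser.method == "post":
        return Request(target, data=payload.encode("ascii"), method="POST")
    # For GET forms the form data replaces the action's query string.
    return Request(urlsplit(target)._replace(query=payload).geturl())


# Hypothetical page, standing in for the real download menu.
PAGE_URL = "http://example.com/menu.cfm"
SAMPLE_HTML = """
<form action="download.cfm" method="post">
  <input type="hidden" name="mode" value="zip">
  <select name="county">
    <option value="A">A</option>
    <option value="B">B</option>
  </select>
  <img src="go.gif" onclick="javascript:submit()">
</form>
"""

request = build_submit_request(PAGE_URL, SAMPLE_HTML, {"county": "B"})
# Sending it would be: urllib.request.urlopen(request).read()
```

If the response to such a request is the zip itself, its bytes can be written straight to disk; if the site depends on scripts beyond a plain submit(), a browser driver such as Selenium is the safer route.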

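What approach should I take to pull this data? I have read about Selenium, but I would like to know which option is the simplest and best for extracting this data: something JavaScript-based, python-selenium, or something else? Python is preferred if it can manage.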

Thanks in advance!

Answer 1

OK, I found an answer using Selenium:

from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains

# Path to the local chromedriver binary (Selenium 4 takes it through a Service object).
service = Service(executable_path=r"C:\Users\xx\xx\xx\xx\xx\xx\chromedriver.exe")
driver = webdriver.Chrome(service=service)
driver.get("http://wogcc.state.wy.us/urecordsMenu.cfm?Skip=%27Y%27&oops=ID14447")
assert "Download Menu" in driver.title

# Locate the wanted <option> in the select box and the gif image that acts as the submit button.
option = driver.find_element(By.XPATH, "/html/body/table[2]/tbody/tr[7]/td/form/table[1]/tbody/tr[3]/td[2]/select/option[37]")
submit = driver.find_element(By.XPATH, "/html/body/table[2]/tbody/tr[7]/td/form/table[1]/tbody/tr[3]/td[1]/font/img")

# Click the option to select it, then click the image so its onclick handler submits the form.
ActionChains(driver).move_to_element(option).click(option).perform()
ActionChains(driver).move_to_element(submit).click(submit).perform()

I navigated to the page, located the elements by XPath with Selenium, and used its ActionChains to select the option and then click the submit image.
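The script above stops once the click is sent, so an automated run still has to know where the zip lands and when it is complete. Here is a rough sketch of one way to handle that, assuming Chrome writes in-progress downloads with a .crdownload suffix; the helper names chrome_download_prefs and wait_for_zip and the "downloads" folder are my own placeholders, not part of Selenium.

```python
import time
import zipfile
from pathlib import Path


def chrome_download_prefs(download_dir):
    """Chrome preferences that save downloads to download_dir without a prompt."""
    return {
        "download.default_directory": str(Path(download_dir).resolve()),
        "download.prompt_for_download": False,
    }


def wait_for_zip(download_dir, timeout=60.0, poll=0.5):
    """Return the newest finished .zip in download_dir, or raise TimeoutError."""
    folder = Path(download_dir)
    deadline = time.monotonic() + timeout
    while True:
        # Assumption: Chrome keeps a *.crdownload file while a download is running.
        partial = list(folder.glob("*.crdownload"))
        zips = [p for p in folder.glob("*.zip") if zipfile.is_zipfile(p)]
        if zips and not partial:
            return max(zips, key=lambda p: p.stat().st_mtime)
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no finished zip in {folder} after {timeout}s")
        time.sleep(poll)


# Possible wiring with Selenium (not executed here):
#   options = webdriver.ChromeOptions()
#   options.add_experimental_option("prefs", chrome_download_prefs("downloads"))
#   driver = webdriver.Chrome(options=options)
#   ... select the option and click the submit image ...
#   zip_path = wait_for_zip("downloads")
```

Polling the folder is crude but keeps the script independent of any browser-specific download API; the timeout should be adjusted to the size of the files being fetched.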

javascript

python

selenium

mechanize

web-scraping