Selenium pdf自动下载不起作用
Selenium pdf automatic download not working
我是 selenium 的新手,我正在编写一个抓取工具来自动从给定站点下载 pdf 文件。
下面是我的代码:
from selenium import webdriver
fp = webdriver.FirefoxProfile()
fp.set_preference("browser.download.folderList",2);
fp.set_preference("browser.download.manager.showWhenStarting",False)
fp.set_preference("browser.download.dir", "/home/jill/Downloads/Dinamalar")
fp.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/pdf")
browser = webdriver.Firefox(firefox_profile=fp)
browser.get("http://epaper.dinamalar.com/PUBLICATIONS/DM/MADHURAI/2015/05/26/PagePrint//26_05_2015_001_b2b69fda315301809dda359a6d3d9689.pdf");
webobj = browser.find_element_by_id("download").click();
我遵循了 Selenium documentation and in the this link 中提到的步骤。我不确定为什么每次都显示下载对话框。
有没有办法修复它,还有没有办法提供 "application/all" 以便可以下载所有文件(解决方法)?
由于没有 HTML 代码可用,我猜这行
webobj = browser.find_element_by_id("download").click();
其实调用了onclick
事件,只是你没有正确处理。换句话说,您缺少的是存储此 .pdf 文件的位置。我对 python 编程的经验很少,但一种解决方案是使用 HTTP webclient lib,这将允许您自动下载文件。类似于 CSharp's WebClient.DownloadFile Method (String, String)。如果使用得当,您可以跳过此操作的任何 Selenium 命令。
也许 this post 之类的东西会是一个好的开始。
禁用内置 pdfjs
插件并导航至 URL - PDF 文件将自动下载,代码:
from selenium import webdriver
fp = webdriver.FirefoxProfile()
fp.set_preference("browser.download.folderList", 2)
fp.set_preference("browser.download.manager.showWhenStarting",False)
fp.set_preference("browser.download.dir", "/home/jill/Downloads/Dinamalar")
fp.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/pdf,application/x-pdf")
fp.set_preference("pdfjs.disabled", "true") # < KEY PART HERE
browser = webdriver.Firefox(firefox_profile=fp)
browser.get("http://epaper.dinamalar.com/PUBLICATIONS/DM/MADHURAI/2015/05/26/PagePrint//26_05_2015_001_b2b69fda315301809dda359a6d3d9689.pdf");
更新(对我有用的完整代码):
from selenium import webdriver
mime_types = "application/pdf,application/vnd.adobe.xfdf,application/vnd.fdf,application/vnd.adobe.xdp+xml"
fp = webdriver.FirefoxProfile()
fp.set_preference("browser.download.folderList", 2)
fp.set_preference("browser.download.manager.showWhenStarting", False)
fp.set_preference("browser.download.dir", "/home/aafanasiev/Downloads")
fp.set_preference("browser.helperApps.neverAsk.saveToDisk", mime_types)
fp.set_preference("plugin.disable_full_page_plugin_for_types", mime_types)
fp.set_preference("pdfjs.disabled", True)
browser = webdriver.Firefox(firefox_profile=fp)
browser.get("http://epaper.dinamalar.com/")
webobj_get_link = browser.find_element_by_id("liSavePdf")
webobj_get_object = webobj_get_link.find_element_by_tag_name("a")
webobj_get_object.click()
我测试了以下代码,并在 Windows 7:
上成功下载了您的 pdf
fp = webdriver.FirefoxProfile()
fp.set_preference("browser.download.folderList", 2)
fp.set_preference("browser.download.manager.showWhenStarting", False)
fp.set_preference("browser.download.dir", download_location)
fp.set_preference("plugin.disable_full_page_plugin_for_types", "application/pdf")
fp.set_preference("pdfjs.disabled", True)
fp.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/pdf")
driver = webdriver.Firefox(fp)
driver.implicitly_wait(10)
driver.maximize_window()
driver.get("http://epaper.dinamalar.com/")
element = driver.find_element_by_css_selector("li#liSavePdf>a>img")
element.click()
我是 selenium 的新手,我正在编写一个抓取工具来自动从给定站点下载 pdf 文件。
下面是我的代码:
from selenium import webdriver
fp = webdriver.FirefoxProfile()
fp.set_preference("browser.download.folderList",2);
fp.set_preference("browser.download.manager.showWhenStarting",False)
fp.set_preference("browser.download.dir", "/home/jill/Downloads/Dinamalar")
fp.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/pdf")
browser = webdriver.Firefox(firefox_profile=fp)
browser.get("http://epaper.dinamalar.com/PUBLICATIONS/DM/MADHURAI/2015/05/26/PagePrint//26_05_2015_001_b2b69fda315301809dda359a6d3d9689.pdf");
webobj = browser.find_element_by_id("download").click();
我遵循了 Selenium documentation and in the this link 中提到的步骤。我不确定为什么每次都显示下载对话框。
有没有办法修复它,还有没有办法提供 "application/all" 以便可以下载所有文件(解决方法)?
由于没有 HTML 代码可用,我猜这行
webobj = browser.find_element_by_id("download").click();
其实调用了onclick
事件,只是你没有正确处理。换句话说,您缺少的是存储此 .pdf 文件的位置。我对 python 编程的经验很少,但一种解决方案是使用 HTTP webclient lib,这将允许您自动下载文件。类似于 CSharp's WebClient.DownloadFile Method (String, String)。如果使用得当,您可以跳过此操作的任何 Selenium 命令。
也许 this post 之类的东西会是一个好的开始。
禁用内置 pdfjs
插件并导航至 URL - PDF 文件将自动下载,代码:
from selenium import webdriver
fp = webdriver.FirefoxProfile()
fp.set_preference("browser.download.folderList", 2)
fp.set_preference("browser.download.manager.showWhenStarting",False)
fp.set_preference("browser.download.dir", "/home/jill/Downloads/Dinamalar")
fp.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/pdf,application/x-pdf")
fp.set_preference("pdfjs.disabled", "true") # < KEY PART HERE
browser = webdriver.Firefox(firefox_profile=fp)
browser.get("http://epaper.dinamalar.com/PUBLICATIONS/DM/MADHURAI/2015/05/26/PagePrint//26_05_2015_001_b2b69fda315301809dda359a6d3d9689.pdf");
更新(对我有用的完整代码):
from selenium import webdriver
mime_types = "application/pdf,application/vnd.adobe.xfdf,application/vnd.fdf,application/vnd.adobe.xdp+xml"
fp = webdriver.FirefoxProfile()
fp.set_preference("browser.download.folderList", 2)
fp.set_preference("browser.download.manager.showWhenStarting", False)
fp.set_preference("browser.download.dir", "/home/aafanasiev/Downloads")
fp.set_preference("browser.helperApps.neverAsk.saveToDisk", mime_types)
fp.set_preference("plugin.disable_full_page_plugin_for_types", mime_types)
fp.set_preference("pdfjs.disabled", True)
browser = webdriver.Firefox(firefox_profile=fp)
browser.get("http://epaper.dinamalar.com/")
webobj_get_link = browser.find_element_by_id("liSavePdf")
webobj_get_object = webobj_get_link.find_element_by_tag_name("a")
webobj_get_object.click()
我测试了以下代码,并在 Windows 7:
上成功下载了您的 pdffp = webdriver.FirefoxProfile()
fp.set_preference("browser.download.folderList", 2)
fp.set_preference("browser.download.manager.showWhenStarting", False)
fp.set_preference("browser.download.dir", download_location)
fp.set_preference("plugin.disable_full_page_plugin_for_types", "application/pdf")
fp.set_preference("pdfjs.disabled", True)
fp.set_preference("browser.helperApps.neverAsk.saveToDisk", "application/pdf")
driver = webdriver.Firefox(fp)
driver.implicitly_wait(10)
driver.maximize_window()
driver.get("http://epaper.dinamalar.com/")
element = driver.find_element_by_css_selector("li#liSavePdf>a>img")
element.click()