How can I download a PDF file into a specific custom folder with the Selenium Edge web driver in Python?

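I am using the Selenium WebDriver to automate downloading several PDF files. I get the PDF preview window (see below), and now I would like to download the file. How can I accomplish this using Edge as the browser?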

[Screenshot: the PDF preview I want to download]

This is what I have done so far, but it is not working.

path = "F:\Anuzz\Desktop\sel\msedgedriver.exe"
options = EdgeOptions()
options.add_experimental_option('prefs', {
    "download.default_directory": "F:\Anuzz\Desktop\sel\test.py",
    "download.prompt_for_download": False,   
    "plugins.always_open_pdf_externally": True
})
driver = Edge(path, options=options)
driver.get('https://sscstudy.com/ssc-chsl-paper-pdf-download/')
driver.find_element_by_xpath('//*[@id="post-11490"]/div/div/p[4]/a/strong').click()

NEW (works in Edge)

To use it, you have to install pyautogui with the command pip install pyautogui
import time
import pyautogui
from selenium import webdriver

driver = webdriver.Edge()

pdf_url = 'http://www.africau.edu/images/default/sample.pdf'
driver.get(pdf_url)

time.sleep(3)

pyautogui.hotkey('ctrl', 's')
time.sleep(2)
path_and_filename = r'C:\Users\gt\Desktop\test.pdf'
pyautogui.typewrite(path_and_filename)
pyautogui.press('enter')
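
The script above ends right after pressing Enter, so it may help to wait until the saved file actually shows up on disk before moving on. A minimal sketch using only the standard library; the helper name wait_for_file and its timeout and poll values are my own assumptions, not part of pyautogui or Selenium:

```python
import time
from pathlib import Path


def wait_for_file(path, timeout=30.0, poll=0.5):
    """Return True once `path` exists and its size has stopped changing.

    Hypothetical helper: the name, timeout and poll interval are assumptions.
    """
    target = Path(path)
    deadline = time.monotonic() + timeout
    last_size = -1
    while time.monotonic() < deadline:
        if target.is_file():
            size = target.stat().st_size
            # Two identical, non-zero readings in a row: treat the write as finished.
            if size > 0 and size == last_size:
                return True
            last_size = size
        time.sleep(poll)
    return False
```

It could be called as wait_for_file(path_and_filename) right after pyautogui.press('enter'); a False result would suggest the save dialog did not accept the path.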

OLD (works for Chrome)

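This is the code I use to automatically download a PDF to a specific path. If you are on Windows, just put your account name into r'C:\Users\...\Desktop'. You also have to put the path to your driver in chromedriver_path. The code below downloads a sample PDF.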

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.service import Service

options = webdriver.ChromeOptions()
download_path = r'C:\Users\...\Desktop'
options.add_experimental_option('prefs', {
    "download.default_directory": download_path,  # change the default directory for downloads
    "download.prompt_for_download": False,  # download automatically, without a save prompt
    "download.directory_upgrade": True,
    "plugins.always_open_pdf_externally": True  # download the PDF instead of opening it in Chrome's viewer
})

chromedriver_path = '...'
driver = webdriver.Chrome(options=options, service=Service(chromedriver_path))

pdf_url = 'http://www.africau.edu/images/default/sample.pdf'
driver.get(pdf_url)
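
One thing worth checking in the question's code: download.default_directory there points at a file (test.py) rather than a folder, and in a non-raw string "\t" is read as a tab character, which may be part of why it failed. A minimal sketch of a helper that guards against both; the name build_download_prefs is hypothetical, and only the preference keys come from the code above:

```python
from pathlib import Path


def build_download_prefs(download_dir):
    """Build the 'prefs' dict passed to add_experimental_option.

    Hypothetical helper; only the preference keys come from the answer above.
    """
    folder = Path(download_dir).expanduser().resolve()
    # The preference expects a folder, not a file such as ...\test.py.
    if folder.exists() and not folder.is_dir():
        raise ValueError(f"{folder} is a file, not a folder")
    # Create the folder if it does not exist yet.
    folder.mkdir(parents=True, exist_ok=True)
    return {
        "download.default_directory": str(folder),
        "download.prompt_for_download": False,
        "download.directory_upgrade": True,
        "plugins.always_open_pdf_externally": True,
    }
```

With this in place, the options line becomes options.add_experimental_option('prefs', build_download_prefs(download_path)), where download_path is still best written as a raw string.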

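After testing, I think the problem lies mainly with the site you provided, which appears to embed a different PDF viewer rather than the one built into Edge.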

So you may need code like this to achieve what you want (it builds the download URL by string concatenation):

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.edge import service
import time

edgeOption = webdriver.EdgeOptions()
edgeOption.use_chromium = True
edgeOption.add_argument("start-maximized")
edgeOption.add_experimental_option('prefs', {
    "download.default_directory": r"C:\Downloads",  # raw string keeps the backslash literal
    "download.prompt_for_download": False
})
edgeOption.binary_location = r"C:\Program Files (x86)\Microsoft\Edge\Application\msedge.exe"
s = service.Service(r'C:\Users\Administrator\Desktop\msedgedriver.exe')
driver = webdriver.Edge(service=s, options=edgeOption)

driver.get('https://sscstudy.com/ssc-chsl-paper-pdf-download/')
# Read the link behind the download anchor
url = driver.find_element(By.XPATH, '//*[@id="post-11490"]/div/div/p[4]/a').get_attribute('href')

# Cut the file ID out of the link and build a direct-download URL
driver.get("https://drive.google.com/uc?id="+url[32:(len(url)-17)]+"&export=download")
time.sleep(1)  # give the download a moment to start

Note: Tested with Selenium 4.1.0 and Edge 101.0.1210.53. Please adjust the Edge driver path and any other parameters to suit your own setup.
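
The slice url[32:(len(url)-17)] in the code above only works while the link has exactly the expected prefix and suffix. A slightly more tolerant sketch, assuming the href has the shape https://drive.google.com/file/d/<FILE_ID>/view?usp=sharing (that shape is my assumption from the slice offsets, and the function name drive_direct_url is hypothetical):

```python
import re

# Assumed link shape: https://drive.google.com/file/d/<FILE_ID>/view?usp=sharing
_DRIVE_FILE_RE = re.compile(r"drive\.google\.com/file/d/([^/?#]+)")


def drive_direct_url(share_url):
    """Turn a Drive share link into the uc?id=...&export=download form used above."""
    match = _DRIVE_FILE_RE.search(share_url)
    if match is None:
        raise ValueError(f"not a recognised Drive file link: {share_url}")
    return f"https://drive.google.com/uc?id={match.group(1)}&export=download"
```

The last driver.get call could then be written as driver.get(drive_direct_url(url)), which raises a clear error instead of requesting a malformed URL if the page ever links somewhere else.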