Downloading with Chrome headless and Selenium
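I am using python-selenium with Chrome 59 and trying to automate a simple download sequence. When I launch the browser normally, the download works, but when I launch it in headless mode, the download does not work.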
# Headless implementation
from selenium import webdriver
chromeOptions = webdriver.ChromeOptions()
chromeOptions.add_argument("headless")
driver = webdriver.Chrome(chrome_options=chromeOptions)
driver.get('https://www.mockaroo.com/')
driver.find_element_by_id('download').click()
# ^^^ Download doesn't start
# Normal Mode
from selenium import webdriver
driver = webdriver.Chrome()
driver.get('https://www.mockaroo.com/')
driver.find_element_by_id('download').click()
# ^^^ Download works normally
I even tried adding a default download path:
prefs = {"download.default_directory" : "/Users/Chetan/Desktop/"}
chromeOptions.add_argument("headless")
chromeOptions.add_experimental_option("prefs",prefs)
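Adding the default path works in the normal implementation, but the same problem persists in the headless version.
How do I get the download to start in headless mode?
This is a Chrome feature that prevents software from downloading files to your computer. There is a workaround, though. Read more about it here.
What you need to do is enable downloads via DevTools, something like this: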
async function setDownload () {
  const client = await CDP({tab: 'ws://localhost:9222/devtools/browser'});
  const info = await client.send('Browser.setDownloadBehavior', {behavior : "allow", downloadPath: "/tmp/"});
  await client.close();
}
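This is the solution someone gave in the thread mentioned above. Here is his comment.
Perhaps the website you are dealing with returns different HTML pages for different browsers, which means the XPath or Id you want may be different in the headless browser.
Try downloading the pageSource in the headless browser and opening it as an HTML page to see the Id or XPath you need.
You can see this as a C# example.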
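For Python users, the page-source check suggested above could look roughly like the sketch below. The helper name dump_page_source and the default file name are made up for illustration; the only Selenium API it relies on is the driver's page_source property.

```python
from pathlib import Path


def dump_page_source(driver, out_path="headless_page.html"):
    """Save the HTML currently loaded in the browser so it can be inspected.

    `driver` is any Selenium WebDriver instance; `out_path` is a hypothetical
    default chosen for this sketch.
    """
    path = Path(out_path)
    # page_source returns the current page as an HTML string
    path.write_text(driver.page_source, encoding="utf-8")
    return path
```

Opening the saved file in a regular browser makes it easy to compare the ids and XPaths with what the non-headless session shows.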
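Yes, it's a "feature", for security. As mentioned before, here is the bug discussion: https://bugs.chromium.org/p/chromium/issues/detail?id=696481
Support for enabling downloads was added in Chrome version 62.0.3196.0 and above.
Here is a Python implementation. I had to add the command to the Chrome driver commands. I will try to submit a PR so that it is included in the library in the future.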
def enable_download_in_headless_chrome(self, driver, download_dir):
    # add missing support for chrome "send_command" to selenium webdriver
    driver.command_executor._commands["send_command"] = ("POST", '/session/$sessionId/chromium/send_command')
    params = {'cmd': 'Page.setDownloadBehavior', 'params': {'behavior': 'allow', 'downloadPath': download_dir}}
    command_result = driver.execute("send_command", params)
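For reference, here is a small repo that demonstrates how to use this:
https://github.com/shawnbutton/PythonHeadlessChrome
Update 2020-05-01: There have been comments saying this no longer works. Given that this patch is now over a year old, it is quite possible that the underlying library has changed.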
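If the send_command patch no longer applies to your Selenium version, one possible variation is sketched below. It assumes your Selenium Chrome driver exposes execute_cdp_cmd (recent Selenium releases do for Chrome) and falls back to the patch above otherwise. The function name allow_headless_downloads is made up for this sketch, and I have not verified it against every Chrome/chromedriver combination.

```python
def allow_headless_downloads(driver, download_dir):
    """Send Page.setDownloadBehavior so headless Chrome saves files to download_dir."""
    cdp_params = {"behavior": "allow", "downloadPath": download_dir}
    if hasattr(driver, "execute_cdp_cmd"):
        # Assumption: the driver is a Chrome driver that can send CDP commands directly
        return driver.execute_cdp_cmd("Page.setDownloadBehavior", cdp_params)
    # Fallback: register the ChromeDriver send_command endpoint by hand,
    # exactly as in the answer above
    driver.command_executor._commands["send_command"] = (
        "POST", "/session/$sessionId/chromium/send_command")
    return driver.execute(
        "send_command", {"cmd": "Page.setDownloadBehavior", "params": cdp_params})
```

Call it once right after creating the driver and before navigating to the download page, for example allow_headless_downloads(driver, "/path/to/download/dir").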
A full working example for JavaScript with selenium-cucumber-js / selenium-webdriver:
const chromedriver = require('chromedriver');
const selenium = require('selenium-webdriver');
const command = require('selenium-webdriver/lib/command');
const chrome = require('selenium-webdriver/chrome');
module.exports = function() {
  const chromeOptions = new chrome.Options()
    .addArguments('--no-sandbox', '--headless', '--start-maximized', '--ignore-certificate-errors')
    .setUserPreferences({
      'profile.default_content_settings.popups': 0, // disable download file dialog
      'download.default_directory': '/tmp/downloads', // default file download location
      "download.prompt_for_download": false,
      'download.directory_upgrade': true,
      'safebrowsing.enabled': false,
      'plugins.always_open_pdf_externally': true,
      'plugins.plugins_disabled': ["Chrome PDF Viewer"]
    })
    .windowSize({width: 1600, height: 1200});
  const driver = new selenium.Builder()
    .withCapabilities({
      browserName: 'chrome',
      javascriptEnabled: true,
      acceptSslCerts: true,
      path: chromedriver.path
    })
    .setChromeOptions(chromeOptions)
    .build();
  driver.manage().window().maximize();
  driver.getSession()
    .then(session => {
      const cmd = new command.Command("SEND_COMMAND")
        .setParameter("cmd", "Page.setDownloadBehavior")
        .setParameter("params", {'behavior': 'allow', 'downloadPath': '/tmp/downloads'});
      driver.getExecutor().defineCommand("SEND_COMMAND", "POST", `/session/${session.getId()}/chromium/send_command`);
      return driver.execute(cmd);
    });
  return driver;
};
The key part is:
driver.getSession()
  .then(session => {
    const cmd = new command.Command("SEND_COMMAND")
      .setParameter("cmd", "Page.setDownloadBehavior")
      .setParameter("params", {'behavior': 'allow', 'downloadPath': '/tmp/downloads'});
    driver.getExecutor().defineCommand("SEND_COMMAND", "POST", `/session/${session.getId()}/chromium/send_command`);
    return driver.execute(cmd);
  });
Tested with:
- Chrome 67.0.3396.99
- ChromeDriver 2.36.540469
- selenium-cucumber-js 1.5.12
- selenium-webdriver 3.0.0
Here is a working example for Python. I tested this with Chromium 68.0.3440.75 and chromedriver 2.38.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
chrome_options = Options()
chrome_options.add_experimental_option("prefs", {
    "download.default_directory": "/path/to/download/dir",
    "download.prompt_for_download": False,
})
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(chrome_options=chrome_options)
# register ChromeDriver's send_command endpoint, then allow downloads via DevTools
driver.command_executor._commands["send_command"] = ("POST", '/session/$sessionId/chromium/send_command')
params = {'cmd': 'Page.setDownloadBehavior', 'params': {'behavior': 'allow', 'downloadPath': "/path/to/download/dir"}}
command_result = driver.execute("send_command", params)
# only now open the page and click the download link
driver.get('http://download-page.url/')
driver.find_element_by_css_selector("#download_link").click()
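One thing the example above does not show is waiting for the file before the script exits. A minimal sketch of that step follows, assuming the download directory starts out empty and relying on Chrome's habit of writing in-progress downloads with a .crdownload suffix. The helper name wait_for_download and its default timings are made up for this sketch.

```python
import time
from pathlib import Path


def wait_for_download(download_dir, timeout=30.0, poll=0.5):
    """Return the files in download_dir once no partial download remains.

    Assumes download_dir was empty before the click; raises TimeoutError
    if nothing has finished downloading within `timeout` seconds.
    """
    directory = Path(download_dir)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        files = [p for p in directory.iterdir() if p.is_file()]
        # Chrome keeps the .crdownload suffix until the file is complete
        partial = [p for p in files if p.suffix == ".crdownload"]
        if files and not partial:
            return files
        time.sleep(poll)
    raise TimeoutError(f"no completed download in {directory} after {timeout}s")
```

Calling wait_for_download("/path/to/download/dir") right after the click keeps the script alive until the file is fully written, so a later driver.quit() does not cut the download short.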
Below is the equivalent in Java with selenium, chromedriver and Chrome v71.x. The code between the "// **" comments is the key to allowing downloads to be saved.
Additional jars: com.fasterxml.jackson.core, com.fasterxml.jackson.annotation, com.fasterxml.jackson.databind
System.setProperty("webdriver.chrome.driver", "C:\\libraries\\chromedriver.exe");
String downloadFilepath = "C:\\Download";
HashMap<String, Object> chromePreferences = new HashMap<String, Object>();
chromePreferences.put("profile.default_content_settings.popups", 0);
chromePreferences.put("download.prompt_for_download", false);
chromePreferences.put("download.default_directory", downloadFilepath);
ChromeOptions chromeOptions = new ChromeOptions();
chromeOptions.setBinary("C:\\pathto\\Chrome SxS\\Application\\chrome.exe");
//ChromeOptions options = new ChromeOptions();
//chromeOptions.setExperimentalOption("prefs", chromePreferences);
chromeOptions.addArguments("start-maximized");
chromeOptions.addArguments("disable-infobars");
// ** HEADLESS CHROME
chromeOptions.addArguments("headless");
// **
chromeOptions.setExperimentalOption("prefs", chromePreferences);
DesiredCapabilities cap = DesiredCapabilities.chrome();
cap.setCapability(CapabilityType.ACCEPT_SSL_CERTS, true);
cap.setCapability(ChromeOptions.CAPABILITY, chromeOptions);
// ** allow downloads by POSTing Page.setDownloadBehavior to ChromeDriver's send_command endpoint
ChromeDriverService driverService = ChromeDriverService.createDefaultService();
ChromeDriver driver = new ChromeDriver(driverService, chromeOptions);
Map<String, Object> commandParams = new HashMap<>();
commandParams.put("cmd", "Page.setDownloadBehavior");
Map<String, String> params = new HashMap<>();
params.put("behavior", "allow");
params.put("downloadPath", downloadFilepath);
commandParams.put("params", params);
ObjectMapper objectMapper = new ObjectMapper();
HttpClient httpClient = HttpClientBuilder.create().build();
String command = objectMapper.writeValueAsString(commandParams);
String u = driverService.getUrl().toString() + "/session/" + driver.getSessionId() + "/chromium/send_command";
HttpPost request = new HttpPost(u);
request.addHeader("content-type", "application/json");
request.setEntity(new StringEntity(command));
try {
    httpClient.execute(request);
} catch (IOException e2) {
    // TODO Auto-generated catch block
    e2.printStackTrace();
}
// **
//Continue using the driver for automation
driver.manage().window().maximize();
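Normally it is redundant to see the same thing just written in another language, but because this issue drove me crazy, I hope I can save someone else from the pain... so here is the C# version (tested with headless chrome=71.0.3578.98, chromedriver=2.45.615279, platform=Linux 4.9.125-linuxkit x86_64):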
var enableDownloadCommandParameters = new Dictionary<string, object>
{
    { "behavior", "allow" },
    { "downloadPath", downloadDirectoryPath }
};
var result = ((OpenQA.Selenium.Chrome.ChromeDriver)driver).ExecuteChromeCommandWithResult("Page.setDownloadBehavior", enableDownloadCommandParameters);
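I solved this by using the workaround shared by @Shawn Button and passing a full (absolute) path for the 'downloadPath' parameter. Using a relative path does not work and gives me an error.
Versions:
Chrome version 75.0.3770.100 (Official Build) (32-bit)
ChromeDriver 75.0.3770.90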
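To avoid the relative-path error described above, the path can be resolved before it is sent. This is a small sketch using only the standard library; the helper name download_behavior_params is made up, and the returned dictionary has the same shape as the params used with send_command in the earlier Python answers.

```python
import os


def download_behavior_params(download_dir):
    """Build the send_command payload with an absolute downloadPath."""
    # abspath resolves a relative directory against the current working directory
    abs_dir = os.path.abspath(download_dir)
    return {
        "cmd": "Page.setDownloadBehavior",
        "params": {"behavior": "allow", "downloadPath": abs_dir},
    }
```

The result can be passed straight to driver.execute("send_command", download_behavior_params("downloads")) once the send_command endpoint has been registered.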
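Using: google-chrome-stable amd64 86.0.4240.111-1, chromedriver 86.0.4240.22, selenium 3.141.0, python 3.8.3
I tried several of the proposed solutions, but nothing worked with Chrome headless; in addition, my test website opens a new blank tab and then downloads the data.
I finally gave up on headless and used pyvirtualdisplay with Xvfb to simulate an X server, like this: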
from pyvirtualdisplay import Display
from selenium.webdriver.chrome.options import Options
import selenium.webdriver as webdriver
import tempfile
url = "https://really_badly_programmed_website.org"
tmp_dir = tempfile.mkdtemp(prefix="hamster_")
driver_path = "/usr/bin/chromedriver"
chrome_options = Options()
chrome_options.binary_location = "/usr/bin/google-chrome"
prefs = {'download.default_directory': tmp_dir}
chrome_options.add_experimental_option("prefs", prefs)
# run a regular (non-headless) Chrome inside a virtual X display
with Display(backend="xvfb", size=(1920, 1080), color_depth=24) as disp:
    driver = webdriver.Chrome(options=chrome_options, executable_path=driver_path)
    driver.get(url)
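In the end everything worked, and the downloaded file was in the tmp folder.
Updated Python solution -
Tested on March 4, 2021 with chromedriver v88 and v89.
This will let you click to download files in headless mode.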
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options
# Instantiate headless driver
chrome_options = Options()
# Windows path (raw string, so \t and friends are not read as escape sequences)
chromedriver_location = r'C:\path\to\chromedriver_win32\chromedriver.exe'
# Mac path. May have to allow chromedriver developer in os system prefs
# chromedriver_location = '/Users/path/to/chromedriver'
chrome_options.add_argument("--headless")
chrome_options.add_argument("--no-sandbox")
chrome_options.add_argument("--disable-dev-shm-usage")
chrome_prefs = {"download.default_directory": r"C:\path\to\Downloads"} # (windows)
chrome_options.experimental_options["prefs"] = chrome_prefs
driver = webdriver.Chrome(chromedriver_location, options=chrome_options)
# Download your file
driver.get('https://www.mockaroo.com/')
driver.find_element_by_id('download').click()
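I finally got it working by upgrading to Chromium 90! I previously had versions 72-78, but I saw that it was fixed recently: https://bugs.chromium.org/p/chromium/issues/detail?id=696481 so I decided to give it a try.
So after upgrading, which took a while (Homebrew on macOS is so slow...), I simply did the following, without setting any options or anything (this is a JavaScript example):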
await driver.findElement(By.className('download')).click();
And it worked! I saw the downloaded PDF in the same working folder where I had been trying to download it for a long time...