Handling Redirection w/ PhantomJS + Selenium
I'm currently running browser tests in Python using PhantomJS + Selenium:
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities

# Copy the default PhantomJS capabilities and override the user agent
desired_capabilities = dict(DesiredCapabilities.PHANTOMJS)
desired_capabilities["phantomjs.page.settings.userAgent"] = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/40.0.2214.115 Safari/537.36"
driver = webdriver.PhantomJS(executable_path="./phantomjs", desired_capabilities=desired_capabilities)
driver.get('http://google.com')
This works fine, except when the page I try to get has a redirect.
Example:
https://login.vrealizeair.vmware.com/
In that case, get does not work properly: the page source comes back empty (<html><head></head><body></body></html>).
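Since a failed load leaves only that empty shell, it can help to detect it before parsing driver.page_source any further. A minimal sketch using the standard-library HTML parser, assuming "blank" means no visible text and no elements beyond html, head and body; the name is_blank_page is hypothetical:

```python
from html.parser import HTMLParser


class _ContentCounter(HTMLParser):
    """Counts elements outside the html/head/body skeleton, plus non-blank text."""

    SKELETON = {"html", "head", "body"}

    def __init__(self):
        super().__init__()
        self.extra_tags = 0
        self.text_chars = 0

    def handle_starttag(self, tag, attrs):
        # Self-closing tags also arrive here via the default handle_startendtag
        if tag not in self.SKELETON:
            self.extra_tags += 1

    def handle_data(self, data):
        self.text_chars += len(data.strip())


def is_blank_page(page_source: str) -> bool:
    """True if the source is only an empty html/head/body shell."""
    counter = _ContentCounter()
    counter.feed(page_source)
    counter.close()
    return counter.extra_tags == 0 and counter.text_chars == 0


print(is_blank_page("<html><head></head><body></body></html>"))  # True
```

Comments and the DOCTYPE are ignored, so the real login page (which starts with a comment and a DOCTYPE but then has meta and title tags) is not reported as blank.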
This is a known issue, and the posted solution involves adding a piece of code that handles redirects properly.
How and where do you add this code if you are running the tests through Selenium (as in my first code snippet)? Is it part of desired_capabilities?
Example:
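// PhantomJS script (run with the phantomjs binary, not through Selenium)
var webpage = require('webpage');
var myurl = 'https://login.vrealizeair.vmware.com/';  // URL to load (the example above)

function renderPage(url) {
    var page = webpage.create();
    // Called before every navigation; `main` is true for the top-level frame
    page.onNavigationRequested = function (url, type, willNavigate, main) {
        if (main && url !== myurl) {
            myurl = url;
            console.log("redirect caught");
            page.close();
            renderPage(url);  // reopen the redirect target in a fresh page
        }
    };
    page.open(url, function (status) {
        if (status === "success") {
            console.log(myurl);
            console.log("success");
            page.render('yourscreenshot.png');
            phantom.exit(0);
        } else {
            console.log("failed");
            phantom.exit(1);
        }
    });
}

renderPage(myurl);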
I tried it with PhantomJS 1.9.8 and 2.0.1-development.
I used the following settings:
import org.openqa.selenium.phantomjs.PhantomJSDriver;
import org.openqa.selenium.phantomjs.PhantomJSDriverService;
import org.openqa.selenium.remote.DesiredCapabilities;

DesiredCapabilities capabilities = new DesiredCapabilities();
capabilities.setJavascriptEnabled(true);
capabilities.setCapability(PhantomJSDriverService.PHANTOMJS_EXECUTABLE_PATH_PROPERTY, "drivers/phantomjs.exe");
capabilities.setCapability(PhantomJSDriverService.PHANTOMJS_PAGE_SETTINGS_PREFIX, "Y");
capabilities.setCapability("phantomjs.page.settings.userAgent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:16.0) Gecko/20121026 Firefox/16.0");
// Initialize the driver with these capabilities
PhantomJSDriver driver = new PhantomJSDriver(capabilities);
Then I executed the following lines, and they worked fine for me:
driver.get("https://login.vrealizeair.vmware.com/");
System.out.println(driver.getCurrentUrl());
System.out.println(driver.getPageSource());
Here is the output:
https://login.vrealizeair.vmware.com/sso/UI/Login
<!-- [RESPONSE_PAGE_TYPE=3DLOGIN] --><!DOCTYPE html><html><head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<title>Login | vRealize™ Air™</title>
<link rel="stylesheet" href="/sso/css/styles.css?v=3" type="text/css">
<link rel="shortcut icon" href="/sso/images/vmwareFavicon.ico" type="image/x-icon">
<script async="" src="//rum-static.pingdom.net/prum.min.js"></script><script>...
(output truncated here; the entire page source was displayed)
I tried the following code in Python, and it seems to work fine:
from selenium import webdriver

driver = webdriver.PhantomJS("./phantomjs")
driver.get("https://login.vrealizeair.vmware.com/")
print('done')
print(driver.current_url)  # the URL PhantomJS ended up on after the redirect
print(driver.page_source)
Output (works fine):
https://login.vrealizeair.vmware.com/sso/UI/Login
<!-- [RESPONSE_PAGE_TYPE=3DLOGIN] --><!DOCTYPE html><html><head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<title>Login | vRealize™ Air™</title>
<link rel="stylesheet" href="/sso/css/styles.css?v=3" type="text/css">
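The current_url above (.../sso/UI/Login) differs from the requested URL, which shows PhantomJS followed the server's redirect on its own. To see what a server does independently of any browser, a standard-library sketch can start a throwaway local server that answers /old with a 302 to /new (both paths are made up for this demo) and let urllib follow it; geturl() plays the role of driver.current_url. Note this only follows HTTP-level redirects, not JavaScript or meta-refresh ones.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectHandler(BaseHTTPRequestHandler):
    """Demo server: /old answers with a 302 to /new, anything else returns a tiny page."""

    def do_GET(self):
        if self.path == "/old":
            self.send_response(302)
            self.send_header("Location", "/new")
            self.end_headers()
        else:
            body = b"<html><body>landed</body></html>"
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging for the demo


# ProxyHandler({}) ignores any HTTP_PROXY settings so localhost is reached directly
opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def final_url(url):
    """Fetch url, following HTTP redirects, and return the URL it ended up at."""
    with opener.open(url, timeout=5) as resp:
        return resp.geturl()


server = HTTPServer(("127.0.0.1", 0), RedirectHandler)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d" % server.server_address[1]
landed = final_url(base + "/old")
print(landed)  # http://127.0.0.1:<port>/new
server.shutdown()
server.server_close()
```

Pointing final_url at the real login URL shows whether the redirect happens at the HTTP level at all, which narrows down whether the problem lies with PhantomJS or with the site.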
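Important note: start navigating from the base page. The HTML may be empty because the site can return a 403 error. If the login URL doesn't work for you, try navigating from the page that appears before the login page.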
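The advice above (fall back to the base page when the login URL comes back empty) can be sketched as a small helper. It works with any object offering Selenium-style get(), page_source and current_url; the helper get_with_base_fallback, its is_blank predicate parameter, the FakeDriver stand-in and the example.com URLs are all hypothetical, and whether visiting the base page first avoids a 403 depends on the site (for example, on cookies it sets there).

```python
from typing import Callable, List, Optional


def get_with_base_fallback(driver, target_url: str, base_url: str,
                           is_blank: Callable[[str], bool]) -> str:
    """Load target_url; if it comes back blank, visit base_url first and retry once.

    Returns the URL the driver ends up on.
    """
    driver.get(target_url)
    if is_blank(driver.page_source):
        # Give the site a chance to set up state (e.g. cookies) on its entry page
        driver.get(base_url)
        driver.get(target_url)
    return driver.current_url


class FakeDriver:
    """Hypothetical stand-in: the target stays blank until the base page was visited."""

    def __init__(self, base_url: str):
        self.base_url = base_url
        self.visited: List[str] = []
        self.current_url: Optional[str] = None

    def get(self, url: str) -> None:
        self.visited.append(url)
        self.current_url = url

    @property
    def page_source(self) -> str:
        if self.base_url in self.visited:
            return "<html><head></head><body>login form</body></html>"
        return "<html><head></head><body></body></html>"


fake = FakeDriver("https://example.com/")
final = get_with_base_fallback(
    fake, "https://example.com/login", "https://example.com/",
    is_blank=lambda src: "<body></body>" in src,
)
print(fake.visited)
# ['https://example.com/login', 'https://example.com/', 'https://example.com/login']
```

With a real session, pass the PhantomJS driver instead of FakeDriver; the retry is attempted only once so a permanently failing page does not loop.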
It turned out the page could not be fetched because of an error: SSL handshake failed.
The solution was to initialize the driver with the following line:
driver = webdriver.PhantomJS(executable_path="./phantomjs", service_args=['--ignore-ssl-errors=true'])
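The service_args list is passed straight to the PhantomJS command line. A small sketch that assembles it: --ignore-ssl-errors=true is the flag from the fix above, and --ssl-protocol is another PhantomJS command-line option that is often set to any when the handshake fails because of a protocol mismatch (whether it is needed depends on the server). The helper name phantomjs_service_args is hypothetical.

```python
from typing import List, Optional


def phantomjs_service_args(ignore_ssl_errors: bool = True,
                           ssl_protocol: Optional[str] = None) -> List[str]:
    """Build the flags passed as webdriver.PhantomJS(service_args=...)."""
    args = ["--ignore-ssl-errors=%s" % ("true" if ignore_ssl_errors else "false")]
    if ssl_protocol is not None:
        # PhantomJS accepts values such as "any", "tlsv1" or "sslv3" here
        args.append("--ssl-protocol=%s" % ssl_protocol)
    return args


print(phantomjs_service_args())                    # ['--ignore-ssl-errors=true']
print(phantomjs_service_args(ssl_protocol="any"))  # ['--ignore-ssl-errors=true', '--ssl-protocol=any']
```

Usage: driver = webdriver.PhantomJS(executable_path="./phantomjs", service_args=phantomjs_service_args(ssl_protocol="any"))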
None of these solutions really worked for me. I got the following error in phantomjsdriver.log, and PhantomJS was being logged out when I tried to log in.
[DEBUG - 2017-08-19T20:37:59.288Z] Session [47739640-851e-11e7-9326-9bef0ad085f5] - page.onResourceError - {"errorCode":299,"errorString":"Error transferring https://int-test-cc.gcsip.nl:4443/rest/user/keepAlive?cacheBuster=1503175078533 - server replied: Unsupported Media Type","id":9,"status":415,"statusText":"Unsupported Media Type","url":"IPAdd:port/rest/user/keepAlive?cacheBuster=1503175078533"}
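Lines like this carry a JSON object after "page.onResourceError - ", so the HTTP status and failing URL can be pulled out programmatically when scanning phantomjsdriver.log. A sketch, assuming every such entry sits on a single line in the format shown above; the function name parse_resource_error is hypothetical.

```python
import json
from typing import Optional

MARKER = "page.onResourceError - "


def parse_resource_error(line: str) -> Optional[dict]:
    """Return the JSON payload of a page.onResourceError log line, or None."""
    pos = line.find(MARKER)
    if pos == -1:
        return None
    # Everything after the marker is a single JSON object
    return json.loads(line[pos + len(MARKER):])


log_line = (
    '[DEBUG - 2017-08-19T20:37:59.288Z] Session [47739640-851e-11e7-9326-9bef0ad085f5]'
    ' - page.onResourceError - {"errorCode":299,"errorString":"Error transferring '
    'https://int-test-cc.gcsip.nl:4443/rest/user/keepAlive?cacheBuster=1503175078533'
    ' - server replied: Unsupported Media Type","id":9,"status":415,'
    '"statusText":"Unsupported Media Type",'
    '"url":"IPAdd:port/rest/user/keepAlive?cacheBuster=1503175078533"}'
)
err = parse_resource_error(log_line)
print(err["status"], err["statusText"])  # 415 Unsupported Media Type
```

A 415 status means the server rejected the request's media type, which is consistent with the Content-Type header fix that follows.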
It worked after I added the following capabilities to PhantomJS:
caps.setJavascriptEnabled(true);
caps.setCapability(PhantomJSDriverService.PHANTOMJS_EXECUTABLE_PATH_PROPERTY, "phantomjs");
caps.setCapability(PhantomJSDriverService.PHANTOMJS_PAGE_SETTINGS_PREFIX, "Y");
// Alternative user agent: "Mozilla/5.0 (Unknown; Linux x86_64) AppleWebKit/602.1 (KHTML, like Gecko) PhantomJS/2.5.0-development Version/9.0 Safari/602.1"
caps.setCapability("phantomjs.page.settings.userAgent", "Mozilla/5.0 (X11; Linux x86_64; rv:46.0) Gecko/20100101 Firefox/46.0");
// Custom headers sent with every page request
caps.setCapability(PhantomJSDriverService.PHANTOMJS_PAGE_CUSTOMHEADERS_PREFIX + "Content-Type", "application/json;charset=utf-8");
caps.setCapability(PhantomJSDriverService.PHANTOMJS_PAGE_CUSTOMHEADERS_PREFIX + "Connection", "Keep-Alive");
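For the Python setup in the question, the same capabilities are plain dictionary keys: GhostDriver reads phantomjs.page.settings.* as page settings and phantomjs.page.customHeaders.* as extra request headers (the string values behind the Java constants used above). A sketch that builds such a dict without Selenium installed; the helper name phantomjs_capabilities is hypothetical, and the small base dict is only a stand-in for dict(DesiredCapabilities.PHANTOMJS).

```python
from typing import Dict, Optional

# String values of PHANTOMJS_PAGE_SETTINGS_PREFIX and PHANTOMJS_PAGE_CUSTOMHEADERS_PREFIX
SETTINGS_PREFIX = "phantomjs.page.settings."
HEADERS_PREFIX = "phantomjs.page.customHeaders."


def phantomjs_capabilities(base: Dict[str, object],
                           user_agent: Optional[str] = None,
                           headers: Optional[Dict[str, str]] = None) -> Dict[str, object]:
    """Return a copy of base with a PhantomJS user agent and custom headers added."""
    caps = dict(base)  # copy so the shared default dict is not mutated
    if user_agent is not None:
        caps[SETTINGS_PREFIX + "userAgent"] = user_agent
    for name, value in (headers or {}).items():
        caps[HEADERS_PREFIX + name] = value
    return caps


# Stand-in for dict(DesiredCapabilities.PHANTOMJS) so the sketch runs without Selenium
base_caps = {"browserName": "phantomjs", "javascriptEnabled": True}
caps = phantomjs_capabilities(
    base_caps,
    user_agent="Mozilla/5.0 (X11; Linux x86_64; rv:46.0) Gecko/20100101 Firefox/46.0",
    headers={"Content-Type": "application/json;charset=utf-8", "Connection": "Keep-Alive"},
)
for key in sorted(caps):
    print(key, "=", caps[key])
```

With Selenium installed, pass the result as webdriver.PhantomJS(executable_path="./phantomjs", desired_capabilities=caps), as in the question's first snippet.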