How come I have two different user agents at the same time with PhantomJS in Python?
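The code below sets a user agent for a PhantomJS instance, prints it, and then scrapes a website to determine it again. The two results differ. How can that be? I have not been able to reproduce this apparent solution.
1) Set ONE user agent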
serviceDefaults = ["--ignore-ssl-errors=yes"]
desiredDefaults = {
    # Adjacent string literals are joined, so the user agent stays a single line
    "phantomjs.page.settings.userAgent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
        "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36"
    )
}
2) Set up the driver and print the user agent
from selenium import webdriver

def create_phantomJS():
    driver = webdriver.PhantomJS("phantomjs.exe", desired_capabilities=desiredDefaults, service_args=serviceDefaults)
    # Register the endpoint used to run a script in the PhantomJS context
    phantom_exc_uri = '/session/$sessionId/phantom/execute'
    driver.command_executor._commands['executePhantomScript'] = ('POST', phantom_exc_uri)
    initScript = """
    this.onInitialized=function() {
        var page=this;
        if (page.navigator == page.settings.userAgent){return};
        page.settings.navigator = page.settings.userAgent;
    }
    """
    driver.execute('executePhantomScript', {'script': initScript, 'args': []})
    agent = driver.execute_script("return navigator.userAgent")
    print("rawUa:", agent)
    return driver
3) Scrape a website to determine the user agent and print it
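from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

def use_driver(driver, URL):
    driver.get(URL)
    element = WebDriverWait(driver, 1).until(EC.presence_of_element_located((By.ID, "rawUa")))
    return element.text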
4) Compare the results
driver = create_phantomJS()
text = use_driver(driver, URL)
print(text)
The output is two different user agents.
rawUa: Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/538.1 (KHTML, like Gecko) PhantomJS/2.1.1 Safari/538.1
rawUa: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36
How can I make the user agents match in Python in this scenario?
Improving initScript may work.
initScript = """
this.onInitialized=function() {
    console.log("[INFO] TESTING NAVIGATOR VALUE");
    if (navigator.userAgent == this.settings.userAgent){return};
    navigator={"User-Agent":this.settings.userAgent};
}.bind(this);
"""
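The navigator must be set to a new object. Printing right after the driver is created does not give a valid test result, because the onInitialized handler is called after the page is created and before the URL is requested.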
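To make the timing point concrete, here is a minimal sketch of a check that reads navigator.userAgent only after a page has been loaded. The helper name user_agent_after_load, the StubDriver class, the "ExampleAgent/1.0" string and the example URL are all hypothetical and exist only so the snippet runs without PhantomJS. With a real driver you would pass the object returned by create_phantomJS() instead.

```python
def user_agent_after_load(driver, url):
    """Load `url` first, then read navigator.userAgent from the loaded page."""
    # Navigate before reading, so that any onInitialized handler has had
    # a chance to run (per the explanation above).
    driver.get(url)
    return driver.execute_script("return navigator.userAgent")


class StubDriver:
    """Hypothetical stand-in for a Selenium driver, for illustration only."""

    def __init__(self, user_agent):
        self.user_agent = user_agent
        self.calls = []  # records the order in which methods are called

    def get(self, url):
        self.calls.append(("get", url))

    def execute_script(self, script):
        self.calls.append(("execute_script", script))
        return self.user_agent


# Usage with the stub: a real driver would be passed in the same way.
stub = StubDriver("ExampleAgent/1.0")
observed = user_agent_after_load(stub, "http://example.invalid/")
print("match:", observed == "ExampleAgent/1.0")
```

The helper only relies on get and execute_script, which Selenium drivers provide, so the same check can be run against the PhantomJS driver and compared with the value scraped from the page.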