Selenium webdriver: Modifying navigator.webdriver flag to prevent selenium detection

Question

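I'm trying to automate a very basic task on a website using Selenium and Chrome, but the website somehow detects when Chrome is driven by Selenium and blocks every request. I suspect the website relies on exposed DOM variables such as navigator.webdriver to detect Selenium-driven browsers.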

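My question is: is there a way I can make the navigator.webdriver flag false? I am willing to go so far as to recompile the Selenium source after making modifications, but I cannot find the NavigatorAutomationInformation source anywhere in the repository https://github.com/SeleniumHQ/selenium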

Any help is much appreciated.

P.S.: I also tried the following from https://w3c.github.io/webdriver/#interface

Object.defineProperty(navigator, 'webdriver', {
  get: () => false,
});

But it only updates the property after the initial page load. I think the site detects the variable before my script executes.

Answer 1

First update¹

execute_cdp_cmd(): With the availability of the execute_cdp_cmd(cmd, cmd_args) command, you can now easily execute google-chrome-devtools commands using Selenium. Using this feature you can easily modify navigator.webdriver to prevent Selenium from being detected.

Preventing detection²

To prevent a Selenium-driven WebDriver from being detected, a niche approach would include any or all of the following steps:

Add the argument --disable-blink-features=AutomationControlled:

from selenium import webdriver

options = webdriver.ChromeOptions() 
options.add_argument('--disable-blink-features=AutomationControlled')
driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
driver.get("https://www.website.com")


Rotate the user-agent through the execute_cdp_cmd() command as follows:

# Set Chrome/83.0.4103.53 as the user agent
driver.execute_cdp_cmd('Network.setUserAgentOverride', {"userAgent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.53 Safari/537.36'})

Change the value of the navigator's webdriver property to undefined:

driver.execute_script("Object.defineProperty(navigator, 'webdriver', {get: () => undefined})")

Exclude the collection of enable-automation switches:

options.add_experimental_option("excludeSwitches", ["enable-automation"])

Turn off useAutomationExtension:

options.add_experimental_option('useAutomationExtension', False)
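The user-agent rotation step above sets one fixed string. A minimal sketch of actually rotating it per session, assuming a small hand-picked pool and any driver object that exposes Selenium's execute_cdp_cmd(cmd, cmd_args). Only the first pool entry comes from the example above; the other two strings and the helper name rotate_user_agent are illustrative assumptions:

```python
import random

# Hypothetical pool; only the first entry is taken from the example above.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/83.0.4103.53 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_4) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36",
]


def rotate_user_agent(driver, pool=USER_AGENTS, rng=random):
    """Pick one user agent from the pool, apply it with the CDP
    Network.setUserAgentOverride command and return the chosen string."""
    user_agent = rng.choice(pool)
    driver.execute_cdp_cmd("Network.setUserAgentOverride", {"userAgent": user_agent})
    return user_agent
```

Call rotate_user_agent(driver) once per new session, before driver.get(...). Passing a seeded random.Random as rng makes the choice reproducible when debugging.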

Sample code³

Combining all the steps above, a working code block would be:

from selenium import webdriver

options = webdriver.ChromeOptions() 
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
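# Note: execute_script only patches the document currently loaded (the blank start page here); the next driver.get() loads a fresh navigator
driver.execute_script("Object.defineProperty(navigator, 'webdriver', {get: () => undefined})")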
driver.execute_cdp_cmd('Network.setUserAgentOverride', {"userAgent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.53 Safari/537.36'})
print(driver.execute_script("return navigator.userAgent;"))
driver.get('https://www.httpbin.org/headers')
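Since the block above only prints the user agent, it can help to read both values a page actually sees. A minimal sketch, assuming a Selenium driver (or anything with an execute_script method); the helper name automation_signals is hypothetical. Call it after driver.get(...), because values read on the start page can differ from what the target page sees:

```python
def automation_signals(driver):
    """Return what page scripts would see for the two values tweaked above."""
    return {
        # Selenium returns a JavaScript undefined/null as Python None.
        "webdriver": driver.execute_script("return navigator.webdriver;"),
        "userAgent": driver.execute_script("return navigator.userAgent;"),
    }
```

For example, print(automation_signals(driver)) right after driver.get('https://www.httpbin.org/headers').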

History

As per the W3C Editor's Draft, the current implementation strictly mentions:

The webdriver-active flag is set to true when the user agent is under remote control which is initially set to false.

Moreover,

Navigator includes NavigatorAutomationInformation;

It is to be noted that:

The NavigatorAutomationInformation interface should not be exposed on WorkerNavigator.

The NavigatorAutomationInformation interface is defined as:

interface mixin NavigatorAutomationInformation {
    readonly attribute boolean webdriver;
};

which returns true if the webdriver-active flag is set, and false otherwise.
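The relationship between the webdriver-active flag and the read-only webdriver attribute can be modelled in a few lines of Python. This is only an illustrative model of the spec text quoted above, not how any browser implements it; the class and method names are hypothetical:

```python
class UserAgentState:
    """Hypothetical model of a user agent's webdriver-active flag."""

    def __init__(self):
        # The flag is initially set to false.
        self.webdriver_active = False

    def start_remote_control(self):
        # The flag becomes true once the user agent is under remote control.
        self.webdriver_active = True


class Navigator:
    """Models `readonly attribute boolean webdriver` from the
    NavigatorAutomationInformation mixin: a getter with no setter."""

    def __init__(self, state):
        self._state = state

    @property
    def webdriver(self):
        # True if the webdriver-active flag is set, false otherwise.
        return self._state.webdriver_active
```

Because the attribute has only a getter, a page cannot simply assign a new value to it; that is why the answers here redefine the property with Object.defineProperty instead.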

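Finally, navigator.webdriver defines a standard way for co-operating user agents to inform the document that it is controlled by WebDriver, so that alternate code paths can be triggered during automation.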

Caution: altering or tweaking the parameters mentioned above may block navigation and get the WebDriver instance detected.

Update (6 November 2019)

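In the current implementation, an ideal way to access a web page without being detected is to add a couple of options through an instance of the ChromeOptions() class:

Exclude the collection of enable-automation switches
Turn off useAutomationExtension

For example: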

Java example:

import java.util.Collections;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;

System.setProperty("webdriver.chrome.driver", "C:\\Utility\\BrowserDrivers\\chromedriver.exe");
ChromeOptions options = new ChromeOptions();
options.setExperimentalOption("excludeSwitches", Collections.singletonList("enable-automation"));
options.setExperimentalOption("useAutomationExtension", false);
WebDriver driver = new ChromeDriver(options);
driver.get("https://www.google.com/");

Python example:

from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options, executable_path=r'C:\path\to\chromedriver.exe')
driver.get("https://www.google.com/")

Ruby example:

require 'selenium-webdriver'

options = Selenium::WebDriver::Chrome::Options.new
options.add_argument("--disable-blink-features=AutomationControlled")
driver = Selenium::WebDriver.for :chrome, options: options

Legend

¹: Applicable to Selenium's Python clients only.

²: Applicable to Selenium's Python clients only.

³: Applicable to Selenium's Python clients only.

Answer 2

Before (in the browser console window):

> navigator.webdriver
true

Change (in Selenium):

// C#
var options = new ChromeOptions();
options.AddExcludedArguments(new List<string>() { "enable-automation" });

// Python
from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_experimental_option("excludeSwitches", ['enable-automation'])

After (in the browser console window):

> navigator.webdriver
undefined

This will not work with ChromeDriver 79.0.3945.16 and above. See the ChromeDriver release notes.
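If one script has to run against several ChromeDriver builds, a version check can decide whether the excludeSwitches trick is worth relying on. A minimal sketch that takes the 79.0.3945.16 cut-off from the note above; the function names are hypothetical, and the input is assumed to be a dotted version string, optionally followed by extra text (for example the build hash ChromeDriver appends):

```python
# First ChromeDriver version where excluding 'enable-automation' stops hiding the flag.
CUTOFF = (79, 0, 3945, 16)


def parse_version(version):
    """Turn '79.0.3945.16' (optionally followed by extra text) into a tuple of ints."""
    numeric = version.split()[0]
    return tuple(int(part) for part in numeric.split("."))


def exclude_switches_hides_webdriver(chromedriver_version):
    """True if excluding 'enable-automation' is expected to leave
    navigator.webdriver undefined for this ChromeDriver version."""
    return parse_version(chromedriver_version) < CUTOFF
```

With Selenium, the ChromeDriver version string is typically available as driver.capabilities['chrome']['chromedriverVersion'].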

Answer 3

Now you can accomplish this with a CDP command:

driver.execute_cdp_cmd("Page.addScriptToEvaluateOnNewDocument", {
  "source": """
    Object.defineProperty(navigator, 'webdriver', {
      get: () => undefined
    })
  """
})

driver.get(some_url)

By the way, you want to return undefined; false is a dead giveaway.
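The same CDP call can be wrapped in a small helper so it is registered once, before the first driver.get(). A sketch, assuming a Chromium-based Selenium driver that exposes execute_cdp_cmd; the helper name hide_webdriver_on_new_documents is hypothetical:

```python
HIDE_WEBDRIVER_JS = """
Object.defineProperty(navigator, 'webdriver', {
  get: () => undefined
})
"""


def hide_webdriver_on_new_documents(driver, source=HIDE_WEBDRIVER_JS):
    """Ask Chrome to evaluate `source` in every new document before the
    page's own scripts run. Only documents created after this call are
    affected, so call it before driver.get(...)."""
    # The command returns {'identifier': ...}; that identifier can later be
    # passed to Page.removeScriptToEvaluateOnNewDocument to undo the patch.
    return driver.execute_cdp_cmd(
        "Page.addScriptToEvaluateOnNewDocument", {"source": source}
    )
```

Unlike driver.execute_script(...), which only touches the page that is already loaded, this registration survives navigation, which is what the question's timing problem needs.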

Answer 4

I'd like to add a Java alternative to the CDP command method mentioned by pguardiario:

Map<String, Object> params = new HashMap<String, Object>();
params.put("source", "Object.defineProperty(navigator, 'webdriver', { get: () => undefined })");
driver.executeCdpCommand("Page.addScriptToEvaluateOnNewDocument", params);

For this to work you need to use the ChromiumDriver from the org.openqa.selenium.chromium.ChromiumDriver package. As far as I can tell, that package is not included in Selenium 3.141.59, so I used the Selenium 4 alpha.

Also, the excludeSwitches/useAutomationExtension experimental options no longer seem to work with ChromeDriver 79 and Chrome 79.

Answer 5

ChromeDriver:

Finally found a simple solution for this with a simple flag! :)

--disable-blink-features=AutomationControlled

With that flag set, navigator.webdriver=true will no longer show up.

For a list of the features you can disable, see Chromium's list of Blink runtime features.

Answer 6

This finally solved the problem with ChromeDriver and Chrome versions above v79.

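import java.util.HashMap;
import java.util.Map;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;

ChromeOptions options = new ChromeOptions();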
options.addArguments("--disable-blink-features");
options.addArguments("--disable-blink-features=AutomationControlled");
ChromeDriver driver = new ChromeDriver(options);
Map<String, Object> params = new HashMap<String, Object>();
params.put("source", "Object.defineProperty(navigator, 'webdriver', { get: () => undefined })");
driver.executeCdpCommand("Page.addScriptToEvaluateOnNewDocument", params);

Answer 7

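As of April 2020, excluding the collection of enable-automation switches, as described in the 6 November 2019 update of the top-voted answer, no longer works. Instead I get the following error: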

ERROR:broker_win.cc(55)] Error reading broker pipe: The pipe has been ended. (0x6D)

This is what works as of 6 April 2020 with Chrome 80.

Before (in the Chrome console window):

> navigator.webdriver
true

Python example:

from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument("--disable-blink-features")
options.add_argument("--disable-blink-features=AutomationControlled")

After (in the Chrome console window):

> navigator.webdriver
undefined

Answer 8

If you use Remote WebDriver, the code below will set navigator.webdriver to undefined.

Works for ChromeDriver 81.0.4044.122

Python example:

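from selenium import webdriver

options = webdriver.ChromeOptions()
# options.add_argument("--headless")
options.add_argument('--disable-gpu')
options.add_argument('--no-sandbox')
# ChromeDriver running separately on its default port 9515; the URL needs a scheme
driver = webdriver.Remote(
    command_executor='http://localhost:9515', options=options)
script = '''
Object.defineProperty(navigator, 'webdriver', {
    get: () => undefined
})
'''
# Only affects the currently loaded document; a later driver.get() resets it
driver.execute_script(script)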

Answer 9

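Don't use CDP commands to change the webdriver value, because that leads to inconsistencies that can later be used to detect the webdriver. Use the code below instead; it removes all traces of webdriver.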

options.add_argument("--disable-blink-features")
options.add_argument("--disable-blink-features=AutomationControlled")

Answer 10

As mentioned in the comments above, the following options work perfectly for me (in Java):

ChromeOptions options = new ChromeOptions();
options.addArguments("--incognito", "--disable-blink-features=AutomationControlled");

Answer 11

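For those who have tried these tricks: make sure to also check that the user agent you use matches the platform (mobile/desktop/tablet) your scraper is meant to emulate. It took me a while to realize that this was my Achilles' heel ;)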
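A rough way to catch such a mismatch is to classify the user-agent string and compare it with the platform you intend to emulate. The sketch below relies on a few common substring heuristics ("iPad", "Tablet", Android without "Mobile", "Mobi", "iPhone"); these rules are assumptions that cover typical strings only and are not a complete user-agent parser:

```python
def ua_form_factor(user_agent):
    """Very rough guess of 'mobile', 'tablet' or 'desktop' from a UA string."""
    ua = user_agent.lower()
    if "ipad" in ua or "tablet" in ua:
        return "tablet"
    # Android phones usually include 'Mobile'; Android tablets usually do not.
    if "android" in ua and "mobile" not in ua:
        return "tablet"
    if "mobi" in ua or "iphone" in ua:
        return "mobile"
    return "desktop"


def ua_matches_platform(user_agent, intended):
    """True if the UA string looks like the form factor you mean to emulate."""
    return ua_form_factor(user_agent) == intended
```

For example, a desktop Chrome UA combined with a mobile-emulation setup would make ua_matches_platform(ua, "mobile") return False.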

Answer 12

A simple trick for Python:

from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument("--disable-blink-features=AutomationControlled")

Answer 13

Use --disable-blink-features=AutomationControlled to disable navigator.webdriver.

Answer 14

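Since this question is related to Selenium, a cross-browser solution for overriding navigator.webdriver would be useful. This can be done by patching the browser environment before any JS of the target page runs, but unfortunately no browser other than Chromium allows evaluating arbitrary JavaScript code after document load and before any other JS runs (Firefox is getting close with its Remote Protocol).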

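Before patching, we need to check what the default browser environment looks like. Before changing a property, we can inspect its default definition with Object.getOwnPropertyDescriptor():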

Object.getOwnPropertyDescriptor(navigator, 'webdriver');
// undefined

So with this quick test we can see that the webdriver property is not defined on navigator. It is actually defined on Navigator.prototype:

Object.getOwnPropertyDescriptor(Navigator.prototype, 'webdriver');
// {set: undefined, enumerable: true, configurable: true, get: ƒ}

It is very important to change the property on the object that owns it; otherwise the following can happen:

navigator.webdriver; // true if webdriver controlled, false otherwise
// this lazy patch is commonly found on the internet, it does not even set the right value
Object.defineProperty(navigator, 'webdriver', {
  get: () => undefined
});
navigator.webdriver; // undefined
Object.getOwnPropertyDescriptor(Navigator.prototype, 'webdriver').get.apply(navigator);
// true

A less naive patch would first target the right object and use the right property definition, but digging deeper we find more inconsistencies:

const defaultGetter = Object.getOwnPropertyDescriptor(Navigator.prototype, 'webdriver').get;
defaultGetter.toString();
// "function get webdriver() { [native code] }"
Object.defineProperty(Navigator.prototype, 'webdriver', {
  set: undefined,
  enumerable: true,
  configurable: true,
  get: () => false
});
const patchedGetter = Object.getOwnPropertyDescriptor(Navigator.prototype, 'webdriver').get;
patchedGetter.toString();
// "() => false"

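A perfect patch leaves no traces. Instead of replacing the getter function, it would be better to intercept calls to it and change the returned value, which JavaScript supports natively through the Proxy apply handler: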

const defaultGetter = Object.getOwnPropertyDescriptor(Navigator.prototype, 'webdriver').get;
defaultGetter.apply(navigator); // true
defaultGetter.toString();
// "function get webdriver() { [native code] }"
Object.defineProperty(Navigator.prototype, 'webdriver', {
  set: undefined,
  enumerable: true,
  configurable: true,
  get: new Proxy(defaultGetter, { apply: (target, thisArg, args) => {
    // emulate getter call validation
    Reflect.apply(target, thisArg, args);
    return false;
  }})
});
const patchedGetter = Object.getOwnPropertyDescriptor(Navigator.prototype, 'webdriver').get;
patchedGetter.apply(navigator); // false
patchedGetter.toString();
// "function () { [native code] }"

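The only remaining inconsistency is the function name; unfortunately, there is no way to override the function name shown in the native toString() representation. Even so, the patched getter passes generic regular expressions that look for spoofed browser-native functions by checking for { [native code] } at the end of their string representation. To remove this inconsistency as well, you can patch Function.prototype.toString and make it return valid native string representations for all the native functions you patched.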

To sum up, in Selenium it can be applied with:

chrome.execute_cdp_cmd('Page.addScriptToEvaluateOnNewDocument', {'source': """
    Object.defineProperty(Navigator.prototype, 'webdriver', {
        set: undefined,
        enumerable: true,
        configurable: true,
        get: new Proxy(
            Object.getOwnPropertyDescriptor(Navigator.prototype, 'webdriver').get,
            { apply: (target, thisArg, args) => {
                // emulate getter call validation
                Reflect.apply(target, thisArg, args);
                return false;
            }}
        )
    });
"""})

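The Playwright project maintains forks of Firefox and WebKit that add browser automation features, one of which is equivalent to Page.addScriptToEvaluateOnNewDocument. There is no Python implementation of their communication protocol, but one could be written from scratch.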
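The Function.prototype.toString patch suggested earlier in this answer can be folded into the same injected script. The following is an untested sketch (the JavaScript is an assumption of how it could be done, not the answer author's code): it keeps a map from each replacement function to the native function it stands in for, and makes toString report the native source for both the patched getter and the patched toString itself. The Python helper name install_stealth_patch is hypothetical, and detection scripts may still find other traces:

```python
STEALTH_SOURCE = """
(() => {
  // Map every replacement function to the native function it stands in for.
  const originals = new Map();

  const desc = Object.getOwnPropertyDescriptor(Navigator.prototype, 'webdriver');
  const nativeGetter = desc.get;
  const patchedGetter = new Proxy(nativeGetter, {
    apply: (target, thisArg, args) => {
      Reflect.apply(target, thisArg, args);  // keep the native receiver check
      return false;
    }
  });
  originals.set(patchedGetter, nativeGetter);
  Object.defineProperty(Navigator.prototype, 'webdriver', { ...desc, get: patchedGetter });

  const nativeToString = Function.prototype.toString;
  const patchedToString = new Proxy(nativeToString, {
    // Stringify the native original instead of the replacement when known.
    apply: (target, thisArg, args) =>
      Reflect.apply(target, originals.has(thisArg) ? originals.get(thisArg) : thisArg, args)
  });
  originals.set(patchedToString, nativeToString);
  Object.defineProperty(Function.prototype, 'toString', { value: patchedToString });
})();
"""


def install_stealth_patch(driver, source=STEALTH_SOURCE):
    """Register the patch for every new document; call before driver.get(...)."""
    return driver.execute_cdp_cmd(
        "Page.addScriptToEvaluateOnNewDocument", {"source": source}
    )
```

With this in place, navigator.webdriver.toString-style checks such as Object.getOwnPropertyDescriptor(Navigator.prototype, 'webdriver').get.toString() should report "function get webdriver() { [native code] }" again, since the call is routed to the original getter.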

Answer 15

Python

I tried most of the things mentioned in this post and was still running into problems. What saved me in the end was https://pypi.org/project/undetected-chromedriver

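pip install undetected-chromedriver

# Newer releases expose the same class as: import undetected_chromedriver as uc
import undetected_chromedriver.v2 as uc
from time import sleep
from random import randint

driver = uc.Chrome()
driver.get('https://www.your_url.here')  # placeholder URL; driver.get needs a full URL with a scheme
driver.maximize_window()

sleep(randint(3, 9))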

It's a bit slow, but I'll take slow over not working.

I imagine anyone interested could go through the source code to see what provides the win.
