Open web in new tab Selenium + Python

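I'm trying to open websites in new tabs inside a single WebDriver. I want to do this because opening a new WebDriver for each website takes about 3.5 seconds with PhantomJS, and I need more speed.

I'm using a multiprocess Python script, and I want to get some elements from each page, so the workflow is like this: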

Open Browser

Loop through my array
For element in array -> Open website in new tab -> do my business -> close it

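But I can't find any way to achieve this.

Here's the code I'm using. It takes forever between websites, and I need it to be fast. Other tools are allowed, but I don't know of many that can scrape content loaded with JavaScript (divs created when some event is triggered on load, etc.). That's why I need Selenium. BeautifulSoup can't be used for some of my pages.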

#!/usr/bin/env python
import multiprocessing, time, pika, json, traceback, logging, sys, os, itertools, urllib, urllib2, cStringIO, mysql.connector, shutil, hashlib, socket, urllib2, re
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from PIL import Image
from os import listdir
from os.path import isfile, join
from bs4 import BeautifulSoup
from pprint import pprint

def getPhantomData(parameters):
    try:
        # We create WebDriver
        browser = webdriver.Firefox()
        # Navigate to URL
        browser.get(parameters['target_url'])
        # Find all links by Selector
        links = browser.find_elements_by_css_selector(parameters['selector'])

        result = []
        for link in links:
            # Extract link attribute and append to our list
            result.append(link.get_attribute(parameters['attribute']))
        browser.close()
        browser.quit()
        return json.dumps({'data': result})
    except Exception, err:
        browser.close()
        browser.quit()
        print err

def callback(ch, method, properties, body):
    parameters = json.loads(body)
    message = getPhantomData(parameters)

    if message['data']:
        ch.basic_ack(delivery_tag=method.delivery_tag)
    else:
        ch.basic_reject(delivery_tag=method.delivery_tag, requeue=True)

def consume():
    credentials = pika.PlainCredentials('invitado', 'invitado')
    rabbit = pika.ConnectionParameters('localhost',5672,'/',credentials)
    connection = pika.BlockingConnection(rabbit)
    channel = connection.channel()

    # Connect to the channel
    channel.queue_declare(queue='com.stuff.images', durable=True)
    channel.basic_consume(callback,queue='com.stuff.images')

    print ' [*] Waiting for messages. To exit press CTRL^C'
    try:
        channel.start_consuming()
    except KeyboardInterrupt:
        pass

workers = 5
pool = multiprocessing.Pool(processes=workers)
for i in xrange(0, workers):
    pool.apply_async(consume)

try:
    while True:
        continue
except KeyboardInterrupt:
    print ' [*] Exiting...'
    pool.terminate()
    pool.join()

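Editor's note: This answer no longer works with newer Selenium versions.


You can open and close a tab with the key combinations COMMAND + T and COMMAND + W (OSX). On other OSs you can use CONTROL + T / CONTROL + W.

In Selenium you can emulate this behavior. You will need to create one webdriver and as many tabs as the tests you need.

Here is the code.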

from selenium import webdriver
from selenium.webdriver.common.keys import Keys

driver = webdriver.Firefox()
driver.get("http://www.google.com/")

#open tab
driver.find_element_by_tag_name('body').send_keys(Keys.COMMAND + 't') 
# You can use (Keys.CONTROL + 't') on other OSs

# Load a page 
driver.get('http://stackoverflow.com/')
# Make the tests...

# close the tab
# (Keys.CONTROL + 'w') on other OSs.
driver.find_element_by_tag_name('body').send_keys(Keys.COMMAND + 'w') 


driver.close()

browser.execute_script('''window.open("http://bings.com","_blank");''')

Where browser is the webDriver.

After struggling for a long time, the approach below worked for me:

driver.find_element_by_tag_name('body').send_keys(Keys.CONTROL + 't')
driver.find_element_by_tag_name('body').send_keys(Keys.CONTROL + Keys.TAB)

windows = driver.window_handles

time.sleep(3)
driver.switch_to.window(windows[1])

This is generic code adapted from another example:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys

driver = webdriver.Firefox()
driver.get("http://www.google.com/")

#open tab
# ... take the code from the options below

# Load a page 
driver.get('http://bings.com')
# Make the tests...

# close the tab
driver.quit()

Possible ways were:

  1. Sending <CTRL> + <T> to one element

    #open tab
    driver.find_element_by_tag_name('body').send_keys(Keys.CONTROL + 't')
    
  2. Sending <CTRL> + <T> via action chains

    ActionChains(driver).key_down(Keys.CONTROL).send_keys('t').key_up(Keys.CONTROL).perform()
    
  3. Executing a JavaScript snippet

    driver.execute_script('''window.open("http://bings.com","_blank");''')
    

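    For this to work you need to make sure that the preferences browser.link.open_newwindow and browser.link.open_newwindow.restriction are set properly. The default values in recent versions are fine; otherwise you supposedly need: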

    fp = webdriver.FirefoxProfile()
    fp.set_preference("browser.link.open_newwindow", 3)
    fp.set_preference("browser.link.open_newwindow.restriction", 2)
    
    driver = webdriver.Firefox(firefox_profile=fp)
    

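    The problem is that those preferences are preset to other values and are frozen, at least in selenium 3.4.0. When you use the profile to set them with the Java binding an exception is thrown, and with the Python binding the new values are ignored.

    In Java there is a way to set those preferences without specifying a profile object when talking to geckodriver, but it does not seem to be implemented yet in the Python binding: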

    FirefoxOptions options = new FirefoxOptions().setProfile(fp);
    options.addPreference("browser.link.open_newwindow", 3);
    options.addPreference("browser.link.open_newwindow.restriction", 2);
    FirefoxDriver driver = new FirefoxDriver(options);
    

The third option stopped working for Python in selenium 3.4.0.

The first two options also seem to have stopped working in selenium 3.4.0. They depend on sending a CTRL key event to an element. At first glance this looks like a problem with the CTRL key, but it fails because of the new multiprocess feature of Firefox. It may be that this new architecture imposes new ways of doing this, or it may be a temporary implementation problem. In any case, we can disable it via:

fp = webdriver.FirefoxProfile()
fp.set_preference("browser.tabs.remote.autostart", False)
fp.set_preference("browser.tabs.remote.autostart.1", False)
fp.set_preference("browser.tabs.remote.autostart.2", False)

driver = webdriver.Firefox(firefox_profile=fp)

...and then you can successfully use the first way.

In a discussion, Simon clearly mentioned:

While the datatype used for storing the list of handles may be ordered by insertion, the order in which the WebDriver implementation iterates over the window handles to insert them has no requirement to be stable. The ordering is arbitrary.
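
Because the ordering is arbitrary, picking the new tab by position (for example `window_handles[1]`) can select the wrong one. A minimal sketch of the alternative, assuming you take a copy of the handles before opening the tab; `find_new_handle` is a hypothetical helper name, not part of Selenium:

```python
def find_new_handle(handles_before, handles_after):
    """Return the one handle that is in handles_after but not in handles_before.

    Both arguments are iterables of window-handle strings, e.g. copies of
    driver.window_handles taken before and after opening a tab.
    """
    new_handles = set(handles_after) - set(handles_before)
    if len(new_handles) != 1:
        raise ValueError("expected exactly one new handle, got %d" % len(new_handles))
    return new_handles.pop()


# Intended use with a real driver (not executed here):
#   before = driver.window_handles
#   driver.execute_script("window.open('');")
#   driver.switch_to.window(find_new_handle(before, driver.window_handles))
```

Comparing sets rather than indexing keeps the lookup independent of whatever order the driver happens to report.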


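Opening a website in a new tab through Python is much easier now with Selenium v3.x. We have to induce a WebDriverWait for number_of_windows_to_be(2), then collect the window handles every time we open a new tab/window, and finally iterate through the window handles and switchTo().window(newly_opened) as required. Here is a solution where you open http://www.google.co.in in the initial tab and https://www.yahoo.com in the adjacent tab:

  • Code Block: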

      from selenium import webdriver
      from selenium.webdriver.support.ui import WebDriverWait
      from selenium.webdriver.support import expected_conditions as EC
    
      options = webdriver.ChromeOptions() 
      options.add_argument("start-maximized")
      options.add_argument('disable-infobars')
      driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
      driver.get("http://www.google.co.in")
      print("Initial Page Title is : %s" %driver.title)
      windows_before  = driver.current_window_handle
      print("First Window Handle is : %s" %windows_before)
      driver.execute_script("window.open('https://www.yahoo.com')")
      WebDriverWait(driver, 10).until(EC.number_of_windows_to_be(2))
      windows_after = driver.window_handles
      new_window = [x for x in windows_after if x != windows_before][0]
      driver.switch_to.window(new_window)
      print("Page Title after Tab Switching is : %s" %driver.title)
      print("Second Window Handle is : %s" %new_window)
    
  • Console Output:

      Initial Page Title is : Google
      First Window Handle is : CDwindow-B2B3DE3A222B3DA5237840FA574AF780
      Page Title after Tab Switching is : Yahoo
      Second Window Handle is : CDwindow-D7DA7666A0008ED91991C623105A2EC4
    
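I tried for a very long time to duplicate tabs in Chrome using action_keys and send_keys on body. The only thing that worked for me was another answer. This is what my duplicate-tabs def ended up looking like. It's probably not the best, but it works fine for me.

def duplicate_tabs(number, chromewebdriver):
    # Once on the page we want to open a bunch of tabs
    url = chromewebdriver.current_url
    for i in range(number):
        print('opened tab: '+str(i))
        chromewebdriver.execute_script("window.open('"+url+"', 'new_window"+str(i)+"')")

It basically runs some JavaScript from inside Python, which is incredibly useful. Hope this helps somebody.

Note: I am using Ubuntu. It shouldn't make a difference, but if it doesn't work for you, this could be the reason.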
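
Building the script by string concatenation breaks if the URL contains a quote character. A hedged variant of the same idea passes the values as script arguments instead: `execute_script` forwards extra positional arguments to the script as `arguments[0]`, `arguments[1]`, and so on. The function name and the `new_window<i>` target names are carried over from the snippet above; nothing else is assumed about the driver.

```python
def duplicate_tabs(number, driver):
    """Open `number` extra tabs that all load the driver's current URL."""
    url = driver.current_url
    for i in range(number):
        # arguments[0] is the URL and arguments[1] the window name, so the
        # URL never has to be quoted by hand inside the JavaScript source.
        driver.execute_script(
            "window.open(arguments[0], arguments[1]);", url, "new_window%d" % i
        )
```

The function is called as before, for example `duplicate_tabs(5, driver)` after `driver.get(...)`.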

Strange: so many answers, and all of them use surrogates like JS and keyboard shortcuts instead of just using a Selenium feature:

def newTab(driver, url="about:blank"):
    wnd = driver.execute(selenium.webdriver.common.action_chains.Command.NEW_WINDOW)
    handle = wnd["value"]["handle"]
    driver.switch_to.window(handle)
    driver.get(url) # changes the handle
    return driver.current_window_handle

  • OS: Win 10
  • Python 3.8.1
  • selenium==3.141.0

from selenium import webdriver
import time

driver = webdriver.Firefox(executable_path=r'TO\Your\Path\geckodriver.exe')
driver.get('https://www.google.com/')

# Open a new window
driver.execute_script("window.open('');")
# Switch to the new window
driver.switch_to.window(driver.window_handles[1])
driver.get("http://stackoverflow.com")
time.sleep(3)

# Open a new window
driver.execute_script("window.open('');")
# Switch to the new window
driver.switch_to.window(driver.window_handles[2])
driver.get("https://www.reddit.com/")
time.sleep(3)
# close the active tab
driver.close()
time.sleep(3)

# Switch back to the first tab
driver.switch_to.window(driver.window_handles[0])
driver.get("https://bing.com")
time.sleep(3)

# Close the only tab, will also close the browser.
driver.close()

Reference: Need Help Opening A New Tab in Selenium

The other solutions do not work with chrome driver v83.

Instead, it works as follows, assuming there is only 1 open tab:

driver.execute_script("window.open('');")
driver.switch_to.window(driver.window_handles[1])
driver.get("https://www.example.com")

If more than 1 tab is already open, first get the handle of the most recently created tab, switch to that tab, and only then load the URL (credit to tylerl):

driver.execute_script("window.open('');")
driver.switch_to.window(driver.window_handles[-1])
driver.get("https://www.example.com")

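To my knowledge, opening a new empty tab within the same window in the Chrome browser is not possible, but you can open a new tab with a web link.

So far I have searched the net and found good working content on this question. Please follow the steps without skipping any.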

import selenium.webdriver as webdriver
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome()
driver.get('https://www.google.com?q=python#q=python')
first_link = driver.find_element_by_class_name('l')

# Use: Keys.CONTROL + Keys.SHIFT + Keys.RETURN to open tab on top of the stack 
first_link.send_keys(Keys.CONTROL + Keys.RETURN)

# Switch tab to the new tab, which we will assume is the next one on the right
driver.find_element_by_tag_name('body').send_keys(Keys.CONTROL + Keys.TAB)

driver.quit()

So far I believe this is the better solution.

Credits: https://gist.github.com/lrhache/7686903

tabs = {}

def new_tab():
    global browser
    hpos = browser.window_handles.index(browser.current_window_handle)
    browser.execute_script("window.open('');")
    browser.switch_to.window(browser.window_handles[hpos + 1])
    return(browser.current_window_handle)
    
def switch_tab(name):
    global tabs
    global browser
    if not name in tabs.keys():
        tabs[name] = {'window_handle': new_tab(), 'url': url+name}
        browser.get(tabs[name]['url'])
    else:
        browser.switch_to.window(tabs[name]['window_handle'])

I'd stick to ActionChains for this.

Here's a function that opens a new tab and switches to that tab:

import time
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys

def open_in_new_tab(driver, element, switch_to_new_tab=True):
    base_handle = driver.current_window_handle
    # Do some actions
    ActionChains(driver) \
        .move_to_element(element) \
        .key_down(Keys.COMMAND) \
        .click() \
        .key_up(Keys.COMMAND) \
        .perform()
    
    # Should you switch to the new tab?
    if switch_to_new_tab:
        new_handle = [x for x in driver.window_handles if x!=base_handle]
        assert len(new_handle) == 1 # assume you are only opening one tab at a time
        
        # Switch to the new window
        driver.switch_to.window(new_handle[0])

        # I like to wait after switching to a new tab for the content to load
        # Do that either with time.sleep() or with WebDriverWait until a basic
        # element of the page appears (such as "body") -- reference for this is 
        # provided below
        time.sleep(0.5)        

        # NOTE: if you choose to switch to the window/tab, be sure to close
        # the newly opened window/tab after using it and that you switch back
        # to the original "base_handle" --> otherwise, you'll experience many
        # errors and a painful debugging experience...

Here's how you would apply the function:

# Remember your starting handle
base_handle = driver.current_window_handle

# Say we have a list of elements and each is a link:
links = driver.find_elements_by_css_selector('a[href]')

# Loop through the links and open each one in a new tab
for link in links:
    open_in_new_tab(driver, link, True)
    
    # Do something on this new page
    print(driver.current_url)
    
    # Once you're finished, close this tab and switch back to the original one
    driver.close()
    driver.switch_to.window(base_handle)
    
    # You're ready to continue to the next item in your loop

You can also wait until the page is loaded instead of sleeping for a fixed time.
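
One way to do that is to poll `document.readyState` until it reports `complete`. The sketch below is a hand-rolled loop rather than Selenium's own `WebDriverWait`; `wait_for_page_load`, its timeout, and its poll interval are illustrative assumptions.

```python
import time


def wait_for_page_load(driver, timeout=10.0, poll=0.1):
    """Block until document.readyState is 'complete', or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        state = driver.execute_script("return document.readyState")
        if state == "complete":
            return
        if time.monotonic() >= deadline:
            raise TimeoutError("page still %r after %.1fs" % (state, timeout))
        time.sleep(poll)
```

It could replace the `time.sleep(0.5)` call right after `driver.switch_to.window(new_handle[0])`. Note that `complete` only covers the initial load; content injected later by JavaScript still needs a wait on a specific element.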

As already mentioned several times, the following approaches no longer work:

driver.find_element_by_tag_name('body').send_keys(Keys.CONTROL + 't')
ActionChains(driver).key_down(Keys.CONTROL).send_keys('t').key_up(Keys.CONTROL).perform()

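Further, driver.execute_script("window.open('');") works but is limited by the popup blocker. I process hundreds of tabs in parallel (web scraping using scrapy). However, the popup blocker became active after opening 20 new tabs with JavaScript's window.open(''), and thus broke my crawler.

As a workaround I declared one tab as "master", which opened the following helper.html: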

<!DOCTYPE html>
<html><body>
<a id="open_new_window" href="about:blank" target="_blank">open a new window</a>
</body></html>

Now my (simplified) crawler can open as many tabs as necessary by deliberately clicking on the link, which the popup blocker does not consider at all:

# master
master_handle = driver.current_window_handle
helper = os.path.join(os.path.dirname(os.path.abspath(__file__)), "helper.html")
driver.get(helper)

# open new tabs
for _ in range(100):
    window_handle = driver.window_handles          # current state
    driver.switch_to.window(master_handle)
    driver.find_element_by_id("open_new_window").click()
    window_handle = set(driver.window_handles).difference(window_handle).pop()
    print("new window handle:", window_handle)

Closing these windows via JavaScript's window.close() is no problem.
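
If you would rather close the extra tabs from Python than from inside the page, something along these lines should work. It uses `driver.close()` in place of `window.close()`, which is a swapped-in technique, and `close_all_except` is a hypothetical helper name.

```python
def close_all_except(driver, keep_handle):
    """Close every window/tab except keep_handle, then focus keep_handle.

    Returns the number of windows that were closed.
    """
    closed = 0
    # Iterate over a copy: the driver's handle list changes as tabs close.
    for handle in list(driver.window_handles):
        if handle == keep_handle:
            continue
        driver.switch_to.window(handle)
        driver.close()  # closes only the currently focused window
        closed += 1
    driver.switch_to.window(keep_handle)
    return closed
```

With the master tab from the snippet above, the call would be `close_all_except(driver, master_handle)`.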

#Change the method of finding the element if needed
self.find_element_by_xpath(element).send_keys(Keys.CONTROL + Keys.ENTER)

This finds the element and opens it in a new tab. self is just the name used for the webdriver object.

Try this, it will work:

# Open a new Tab
driver.execute_script("window.open('');")

# Switch to the new window and open URL B
driver.switch_to.window(driver.window_handles[1])
driver.get(tab_url)

from selenium import webdriver
import time

driver = webdriver.Firefox()
driver.get('https://www.google.com')

driver.execute_script("window.open('');")
time.sleep(5)

driver.switch_to.window(driver.window_handles[1])
driver.get("https://facebook.com")
time.sleep(5)

driver.close()
time.sleep(5)

driver.switch_to.window(driver.window_handles[0])
driver.get("https://www.yahoo.com")
time.sleep(5)

#driver.close()

https://www.edureka.co/community/52772/close-active-current-without-closing-browser-selenium-python

You can use this to open a new tab:

driver.execute_script("window.open('http://google.com', 'new_window')")

Just for future reference, a simple way to do it is:

driver.switch_to.new_window()
t=driver.window_handles[-1]# Get the handle of new tab
driver.switch_to.window(t)
driver.get(target_url) # Now the target url is opened in new tab

This worked for me:

link = "https://www.google.com/"
driver.execute_script('''window.open("about:blank");''')  # Opening a blank new tab
driver.switch_to.window(driver.window_handles[1])  # Switching to newly opened tab
driver.get(link)

Selenium 4.0.0 supports the following operations:

  • To open a new tab, try:

    driver.switch_to.new_window()

  • To switch to a specific tab (note that tabID starts from 0):

    driver.switch_to.window(driver.window_handles[tabID])
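
Put together for the loop described in the question (open a tab, do the work, close it), this could look like the sketch below. It assumes Selenium 4, where `switch_to.new_window('tab')` both opens the tab and moves focus to it; `scrape_in_new_tabs` and the `extract` callback are names invented here for illustration.

```python
def scrape_in_new_tabs(driver, urls, extract):
    """Visit each URL in its own tab of one browser and collect extract(driver)."""
    base_handle = driver.current_window_handle
    results = []
    for url in urls:
        # Selenium 4: opens a new tab and switches focus to it.
        driver.switch_to.new_window("tab")
        try:
            driver.get(url)
            results.append(extract(driver))
        finally:
            driver.close()                        # close the tab just opened
            driver.switch_to.window(base_handle)  # return to the original tab
    return results
```

A call might look like `scrape_in_new_tabs(driver, ["https://www.example.com"], lambda d: d.title)`. The `finally` block keeps the browser on the original tab even when a page raises, so one bad URL does not leave stray tabs behind.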