How to implement parallel Selenium processing

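I asked this question before [it went inactive and is now deleted], but I worded it poorly. I am trying to improve on that here.

Any help would be greatly appreciated.

What I am trying to do: automate some tasks (RESTful API) [using selenium]

What I have running: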

from selenium.common.exceptions import NoSuchElementException, TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

class SomeClass(SomeOtherClass):
    def do_tasks(self, selections):
        booking_code = None
        task_done = 0

        driver = self.connect()  # spawns a chrome browser

        # I want the below for loop to run in parallel
        for task in selections:
            try:
                # check_if_task_is_in_search_result_&_then_open_in_new_tab
                # do_something
                task_done += 1
                # close_tab
            except Exception:
                # handle_something
                pass

            driver.close()
            driver.switch_to.window(driver.window_handles[0])
            driver.refresh()

        try:
            WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CLASS_NAME, 'xxxx'))).click()
        except NoSuchElementException as e:
            log_error(str(e))
        except TimeoutException as e:
            log_error(str(e))
        else:
            booking_code = str(driver.find_element_by_class_name("number").text).split(':')[1]
            driver.quit()

        return task_done, booking_code

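This runs sequentially, and 5 tasks take about 5 minutes.

Running in parallel

What I have tried so far: I moved the body of the for loop into a new method, do_task. Import: from joblib import Parallel, delayed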

class SomeClass(SomeOtherClass):
    def do_task(self, task):
        task_done = 0
        driver = self.connect()  # spawns a chrome browser
        try:
            # do_something
            task_done += 1
        except Exception:
            # handle_something
            pass

        return task_done, driver

    def get_booking_code(self, driver):
        booking_code = None
        try:
            WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CLASS_NAME, 'xxxx'))).click()
        except (NoSuchElementException, TimeoutException) as e:
            log_error(str(e))
        else:
            booking_code = str(driver.find_element_by_class_name("number").text).split(':')[1]
            driver.quit()
        return booking_code


if __name__ == '__main__':
    tasks = [
        ['task1'],
        ['task2']
    ]

    b = SomeClass(site='https://somesite.com/') #chrome connects to this via the self.connect()
    completed_tasks, driver = Parallel(n_jobs=-1)(delayed(b.do_task)(task) for task in tasks)
    booking_code = b.get_booking_code(driver)
    print(completed_tasks, booking_code)

It does not run. It spawns a blank Chrome browser and closes it immediately.

The traceback is below:

  completed_tasks, driver = Parallel(n_jobs=-1)(delayed(b.do_task)(task) for task in tasks)  
  File "--\env\lib\site-packages\joblib\parallel.py", line 1054, in __call__
    self.retrieve()
  File "--\env\lib\site-packages\joblib\parallel.py", line 933, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "--\env\lib\site-packages\joblib\_parallel_backends.py", line 542, in wrap_future_result
    return future.result(timeout=timeout)
  File "--\python\python38\lib\concurrent\futures\_base.py", line 439, in result
    return self.__get_result()
  File "c:\users\okwud\appdata\local\programs\python\python38\lib\concurrent\futures\_base.py", line 388, in __get_result
    raise self._exception
selenium.common.exceptions.SessionNotCreatedException: Message: session not created       
from disconnected: unable to connect to renderer
  (Session info: chrome=91.0.4472.114)

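Ideally, you should create a separate 'SomeClass' object for each task and then call the corresponding function in parallel, because the method you run in parallel contains the self.connect() and driver.quit() calls. Try either creating a separate class object for each parallel task, or setting up one common session and removing driver.quit() from the method that runs in parallel.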
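
A minimal sketch of the first suggestion (one object, and therefore one driver, per task). It makes several assumptions: FakeDriver, Worker, run_one, run_all and the driver_factory parameter are hypothetical names invented for illustration, FakeDriver stands in for a real browser so the code runs anywhere, and threads from concurrent.futures are used here instead of joblib.

```python
from concurrent.futures import ThreadPoolExecutor


class FakeDriver:
    """Hypothetical stand-in for a Selenium driver so the sketch runs without a browser."""

    def __init__(self):
        self.url = None
        self.closed = False

    def get(self, url):
        self.url = url

    def quit(self):
        self.closed = True


class Worker:
    """Plays the role of SomeClass: one instance, and one driver, per task."""

    def __init__(self, site, driver_factory=FakeDriver):
        self.site = site
        self.driver_factory = driver_factory

    def connect(self):
        driver = self.driver_factory()
        driver.get(self.site)
        return driver

    def do_task(self, task):
        driver = self.connect()
        try:
            # ... the real Selenium steps for this task would go here ...
            return 1
        except Exception:
            return 0
        finally:
            driver.quit()  # each worker closes only the driver it opened


def run_one(task):
    # A fresh object per task, so no driver is shared between workers.
    return Worker(site="https://somesite.com/").do_task(task)


def run_all(tasks, max_workers=2):
    # Run the tasks concurrently; results come back in input order.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, tasks))


if __name__ == "__main__":
    print(run_all([["task1"], ["task2"]]))
```

To try this against a real browser, the assumption is that you would pass something like selenium's webdriver.Chrome as driver_factory. The point of the design is that a driver is created, used, and quit inside the same worker and never returned to the caller.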

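I solved this problem (running the tasks in parallel) yesterday, but I am now facing a completely new challenge that needs a new post of its own.

How I got the tasks to run in parallel (using my second code, 'what I have tried so far'):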


import multiprocessing

#code excluded on purpose

if __name__ == '__main__':
    tasks = [
        ['task1'],
        ['task2']
    ]
    b = SomeClass()
    with multiprocessing.Pool(processes=2) as p:
        p.map(b.do_task, tasks)

The do_task method only returns the number of completed tasks (this is an arbitrary return value).
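
For what it is worth, here is a rough sketch of how those per-task return values could be collected from the pool. The do_task function below is a hypothetical module-level stand-in for SomeClass.do_task (it opens no browser), and summarize is a helper invented for this example.

```python
import multiprocessing


def do_task(task):
    """Hypothetical stand-in for SomeClass.do_task: 1 if the task 'succeeded', else 0."""
    return 1 if task and task[0] else 0


def summarize(tasks, results):
    """Pair each task with its result and total the successes."""
    per_task = {task[0]: done for task, done in zip(tasks, results)}
    return sum(results), per_task


if __name__ == "__main__":
    tasks = [["task1"], ["task2"]]
    with multiprocessing.Pool(processes=2) as p:
        # map returns a list with one entry per task, in input order
        results = p.map(do_task, tasks)
    total, per_task = summarize(tasks, results)
    print(total, per_task)
```

Because map keeps the input order, zipping tasks with results is enough to see which task produced which count.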