How to access the html nested within multiple shadowRoot using Selenium and Python

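I am trying to build a bot to solve the Wordle puzzle on the website (https://www.powerlanguage.co.uk/wordle/).

I am using Selenium to enter a guess and then trying to inspect the page to see which letters are correct and which are incorrect.

I can see this information when I inspect the element in Chrome, but the HTML returned by Selenium is much shorter and points to a JavaScript app. Is there a way to return the inspected HTML in Selenium? Here is my code.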

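import time

from selenium.common.exceptions import NoSuchElementException
from selenium.common.exceptions import TimeoutException
from selenium.common.exceptions import ElementClickInterceptedException
from selenium.webdriver.common.keys import Keys
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

# chrome_options was not defined in the original snippet; default options are assumed here
chrome_options = webdriver.ChromeOptions()
# executable_path is accepted by Selenium 3 and early Selenium 4 releases;
# newer releases expect a Service object instead
driver = webdriver.Chrome(executable_path=r"/Users/1/Downloads/chromedriver", options=chrome_options)
driver.get("https://www.powerlanguage.co.uk/wordle/")
time.sleep(1)
sends = driver.find_element(By.XPATH, "/html/body")
sends.click()
sends.send_keys("adieu")
sends.send_keys(Keys.ENTER)
# This returns only the light DOM of <body>, not the content inside the shadow roots
print(sends.get_attribute('innerHTML'))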

This is what the returned HTML contains:

This is what I see on the website:

The information I need is nested within multiple shadowRoot levels.


Solution

To extract the information, you need to use shadowRoot.querySelector(), and you can use the following:

  • Code block:

    driver.get("https://www.powerlanguage.co.uk/wordle/")
    time.sleep(1)
    sends=driver.find_element(By.XPATH, "/html/body")
    sends.click()
    sends.send_keys("adieu")
    sends.send_keys(Keys.ENTER)
    # Pierce both shadow roots: game-app first, then the first game-row inside it
    tiles = driver.execute_script("""return document.querySelector('game-app').shadowRoot.querySelector('game-row').shadowRoot.querySelectorAll('game-tile[letter]')""")
    inner_texts = [my_elem.get_attribute("outerHTML") for my_elem in tiles]
    for inner_text in inner_texts:
        print(inner_text)
    
  • Console output:

    <game-tile letter="a" evaluation="absent" reveal=""></game-tile>
    <game-tile letter="d" evaluation="absent"></game-tile>
    <game-tile letter="i" evaluation="correct"></game-tile>
    <game-tile letter="e" evaluation="absent"></game-tile>
    <game-tile letter="u" evaluation="absent"></game-tile>
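
The nested query and the console output above can be handled a little more generally. The following is only a sketch: `shadow_query` and `parse_tiles` are hypothetical helper names, not part of Selenium. The first builds the same kind of JavaScript string that is passed to `driver.execute_script()` above from a list of shadow hosts, and the second reads the `letter` and `evaluation` attributes out of the returned outerHTML strings using only the standard library.

```python
import json
from html.parser import HTMLParser


def shadow_query(hosts, target):
    """Build a JS expression that enters each shadow host in order, then
    runs querySelectorAll(target) inside the innermost shadow root."""
    expr = "document"
    for host in hosts:
        # json.dumps produces a safely quoted JavaScript string literal
        expr += ".querySelector(%s).shadowRoot" % json.dumps(host)
    return "return %s.querySelectorAll(%s)" % (expr, json.dumps(target))


class TileParser(HTMLParser):
    """Collect (letter, evaluation) pairs from <game-tile> start tags."""

    def __init__(self):
        super().__init__()
        self.tiles = []

    def handle_starttag(self, tag, attrs):
        if tag == "game-tile":
            attr_map = dict(attrs)
            self.tiles.append(
                (attr_map.get("letter"), attr_map.get("evaluation"))
            )


def parse_tiles(outer_htmls):
    """Parse a list of outerHTML strings into (letter, evaluation) pairs."""
    parser = TileParser()
    for html in outer_htmls:
        parser.feed(html)
    parser.close()
    return parser.tiles
```

With a live driver, `driver.execute_script(shadow_query(["game-app", "game-row"], "game-tile[letter]"))` should be equivalent to the hand-written query in the code block above, and `parse_tiles(inner_texts)` turns the printed strings into pairs such as `("i", "correct")`.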
    


If you are looking for a complete Python Selenium solution for solving the Wordle game programmatically, here is one that uses the SeleniumBase framework. The solution comes with a YouTube video (Solving Wordle using SeleniumBase), the Python code of the solution, and a GIF of what to expect:

The code uses the special SeleniumBase ::shadow selector to pierce multiple layers of Shadow DOM. Here is the code, which can be run after calling pip install seleniumbase to get all the Python dependencies:

import ast
import random
import requests
from seleniumbase import __version__
from seleniumbase import BaseCase

class WordleTests(BaseCase):
    word_list = []

    def initalize_word_list(self):
        js_file = "https://www.powerlanguage.co.uk/wordle/main.e65ce0a5.js"
        req_text = requests.get(js_file).text
        start = req_text.find("var La=") + len("var La=")
        end = req_text.find("],", start) + 1
        word_string = req_text[start:end]
        self.word_list = ast.literal_eval(word_string)

    def modify_word_list(self, word, letter_status):
        new_word_list = []
        correct_letters = []
        present_letters = []
        for i in range(len(word)):
            if letter_status[i] == "correct":
                correct_letters.append(word[i])
                for w in self.word_list:
                    if w[i] == word[i]:
                        new_word_list.append(w)
                self.word_list = new_word_list
                new_word_list = []
        for i in range(len(word)):
            if letter_status[i] == "present":
                present_letters.append(word[i])
                for w in self.word_list:
                    if word[i] in w and word[i] != w[i]:
                        new_word_list.append(w)
                self.word_list = new_word_list
                new_word_list = []
        for i in range(len(word)):
            if (
                letter_status[i] == "absent"
                and word[i] not in correct_letters
                and word[i] not in present_letters
            ):
                for w in self.word_list:
                    if word[i] not in w:
                        new_word_list.append(w)
                self.word_list = new_word_list
                new_word_list = []

    def test_wordle(self):
        self.open("https://www.powerlanguage.co.uk/wordle/")
        self.click("game-app::shadow game-modal::shadow game-icon")
        self.initalize_word_list()
        keyboard_base = "game-app::shadow game-keyboard::shadow "
        word = random.choice(self.word_list)
        total_attempts = 0
        success = False
        for attempt in range(6):
            total_attempts += 1
            word = random.choice(self.word_list)
            letters = []
            for letter in word:
                letters.append(letter)
                button = 'button[data-key="%s"]' % letter
                self.click(keyboard_base + button)
            button = 'button[data-key="↵"]'
            self.click(keyboard_base + button)
            self.sleep(1)  # Time for the animation
            row = 'game-app::shadow game-row[letters="%s"]::shadow ' % word
            tile = row + "game-tile:nth-of-type(%s)"
            letter_status = []
            for i in range(1, 6):
                letter_eval = self.get_attribute(tile % str(i), "evaluation")
                letter_status.append(letter_eval)
            if letter_status.count("correct") == 5:
                success = True
                break
            self.word_list.remove(word)
            self.modify_word_list(word, letter_status)

        self.save_screenshot_to_logs()
        print('\nWord: "%s"\nAttempts: %s' % (word.upper(), total_attempts))
        if not success:
            self.fail("Unable to solve for the correct word in 6 attempts!")
        self.sleep(3)
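
The word-list filtering in modify_word_list can be tried out without opening a browser. The function below is a sketch that restates the same three rules (correct, present, absent) as a single pass over the candidates; `filter_words` is a hypothetical name and the tiny word list is made up for illustration.

```python
def filter_words(word_list, guess, letter_status):
    """Return the words that are still possible after one guess.

    letter_status holds one of "correct", "present" or "absent" for
    each position of the guess, as read from the evaluation attribute.
    """
    correct = {guess[i] for i, s in enumerate(letter_status) if s == "correct"}
    present = {guess[i] for i, s in enumerate(letter_status) if s == "present"}
    remaining = []
    for w in word_list:
        keep = True
        for i, (ch, status) in enumerate(zip(guess, letter_status)):
            if status == "correct" and w[i] != ch:
                keep = False  # the letter must sit at this exact position
            elif status == "present" and (ch not in w or w[i] == ch):
                keep = False  # the letter must appear, but somewhere else
            elif (
                status == "absent"
                and ch not in correct
                and ch not in present
                and ch in w
            ):
                keep = False  # the letter must not appear at all
        if keep:
            remaining.append(w)
    return remaining


# Made-up candidates; the statuses are what "crane" would get if the answer were "trace"
candidates = ["crane", "crate", "trace", "brine", "slate"]
statuses = ["present", "correct", "correct", "absent", "correct"]
print(filter_words(candidates, "crane", statuses))
```

As in the original method, an absent letter is only used to exclude words when the same letter was not also marked correct or present elsewhere in the guess, which avoids discarding words because of a repeated letter.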

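Because the Shadow-DOM methods were updated, this solution requires a minimum SeleniumBase version of 2.4.0 (or newer). (Here are the Release Notes of that version.)

Note that SeleniumBase tests are run using pytest. Also, the Wordle website appears slightly different when opened with headless Chrome, so do not use Chrome's headless mode when running this example.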
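
One way to guard against an older install is to compare the version string numerically before running the test. This is a minimal sketch: `parse_version` and `meets_minimum` are hypothetical helpers, and the only assumption is that the version is a dotted string of integers like the `__version__` value imported in the code above.

```python
def parse_version(version):
    """Turn a dotted version string such as "2.4.0" into a tuple of ints."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def meets_minimum(version, minimum=(2, 4, 0)):
    """Return True when version is at least the required minimum."""
    # Tuples compare element by element, so "2.10.0" correctly beats "2.4.0"
    return parse_version(version) >= tuple(minimum)
```

Inside the test class, a call such as `meets_minimum(__version__)` could decide whether to skip the test instead of failing later on an unsupported ::shadow selector.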