Using Python and Selenium to scrape the innerText within an HTML element?

I wrote a script that uses the selenium and pyautogui modules to log in, scrape a value from an element, and print it, but it prints two dashes (--) instead.

This is the HTML containing the value 417 that I want to retrieve:

<p id="totReqCountVal" class="trailer-0 avenir-regular font-size-4 text-green js-total-requests">417</p>

Here is the relevant code I tried:

from selenium import webdriver
from selenium.webdriver.common.by import By

browser.get('website_to_be_scraped')
browser.find_element(By.ID, 'totReqCountVal')

Then I tried:

views = browser.find_element(By.ID, 'totReqCountVal')
print(views)

which returns:

(session="12e48df447f7df855a1ee596ba609a30", element="1027ec31-8cb8-4758-b4b0-82b85628ed6c")

With some help, I also tried the following:

Using CSS_SELECTOR and the text attribute:

print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "p#totReqCountVal[class$='js-total-requests']"))).text)

Using XPATH and get_attribute("innerHTML"):

print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//p[@id='totReqCountVal' and contains(@class, 'js-total-requests')]"))).get_attribute("innerHTML"))

I added the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

I have verified in devtools that the locator strategy uniquely identifies the element, and I have checked for iframes and shadow roots.

How can I retrieve the value 417?

Printing views works and outputs:

(session="12e48df447f7df855a1ee596ba609a30", element="1027ec31-8cb8-4758-b4b0-82b85628ed6c")

Solution

To print the text 417, you need to induce WebDriverWait for visibility_of_element_located(), and you can use either of the following:

  • Using CSS_SELECTOR and the text attribute:

    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "p#totReqCountVal[class$='js-total-requests']"))).text)
    
  • Using XPATH and get_attribute("innerHTML"):

    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//p[@id='totReqCountVal' and contains(@class, 'js-total-requests')]"))).get_attribute("innerHTML"))
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    


References

Links to useful documentation:

  • get_attribute() method: gets the given attribute or property of the element.
  • text attribute: returns the text of the element.
  • Difference between text and innerHTML using Selenium
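
To see how these accessors differ on a concrete element, a small sketch like the following can help. The function name content_views is hypothetical; it relies only on the text property and get_attribute() listed above, and it expects an already located WebElement.

```python
def content_views(element):
    """Return the element's content through three Selenium accessors."""
    return {
        # Rendered text, as a user would see it on the page.
        "text": element.text,
        # Markup inside the element, including any child tags.
        "innerHTML": element.get_attribute("innerHTML"),
        # Text of all descendant nodes, including hidden ones.
        "textContent": element.get_attribute("textContent"),
    }
```

For the <p> element in the question, which contains only the number, all three would be expected to yield 417 once the page has populated it.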