How to get the text inside a span with a data-bind attribute using Selenium in Python?

This is the HTML snippet I am trying to scrape with Python Selenium.

I am trying to get the text Add to bag from the span whose data-bind is text: buttonText.

<div class="is-add-item-saving" data-bind="visible: isBusy" style="display: none;"></div>
<span class="aria-live" aria-role="status" aria-live="polite" data-bind="{ text: ariaLiveText }"></span>
<button data-bind="click: addToBag, css : buttonCss, attr: { 'aria-label': resources.pdp_cta_add_to_bag, disabled: isBusy }, markAndMeasure: 'pdp:add_to_bag_interactive'" data-test-id="add-button" aria-label="Add to bag">
    <span class="product-tick" data-bind="visible: showProductTick" style="display: none;"></span>
    <span data-bind="text: buttonText">Add to bag</span>

</button>

This is what I have tried so far.

instock_element = driver.find_elements_by_xpath("//span[contains(@data-bind,'text: buttonText')]")
instock_element = driver.find_elements_by_xpath("//*[contains(text(), 'Add to bag')]")

When I iterate over the resulting instock_element list,

for value in instock_element:
     print("text : ",value.text)
     print(" id : ",value.id)
     if len(value.text) == 0:
          text = value.id
     else:
          print(value.text)
          text = value.text
          ins_list.append(text)

This gives me random values such as 6489355d-9dd3-4d77-a0d7-b134ce48fae7 instead of the text Add to bag.

Try this (note the XPath in particular):

from lxml import html

sample = """<div class="is-add-item-saving" data-bind="visible: isBusy" style="display: none;"></div>
<span class="aria-live" aria-role="status" aria-live="polite" data-bind="{ text: ariaLiveText }"></span>
<button data-bind="click: addToBag, css : buttonCss, attr: { 'aria-label': resources.pdp_cta_add_to_bag, disabled: isBusy }, markAndMeasure: 'pdp:add_to_bag_interactive'" data-test-id="add-button" aria-label="Add to bag">
    <span class="product-tick" data-bind="visible: showProductTick" style="display: none;"></span>
    <span data-bind="text: buttonText">Add to bag</span>

</button>"""

print(html.fromstring(sample).xpath("//*[@data-bind='text: buttonText']/text()"))

Output:

['Add to bag']

https://www.selenium.dev/selenium/docs/api/py/webdriver_remote/selenium.webdriver.remote.webelement.html#module-selenium.webdriver.remote.webelement

id

Internal ID used by selenium.

This is mainly for internal use. Simple use cases such as checking if 2 webelements refer to the same element, can be done using ==:

if element1 == element2: print("These 2 are equal")

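So value.id is Selenium's internal ID, not the HTML id attribute. Use value.get_attribute("id") to get the HTML id.

To get the text, use:

value.text

If that returns an empty string, use:

value.get_attribute("textContent")

because value.text only retrieves text that is actually displayed in the UI.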
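The fallback described above can be wrapped in a small helper. This is only a sketch: element_text is a hypothetical name, and FakeElement is a stand-in invented here so the snippet runs without a browser. With Selenium, you would pass the real WebElement objects instead.

```python
def element_text(element):
    """Return the visible text of an element, falling back to textContent."""
    # .text only reports text that is rendered on the page.
    text = element.text
    if text and text.strip():
        return text.strip()
    # textContent is read from the DOM, so it also covers hidden elements.
    raw = element.get_attribute("textContent")
    return raw.strip() if raw else ""


class FakeElement:
    """Hypothetical stand-in that mimics the two WebElement members used above."""

    def __init__(self, text, text_content):
        self.text = text
        self._text_content = text_content

    def get_attribute(self, name):
        # Only textContent is modelled; any other attribute is reported as missing.
        return self._text_content if name == "textContent" else None


# A hidden span: .text is empty, but textContent still holds the label.
hidden_span = FakeElement("", "Add to bag")
label = element_text(hidden_span)
```

In the loop from the question, this would replace the value.id branch, for example ins_list = [element_text(value) for value in instock_element].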

You can also use BeautifulSoup for this:

from bs4 import BeautifulSoup

html = """<div class="is-add-item-saving" data-bind="visible: isBusy" style="display: none;"></div>
<span class="aria-live" aria-role="status" aria-live="polite" data-bind="{ text: ariaLiveText }"></span>
<button data-bind="click: addToBag, css : buttonCss, attr: { 'aria-label': resources.pdp_cta_add_to_bag, disabled: isBusy }, markAndMeasure: 'pdp:add_to_bag_interactive'" data-test-id="add-button" aria-label="Add to bag">
    <span class="product-tick" data-bind="visible: showProductTick" style="display: none;"></span>
    <span data-bind="text: buttonText">Add to bag</span>
</button>"""

soup = BeautifulSoup(html, "html.parser")  # name the parser explicitly to avoid the "no parser specified" warning

tag = soup.find('span',{'data-bind':'text: buttonText'})
print(tag.text)

Output:

Add to bag
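If neither lxml nor BeautifulSoup is installed, the same lookup can be sketched with the standard library's html.parser. The names DataBindText and text_for_binding are hypothetical, and the depth counter is a simplification that assumes the matched element contains no void tags such as <br>. With Selenium, the markup would presumably come from driver.page_source.

```python
from html.parser import HTMLParser

SAMPLE = """<button data-test-id="add-button" aria-label="Add to bag">
    <span class="product-tick" data-bind="visible: showProductTick" style="display: none;"></span>
    <span data-bind="text: buttonText">Add to bag</span>
</button>"""


class DataBindText(HTMLParser):
    """Collect the text inside the first element whose data-bind equals `binding`."""

    def __init__(self, binding):
        super().__init__()
        self.binding = binding
        self.depth = 0      # > 0 while the parser is inside the matched element
        self.done = False   # set once the matched element has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1  # nested tag inside the matched element
        elif not self.done and dict(attrs).get("data-bind") == self.binding:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def text_for_binding(markup, binding):
    """Return the stripped text of the first element with the given data-bind value."""
    parser = DataBindText(binding)
    parser.feed(markup)
    parser.close()
    return "".join(parser.chunks).strip()


result = text_for_binding(SAMPLE, "text: buttonText")
print(result)
```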

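To print the text Add to bag you can use either of the following approaches. The span[data-bind] filter matters here: the first span inside the button is the hidden product-tick span, so a bare span locator would match that empty element instead.

  • Using css_selector and get_attribute("innerHTML"):

    print(driver.find_element_by_css_selector("button[data-test-id='add-button'][aria-label='Add to bag'] span[data-bind='text: buttonText']").get_attribute("innerHTML"))
    
  • Using xpath and the text attribute:

    print(driver.find_element_by_xpath("//button[@data-test-id='add-button' and @aria-label='Add to bag']//span[@data-bind='text: buttonText']").text)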
    

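Ideally, you should induce WebDriverWait for visibility_of_element_located(), using either of the following approaches. The locators target the span bound to buttonText, because the first span in the button is hidden and would never satisfy a visibility wait.

  • Using CSS_SELECTOR and the text attribute:

    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "button[data-test-id='add-button'][aria-label='Add to bag'] span[data-bind='text: buttonText']"))).text)
    
  • Using XPATH and get_attribute():

    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//button[@data-test-id='add-button' and @aria-label='Add to bag']//span[@data-bind='text: buttonText']"))).get_attribute("innerHTML"))
    
  • Note: You have to add the following imports: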

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    


References

Links to useful documentation:

  • get_attribute() method: Gets the given attribute or property of the element.
  • text attribute: returns the text of the element.
  • Difference between text and innerHTML using Selenium

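If you are having trouble locating the right element, a simple approach is to stop matching every element an XPath could return and instead use the full XPath of the single tag you want, then read its text with .text.

Example: text = driver.find_element_by_xpath("full xpath of the element").text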