How to get the text inside a span with a data-bind attribute using Selenium in Python?
This is the HTML snippet I am trying to scrape with Python Selenium.
I am trying to get the text Add to bag from the span that has a data-bind attribute.
<div class="is-add-item-saving" data-bind="visible: isBusy" style="display: none;"></div>
<span class="aria-live" aria-role="status" aria-live="polite" data-bind="{ text: ariaLiveText }"></span>
<button data-bind="click: addToBag, css : buttonCss, attr: { 'aria-label': resources.pdp_cta_add_to_bag, disabled: isBusy }, markAndMeasure: 'pdp:add_to_bag_interactive'" data-test-id="add-button" aria-label="Add to bag">
<span class="product-tick" data-bind="visible: showProductTick" style="display: none;"></span>
<span data-bind="text: buttonText">Add to bag</span>
</button>
Here is what I have tried so far.
instock_element = driver.find_elements_by_xpath("//span[contains(@data-bind,'text: buttonText')]")
instock_element = driver.find_elements_by_xpath("//*[contains(text(), 'Add to bag')]")
When I iterate over instock_element,
for value in instock_element:
    print("text : ", value.text)
    print(" id : ", value.id)
    if len(value.text) == 0:
        text = value.id
    else:
        print(value.text)
        text = value.text
    ins_list.append(text)
this gives me a random value such as 6489355d-9dd3-4d77-a0d7-b134ce48fae7 rather than the text Add to bag.
Try this (note the xpath in particular):
from lxml import html
sample = """<div class="is-add-item-saving" data-bind="visible: isBusy" style="display: none;"></div>
<span class="aria-live" aria-role="status" aria-live="polite" data-bind="{ text: ariaLiveText }"></span>
<button data-bind="click: addToBag, css : buttonCss, attr: { 'aria-label': resources.pdp_cta_add_to_bag, disabled: isBusy }, markAndMeasure: 'pdp:add_to_bag_interactive'" data-test-id="add-button" aria-label="Add to bag">
<span class="product-tick" data-bind="visible: showProductTick" style="display: none;"></span>
<span data-bind="text: buttonText">Add to bag</span>
</button>"""
print(html.fromstring(sample).xpath("//*[@data-bind='text: buttonText']/text()"))
Output:
['Add to bag']
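The random value comes from value.id. The Selenium documentation describes the id property of a WebElement as follows:
id
Internal ID used by selenium.
This is mainly for internal use. Simple use cases such as checking if 2 webelements refer to the same element, can be done using ==:
if element1 == element2:
    print("These 2 are equal")
Use value.get_attribute("id") to get the element's id attribute.
To get the text, use:
value.text
If that fails, use:
value.get_attribute("textContent")
because value.text only retrieves text that is displayed in the UI.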
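The fallback described above can be wrapped in a small helper. This is only a sketch: element_text and FakeElement are hypothetical names introduced here for illustration, and FakeElement merely imitates the two WebElement members used (text and get_attribute) so the snippet runs without a browser. With a real driver you would pass the WebElement returned by Selenium instead.

```python
class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement (illustration only)."""

    def __init__(self, text, attributes):
        self.text = text
        self._attributes = attributes

    def get_attribute(self, name):
        # Mirrors WebElement.get_attribute: returns None for a missing attribute.
        return self._attributes.get(name)


def element_text(element):
    """Return the rendered text, falling back to textContent when it is empty."""
    visible = (element.text or "").strip()
    if visible:
        return visible
    # .text is empty for elements that are not displayed; textContent is not.
    raw = element.get_attribute("textContent") or ""
    return raw.strip()


hidden = FakeElement(text="", attributes={"textContent": " Add to bag "})
shown = FakeElement(text="Add to bag", attributes={"textContent": "Add to bag"})
print(element_text(hidden))
print(element_text(shown))
```

In the loop from the question, calling a helper like this in place of the value.id fallback would collect the text rather than Selenium's internal identifier.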
You can also use BeautifulSoup for this:
from bs4 import BeautifulSoup
html = """<div class="is-add-item-saving" data-bind="visible: isBusy" style="display: none;"></div>
<span class="aria-live" aria-role="status" aria-live="polite" data-bind="{ text: ariaLiveText }"></span>
<button data-bind="click: addToBag, css : buttonCss, attr: { 'aria-label': resources.pdp_cta_add_to_bag, disabled: isBusy }, markAndMeasure: 'pdp:add_to_bag_interactive'" data-test-id="add-button" aria-label="Add to bag">
<span class="product-tick" data-bind="visible: showProductTick" style="display: none;"></span>
<span data-bind="text: buttonText">Add to bag</span>
</button>"""
soup = BeautifulSoup(html, "html.parser")  # name the parser explicitly to avoid the "no parser specified" warning
tag = soup.find('span',{'data-bind':'text: buttonText'})
print(tag.text)
Output:
Add to bag
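If neither lxml nor BeautifulSoup is installed, the same static extraction can be sketched with the standard library's html.parser. The class name DataBindTextParser is an assumption made for this example, and the parser is deliberately minimal: it does not handle spans nested inside the matching span. Like the lxml and BeautifulSoup versions, it works on an HTML string (for example driver.page_source), not on a live element.

```python
from html.parser import HTMLParser

SAMPLE = """<div class="is-add-item-saving" data-bind="visible: isBusy" style="display: none;"></div>
<span class="aria-live" aria-role="status" aria-live="polite" data-bind="{ text: ariaLiveText }"></span>
<button data-bind="click: addToBag, css : buttonCss, attr: { 'aria-label': resources.pdp_cta_add_to_bag, disabled: isBusy }, markAndMeasure: 'pdp:add_to_bag_interactive'" data-test-id="add-button" aria-label="Add to bag">
<span class="product-tick" data-bind="visible: showProductTick" style="display: none;"></span>
<span data-bind="text: buttonText">Add to bag</span>
</button>"""


class DataBindTextParser(HTMLParser):
    """Collect the text of every <span> whose data-bind equals `binding`."""

    def __init__(self, binding):
        super().__init__()
        self.binding = binding
        self.matches = []
        self._capturing = False

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "span" and dict(attrs).get("data-bind") == self.binding:
            self._capturing = True
            self.matches.append("")

    def handle_data(self, data):
        # Text may arrive in several chunks, so accumulate it.
        if self._capturing:
            self.matches[-1] += data

    def handle_endtag(self, tag):
        if tag == "span":
            self._capturing = False


parser = DataBindTextParser("text: buttonText")
parser.feed(SAMPLE)
parser.close()
print(parser.matches)
```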
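To print the text Add to bag, you can use either of the following. The locators target the span bound to buttonText, because the first span inside the button (product-tick) is hidden and empty, and find_element returns the first match.
Using a CSS selector and get_attribute("innerHTML"):
print(driver.find_element(By.CSS_SELECTOR, "button[data-test-id='add-button'][aria-label='Add to bag'] span[data-bind='text: buttonText']").get_attribute("innerHTML"))
Using XPath and the text attribute:
print(driver.find_element(By.XPATH, "//button[@data-test-id='add-button' and @aria-label='Add to bag']//span[@data-bind='text: buttonText']").text)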
Ideally, you need to induce WebDriverWait for visibility_of_element_located(), and you can use either of the following:
Using CSS_SELECTOR and the text attribute:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "button[data-test-id='add-button'][aria-label='Add to bag'] span[data-bind='text: buttonText']"))).text)
Using XPATH and get_attribute():
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//button[@data-test-id='add-button' and @aria-label='Add to bag']//span[@data-bind='text: buttonText']"))).get_attribute("innerHTML"))
Note: you have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
References
Links to useful documentation:
- get_attribute() method: gets the given attribute or property of the element.
- text attribute: returns the text of the element.
- Difference between text and innerHTML using Selenium
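If you are having a hard time locating the right element, a simple approach is to stop looking up every element that matches an XPath. Instead, use the full XPath of the single tag you want, then read its text with .text.
Example:
text = driver.find_element(By.XPATH, "full xpath of the element").text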