Using python and Selenium to scrape the innerText within an HTML element?
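I wrote a script that uses the selenium and pyautogui modules to log in, scrape a value from an element, and print it, but it prints two dashes (--) instead.
Here is the HTML containing the value 417 that I want to retrieve: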
<p id="totReqCountVal" class="trailer-0 avenir-regular font-size-4 text-green js-total-requests">417</p>
Here is the relevant code I tried:
from selenium import webdriver
from selenium.webdriver.common.by import By
browser = webdriver.Chrome()  # assumption: the original does not show how browser was created
browser.get('website_to_be_scraped')
browser.find_element(By.ID, 'totReqCountVal')
Then I tried:
views = browser.find_element(By.ID, 'totReqCountVal')
print(views)
which returns:
<selenium.webdriver.remote.webelement.WebElement (session="12e48df447f7df855a1ee596ba609a30", element="1027ec31-8cb8-4758-b4b0-82b85628ed6c")>
With some help, I also tried the following:
Using CSS_SELECTOR and the text attribute:
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "p#totReqCountVal[class$='js-total-requests']"))).text)
Using XPATH and get_attribute("innerHTML"):
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//p[@id='totReqCountVal' and contains(@class, 'js-total-requests')]"))).get_attribute("innerHTML"))
I added the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
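I have verified in DevTools that the locator strategy uniquely identifies the element, and I have checked for iframes and shadow roots.
How can I retrieve the value 417?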
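views is a WebElement and is printed correctly. Printing it shows the element's representation, not its text:
<selenium.webdriver.remote.webelement.WebElement (session="12e48df447f7df855a1ee596ba609a30", element="1027ec31-8cb8-4758-b4b0-82b85628ed6c")>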
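Because views is an element object, the value has to be read from it explicitly. The following is a minimal sketch of one way to do that; read_element_text is a hypothetical helper name, not part of Selenium, and the fallback to the textContent property is an assumption about what is useful when the element is not rendered as visible text.

```python
def read_element_text(element):
    """Return the text of a Selenium WebElement, or '' if none is found."""
    # .text yields only the rendered (visible) text, so it can be empty
    # when the element is hidden or not yet displayed.
    visible = element.text.strip()
    if visible:
        return visible
    # textContent is a DOM property that is populated even when the
    # element is not displayed; get_attribute may return None.
    raw = element.get_attribute("textContent") or ""
    return raw.strip()
```

With the code from the question, this would be called as print(read_element_text(views)) instead of print(views).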
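Solution
To print the text 417 you need to induce WebDriverWait for visibility_of_element_located(), and you can use either of the following locator strategies:
Using CSS_SELECTOR and the text attribute: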
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "p#totReqCountVal[class$='js-total-requests']"))).text)
Using XPATH and get_attribute("innerHTML"):
print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//p[@id='totReqCountVal' and contains(@class, 'js-total-requests')]"))).get_attribute("innerHTML"))
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
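The question reports that two dashes were printed, which may mean the page shows -- as a placeholder until the real count arrives. That is an assumption; the thread does not confirm it. If it holds, waiting for visibility alone would return as soon as the placeholder is visible. A rough sketch of a wait that also checks the text, where text_differs_from and read_total_requests are hypothetical helper names:

```python
def text_differs_from(locator, placeholder="--"):
    """Build a wait condition that yields the element's text once it
    is non-empty and no longer equal to the placeholder."""
    def condition(driver):
        text = driver.find_element(*locator).text.strip()
        # A falsy return value makes WebDriverWait poll again.
        return text if text and text != placeholder else False
    return condition


def read_total_requests(driver, timeout=20):
    # Imported here so the condition above can be exercised without
    # Selenium installed.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    locator = (By.ID, "totReqCountVal")
    # until() returns the first truthy value produced by the condition
    # and raises TimeoutException if none appears within the timeout.
    return WebDriverWait(driver, timeout).until(text_differs_from(locator))
```

Calling print(read_total_requests(driver)) would then print the count once it replaces the placeholder, or raise TimeoutException if it never does.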
You can find a relevant discussion in
References
Links to useful documentation:
- get_attribute() method: Gets the given attribute or property of the element.
- text attribute: Returns the text of the element.
- Difference between text and innerHTML using Selenium