How to extract the text from web elements using Selenium and Python

Code trials:

driver.get(url)
cards = driver.find_elements_by_class_name("job-cardstyle__JobCardComponent-sc-1mbmxes-0")
for card in cards:
    data = card.get_attribute('text')
    print(data)

    
driver.close()
driver.quit()

cards is returning Selenium WebElements, and I am unable to extract the text from them in the for loop.

You need to use the text attribute instead of get_attribute('text'), as follows:

data = card.text

Solution

To locate the visible elements, you need to induce WebDriverWait for visibility_of_all_elements_located(). You can use the following solution:

driver.get(url)
cards = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CLASS_NAME, "job-cardstyle__JobCardComponent-sc-1mbmxes-0")))
for card in cards:
    data = card.text
    print(data)

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

Outro

In a single line you can use the following:

print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CLASS_NAME, "job-cardstyle__JobCardComponent-sc-1mbmxes-0")))])

  1. Check that the path to your web elements is correct
  2. Get the text from each element
  3. Print it

The problem is in this line:

 data = card.get_attribute('text')

You can do either of the following:

  1. Use .text

    for card in cards:
        data = card.text
        print(data)
    
  2. Use innerText

    for card in cards:
        data = card.get_attribute('innerText')
        print(data)
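
The two options can also be combined. The following is a minimal sketch, not part of the original answer: it assumes card is a Selenium WebElement, and the helper name card_text is hypothetical. It relies on .text returning only the rendered, visible text, so it falls back to the DOM's textContent when .text comes back empty (for example, when the element is hidden).

```python
def card_text(card):
    """Return the card's visible text, falling back to its DOM text if that is empty.

    `card` is assumed to be a Selenium WebElement (anything with a `.text`
    attribute and a `get_attribute()` method works).
    """
    text = card.text  # rendered, visible text only
    if not text.strip():
        # textContent also covers elements that are present but not displayed;
        # get_attribute() may return None, hence the `or ""`.
        text = card.get_attribute("textContent") or ""
    return text.strip()


# Usage inside the loop from the answers above:
# for card in cards:
#     print(card_text(card))
```

If .text is already returning what you need, the plain loop above is enough; the fallback only matters for elements that exist in the DOM but are not displayed.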
    

Also, as suggested in the comments above, you should print the length of the cards list to make debugging easier.

print(len(cards))

That tells you whether the list contains anything at all.
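
The length check can be folded into the extraction itself, so that an empty match is reported instead of silently printing nothing. This is only a sketch under assumptions: driver is a Selenium WebDriver, locator is a (By, value) tuple, and the helper name texts_or_report is hypothetical.

```python
def texts_or_report(driver, locator):
    """Return the text of every element matching `locator`, reporting the match count.

    `driver` is assumed to be a Selenium WebDriver and `locator` a tuple
    such as (By.CLASS_NAME, "some-class").
    """
    cards = driver.find_elements(*locator)
    print(f"matched {len(cards)} element(s) for {locator!r}")
    if not cards:
        # Nothing matched: the locator is wrong or the cards have not loaded yet.
        return []
    return [card.text for card in cards]


# Usage (locator value taken from the question):
# texts = texts_or_report(
#     driver, (By.CLASS_NAME, "job-cardstyle__JobCardComponent-sc-1mbmxes-0"))
```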

This works to some extent:

driver.get("https://www.monster.com/jobs/search?q=Python-Developer&where=Las+Vegas%2C+NV&page=1")
WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.XPATH, "//*[@data-test-id = 'svx-job-title']")))
jobs = driver.find_elements(By.XPATH, "//div[contains(@class, 'job-cardstyle__JobCardHeader')]")
all_jobs = [job.text for job in jobs]
print(all_jobs)

Required imports:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

Output:

['Software engineer III\nRandstad USA\nLas Vegas, NV', 'C\nPython Developer\nconfidential\n -  / Per Hour', 'C\nSenior Software Engineer\nCox Communications Inc\nLas Vegas, NV', 'Mission Systems Engineer\nDCS Corporation\nLas Vegas, NV', 'G\nSoftware Engineer - 914\nGCR Technical Staffing\nHenderson, NV', 'Z\nNetSuite Developer\nZone & Company Software Consulting\nLas Vegas, NV', 'IT Project Engineer\nRauland Florida by Ametek, Inc.\nSunrise, NV', 'A\nWeb Developer\nArdor Global', 'Senior Software Engineer – Node\nMeridian Technology Group Inc.']

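You can split each entry on the \n separator for further use. Also, this site appears to load cards dynamically, i.e. new cards are loaded as you scroll down, so you may not get all the cards at once.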
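
Both points can be sketched roughly as follows. This is not tested against the live site, and it rests on several assumptions: the page loads more cards when the window itself is scrolled to the bottom, a fixed pause is long enough for them to appear, and a leading one-character line in a card (such as 'C' or 'G' in the output above) is a logo placeholder rather than data. The names collect_card_texts, parse_card, pause and max_rounds, and the "extra" key, are all hypothetical. The third line of a card is usually the location but is sometimes pay information, hence the neutral key name.

```python
import time


def collect_card_texts(driver, locator, pause=2.0, max_rounds=10):
    """Scroll to the bottom until no new cards appear, then return their texts.

    `driver` is assumed to be a Selenium WebDriver and `locator` a
    (By, value) tuple, e.g. (By.XPATH, "//div[contains(@class, '...')]").
    """
    seen = 0
    for _ in range(max_rounds):
        cards = driver.find_elements(*locator)
        if len(cards) == seen:
            break  # the previous scroll loaded nothing new
        seen = len(cards)
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # crude wait for lazily loaded cards; tune per site
    return [card.text for card in driver.find_elements(*locator)]


def parse_card(text):
    """Split one card's text on newlines into title / company / extra fields."""
    lines = [line.strip() for line in text.split("\n") if line.strip()]
    # Assumption: a leading single character is a logo placeholder, so drop it.
    if len(lines) > 1 and len(lines[0]) == 1:
        lines = lines[1:]
    # zip() stops at the shorter sequence, so missing trailing fields are omitted.
    return dict(zip(("title", "company", "extra"), lines))


# Parsing works on plain strings, so it can be tried without a browser:
print(parse_card('G\nSoftware Engineer - 914\nGCR Technical Staffing\nHenderson, NV'))

# With a live driver (locator taken from the answer above):
# texts = collect_card_texts(
#     driver, (By.XPATH, "//div[contains(@class, 'job-cardstyle__JobCardHeader')]"))
# jobs = [parse_card(t) for t in texts]
```

Stopping when the card count stops growing is a simple heuristic; max_rounds keeps the loop bounded if the site keeps loading results indefinitely.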