How to extract the text from web elements using Selenium and Python
Code trial:
driver.get(url)
cards = driver.find_elements_by_class_name("job-cardstyle__JobCardComponent-sc-1mbmxes-0")
for card in cards:
    data = card.get_attribute('text')
    print(data)
driver.close()
driver.quit()
cards comes back as a list of Selenium WebElements, and I can't extract the text from them in the for loop.
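You need to use the text property instead of get_attribute('text'). text is a property of the WebElement object rather than an HTML attribute, so get_attribute('text') will usually return None for elements like these. Use it as follows:
data = card.text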
Solution
To locate the visible elements, you need to induce WebDriverWait for visibility_of_all_elements_located(). You can use the following solution:
driver.get(url)
cards = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CLASS_NAME, "job-cardstyle__JobCardComponent-sc-1mbmxes-0")))
for card in cards:
    data = card.text
    print(data)
Note: You have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
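For reference, the wait-based approach above might be assembled into a complete script along these lines. This is a sketch only: the function name scrape_card_texts, the choice of Chrome and the 20-second default are assumptions for illustration, and it presumes Selenium 4 with a working browser driver. The imports sit inside the function so that the module can be loaded even where Selenium is not installed.

```python
def scrape_card_texts(url, class_name, timeout=20):
    """Open url, wait for the cards to become visible, return their texts.

    Hypothetical helper: the name, the Chrome browser and the default
    timeout are illustrative choices, not part of the original answers.
    """
    # Imported lazily so that merely defining the function needs no Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumes Chrome and its driver are available
    try:
        driver.get(url)
        # Raises TimeoutException if the cards are not all visible in time.
        cards = WebDriverWait(driver, timeout).until(
            EC.visibility_of_all_elements_located((By.CLASS_NAME, class_name))
        )
        return [card.text for card in cards]
    finally:
        # quit() ends the whole session, so a separate close() is not needed.
        driver.quit()
```

Calling scrape_card_texts(url, "job-cardstyle__JobCardComponent-sc-1mbmxes-0") would return the same strings that the loop above prints one at a time. Note that find_elements_by_class_name, as used in the question, was removed in Selenium 4.3, so driver.find_elements(By.CLASS_NAME, ...) is the form to use on current versions.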
Outro
In a single line you can use:
print([my_elem.text for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CLASS_NAME, "job-cardstyle__JobCardComponent-sc-1mbmxes-0")))])
- Check that the locator for your web elements is correct
- Get the text from each element
- Print it
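The problem is in this line:
data = card.get_attribute('text')
You can do either of the following:
Using .text
for card in cards:
    data = card.text
    print(data)
Using innerText
for card in cards:
    data = card.get_attribute('innerText')
    print(data)
Also, as suggested in the comments above, you should print the length of the cards list to make debugging easier:
print(len(cards))
That tells you whether the list contains anything at all.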
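The two options and the length check above can be folded into one small helper. The sketch below is an illustration rather than part of the original answers: collect_texts is a hypothetical name, and falling back to innerText when .text is empty is a design choice (.text returns only the rendered, visible text, so it can be empty for elements that are hidden or not yet displayed).

```python
def collect_texts(cards):
    """Return the text of each card, skipping cards with no text at all.

    Hypothetical helper. It only needs objects that offer a .text
    attribute and a get_attribute() method, as Selenium WebElements do.
    """
    # An empty list usually means the locator matched nothing.
    print(f"found {len(cards)} card(s)")
    texts = []
    for card in cards:
        data = card.text  # visible text as rendered by the browser
        if not data:
            # Fall back to the DOM property, which may be set even when
            # the element is not currently displayed.
            data = card.get_attribute("innerText")
        if data and data.strip():
            texts.append(data.strip())
    return texts
```

With the cards list from the question, print(collect_texts(cards)) would report the count first and then show the extracted strings.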
This works to some extent:
driver.get("https://www.monster.com/jobs/search?q=Python-Developer&where=Las+Vegas%2C+NV&page=1")
WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.XPATH, "//*[@data-test-id = 'svx-job-title']")))
jobs = driver.find_elements(By.XPATH, "//div[contains(@class, 'job-cardstyle__JobCardHeader')]")
all_jobs = [job.text for job in jobs]
print(all_jobs)
Imports for WebDriverWait, By and the expected conditions:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
Output:
['Software engineer III\nRandstad USA\nLas Vegas, NV', 'C\nPython Developer\nconfidential\n - / Per Hour', 'C\nSenior Software Engineer\nCox Communications Inc\nLas Vegas, NV', 'Mission Systems Engineer\nDCS Corporation\nLas Vegas, NV', 'G\nSoftware Engineer - 914\nGCR Technical Staffing\nHenderson, NV', 'Z\nNetSuite Developer\nZone & Company Software Consulting\nLas Vegas, NV', 'IT Project Engineer\nRauland Florida by Ametek, Inc.\nSunrise, NV', 'A\nWeb Developer\nArdor Global', 'Senior Software Engineer – Node\nMeridian Technology Group Inc.']
Process finished with exit code 0
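You can split each string in the list on the \n separator for further use. Also, this site appears to load the cards dynamically: new cards are loaded as you scroll down, so you may not get all of them at once.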
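Splitting on \n as suggested might look like the sketch below. The function name parse_card and the field names title, company and detail are assumptions made for illustration, as is the guess that a leading one-character line (such as 'C' or 'A' in the output above) is a logo placeholder rather than part of the job title.

```python
def parse_card(text):
    """Split one card's text into named fields.

    Assumption: a leading one-character line (e.g. 'C') is a logo
    placeholder rather than part of the job title, so it is dropped.
    """
    lines = [line.strip() for line in text.split("\n") if line.strip()]
    if len(lines) > 1 and len(lines[0]) == 1:
        lines = lines[1:]
    return {
        "title": lines[0] if lines else None,
        "company": lines[1] if len(lines) > 1 else None,
        # The third line is a location on most cards but not on all of them.
        "detail": lines[2] if len(lines) > 2 else None,
    }


# Three entries copied from the output shown above.
all_jobs = [
    "Software engineer III\nRandstad USA\nLas Vegas, NV",
    "C\nPython Developer\nconfidential\n - / Per Hour",
    "A\nWeb Developer\nArdor Global",
]
parsed = [parse_card(job) for job in all_jobs]
for row in parsed:
    print(row)
```

Cards with fewer lines simply get None for the missing fields, so the result can be passed on to csv or a DataFrame without special cases.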