How to print the category links on a page using ChromeDriver, Chrome and Selenium through Python?
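Using Python 3, I am trying to get the Chrome WebDriver and Selenium to recognise the various 'Classifieds' categories on the web page www.jtinsight.com and print the category titles from there. So far, the best I can do with the code below is get it to print the first two: 'All categories' and 'Cars (Private)'. I have established that the HTML for these two differs from the rest, and I have tried many different lines of code (listed in the commented-out code), but I cannot identify the correct tag/class/xpath etc.
Any help would be appreciated.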
from selenium import webdriver
from selenium.webdriver.common.by import By
# Creating the WebDriver object using the ChromeDriver
driver = webdriver.Chrome()
# Directing the driver to the defined url
driver.get("https://www.jtinsight.com/JTIRA/JTIRA.aspx#!/main")
# Locate the categories
# Each code line runs but only returns the first two categories
# categories = driver.find_elements_by_xpath('//div[@class="col-md-3 col-sm-4 col-xs-6"]')
# categories = driver.find_elements_by_xpath('//div[@class="mainCatEntry"]')
# categories = driver.find_elements_by_xpath('//div[@class="Description"]')
# Process ran but finished with exit code 0
# categories = driver.find_elements_by_xpath('//*[@class="col-md-3 col-sm-4 col-xs-6 ng-scope"]')
# categories = driver.find_elements_by_xpath('//div[@class="col-md-3 col-sm-4 col-xs-6 ng-scope"]')
# categories = driver.find_elements_by_partial_link_text('//href[@class="divLink"]')
# categories = driver.find_elements_by_tag_name('bindonce')
# categories = driver.find_elements_by_xpath('//div[@class="divLink"]')
# Error before finished running
# categories = driver.find_elements(By.CLASS_NAME, "col-md-3 col-sm-4 col-xs-6 ng-scope")
# categories = driver.find_elements(By.XPATH, '//div bindonce[@class="col-md-3 col-sm-4 col-xs-6 ng-scope"]')
# categories = driver.find_elements_by_class_name('//div bindonce[@class="col-md-3 col-sm-4 col-xs-6 ng-scope"]')
# Print out all categories on current page
num_page_items = len(categories)
print(num_page_items)
for i in range(num_page_items):
    print(categories[i].text)
# Clean up (close browser once task is completed.)
driver.close()
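This was indeed a timing issue. If I add a "sleep(5)" before collecting the categories, it finds all 24. Interestingly, when I switched to WebDriverWait, it still only pulled 2 items. So, to force the driver to do more work, I extended the xpath. The following worked for me: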
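from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

categories = WebDriverWait(driver, 10).until(
    EC.visibility_of_all_elements_located((By.XPATH, '//div[@class="mainCatEntry"]/div[@class="Description"]')))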
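One plausible reason the plain WebDriverWait still returned 2 items is that visibility_of_all_elements_located() succeeds as soon as every element matching the locator at that moment is visible, even if more are rendered later. A minimal sketch of an alternative, assuming the expected count (24 here, taken from the sleep(5) run above) is known in advance: a custom wait condition that only succeeds once enough elements match. The name at_least_n_elements is hypothetical and not part of Selenium; it relies only on driver.find_elements(by, value) and on WebDriverWait.until() accepting any callable that takes the driver.

```python
def at_least_n_elements(locator, minimum):
    """Build a wait condition that passes once `minimum` elements match `locator`.

    `locator` is a (by, value) tuple such as (By.XPATH, '//div[...]').
    This helper is an illustration, not a Selenium built-in.
    """
    def _condition(driver):
        elements = driver.find_elements(*locator)
        # WebDriverWait.until() keeps polling while the return value is falsy
        return elements if len(elements) >= minimum else False
    return _condition


# Intended use with a live driver (not executed here):
# categories = WebDriverWait(driver, 10).until(
#     at_least_n_elements(
#         (By.XPATH, '//div[@class="mainCatEntry"]/div[@class="Description"]'), 24))
```

If the number of categories on the page changes, the hard-coded count would need to change too, so this fits best when the count is stable.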
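To identify the various Classifieds categories on the web page https://www.jtinsight.com/JTIRA/JTIRA.aspx#!/main
and print the category titles, e.g. All categories, Cars (Private), etc., you need to scroll down a bit and use WebDriverWait with visibility_of_all_elements_located().
You can use the following solution:
Code block: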
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_argument("disable-infobars")
# `options=` replaces the deprecated `chrome_options=` keyword
driver = webdriver.Chrome(options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
driver.get("https://www.jtinsight.com/JTIRA/JTIRA.aspx#!/main")
wait = WebDriverWait(driver, 30)
# Scroll the 'Classifieds' heading into view
heading = wait.until(EC.visibility_of_element_located((By.XPATH, "//span[@class='ng-scope' and text()='Classifieds']")))
driver.execute_script("arguments[0].scrollIntoView(true);", heading)
# Wait until the category descriptions are visible, then print their titles
descriptions = wait.until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='mainCatEntry']//div[@class='Description']")))
print([elem.get_attribute("innerHTML") for elem in descriptions])
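Since the underlying problem is that the categories appear over time, another option is to poll until the number of matches stops changing, which avoids both a fixed sleep and a hard-coded count. The following is only a sketch under that assumption: wait_until_stable and its parameters are hypothetical names, and get_items stands for any zero-argument callable, for example a lambda wrapping driver.find_elements(By.XPATH, ...).

```python
import time


def wait_until_stable(get_items, timeout=10.0, poll=0.5, settle_polls=2):
    """Poll get_items() until its non-empty result keeps the same length for
    `settle_polls` consecutive polls, or until `timeout` seconds have passed.

    Returns the last list that was read, which may be incomplete on timeout.
    """
    deadline = time.monotonic() + timeout
    last_count = -1
    unchanged = 0
    items = []
    while time.monotonic() < deadline:
        items = get_items()
        if items and len(items) == last_count:
            unchanged += 1
            if unchanged >= settle_polls:
                return items
        else:
            # Count changed (or nothing found yet): restart the settle counter
            unchanged = 0
            last_count = len(items)
        time.sleep(poll)
    return items


# Intended use with a live driver (not executed here):
# xpath = "//div[@class='mainCatEntry']//div[@class='Description']"
# categories = wait_until_stable(lambda: driver.find_elements(By.XPATH, xpath))
```

The trade-off is that a page which pauses at 2 items for longer than poll * settle_polls seconds would still be read too early, so the intervals may need tuning for a slow site.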