Selenium Web Scraping: Find element by text not working in script
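I'm writing a script to collect information from Newegg so I can track how graphics card prices change over time. Currently, my script opens a Newegg search for RTX 3080 through Chromedriver and then clicks the Desktop Graphics Cards link to narrow the search. The part I'm struggling with is a for item in range loop that lets me iterate through all 8 pages of search results. I know I could do this by simply changing the page number in the URL, but since this is an exercise I'm using to learn relative XPath better, I want to do it with the pagination buttons at the bottom of the page. I know each button should contain the inner text "1, 2, 3, 4, etc.", but whenever I use text() = {item} in my for loop, it doesn't click the button. The script runs and doesn't return any exceptions, but it also doesn't do what I want. Below I have attached the HTML for the page as well as my current script. Any suggestions or tips are appreciated.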
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import NoSuchElementException
import pandas as pd
import time
options = Options()
PATH = 'C://Program Files (x86)//chromedriver.exe'
driver = webdriver.Chrome(PATH)
url = 'https://www.newegg.com/p/pl?d=RTX+3080'
driver.maximize_window()
driver.get(url)
card_path = '/html/body/div[8]/div[3]/section/div/div/div[1]/div/dl[1]/dd/ul[2]/li/a'
desktop_graphics_cards = driver.find_element(By.XPATH, card_path)
desktop_graphics_cards.click()
time.sleep(5)
graphics_card = []
shipping_cost = []
price = []
total_cost = []
for item in range(9):
    try:
        #next_page_click = driver.find_element(By.XPATH("//button[text() = '{item + 1}']"))
        print(next_page_click)
        next_page_click.click()
    except:
        pass
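The pagination buttons are outside the initially visible area.
To click these elements, you have to scroll the page until the element comes into view.
Also, you are trying the numbers from 1 to 9, whereas you need to click the next-page buttons from 2 to 9 (inclusive), because page 1 is already open.
I think this should work better: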
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.webdriver.common.action_chains import ActionChains
from selenium.common.exceptions import NoSuchElementException
import pandas as pd
import time
options = Options()
PATH = 'C://Program Files (x86)//chromedriver.exe'
driver = webdriver.Chrome(PATH)
url = 'https://www.newegg.com/p/pl?d=RTX+3080'
actions = ActionChains(driver)
driver.maximize_window()
driver.get(url)
card_path = '/html/body/div[8]/div[3]/section/div/div/div[1]/div/dl[1]/dd/ul[2]/li/a'
desktop_graphics_cards = driver.find_element(By.XPATH, card_path)
desktop_graphics_cards.click()
time.sleep(5)
graphics_card = []
shipping_cost = []
price = []
total_cost = []
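for item in range(2, 10):
    try:
        # By.XPATH is a plain string, so pass it as the first argument instead of calling it
        next_page_click = driver.find_element(By.XPATH, f"//button[text() = '{item}']")
        # move to the pagination button so it is brought into view before clicking
        actions.move_to_element(next_page_click).perform()
        time.sleep(2)
        # print(next_page_click) - printing a web element itself will not give you usable information
        next_page_click.click()
        # let the next page load, it takes some time
        time.sleep(5)
    except NoSuchElementException:
        # a bare "except: pass" hides every error, which is why the script seemed to do nothing
        print(f"Pagination button {item} not found")
        break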
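If the fixed time.sleep() calls turn out to be flaky, the same idea can be expressed with explicit waits. The following is only a sketch under some assumptions: the helper names page_button_xpath and click_page_button are hypothetical, the 10 second timeout is arbitrary, and it assumes the pagination buttons really are button elements whose text is exactly the page number, as described in the question. Scrolling is done with JavaScript scrollIntoView here rather than ActionChains.

```python
def page_button_xpath(page_number: int) -> str:
    """Build the relative XPath for the pagination button showing this number."""
    return f"//button[text() = '{page_number}']"


def click_page_button(driver, page_number: int, timeout: float = 10.0) -> None:
    """Scroll a pagination button into view and click it once it is clickable.

    Raises selenium's TimeoutException if the button does not show up in time.
    """
    # Imported inside the function so the XPath helper above can be used
    # (and checked) without a Selenium installation.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    locator = (By.XPATH, page_button_xpath(page_number))
    wait = WebDriverWait(driver, timeout)

    # Wait until the button exists in the DOM, then scroll it into the viewport.
    button = wait.until(EC.presence_of_element_located(locator))
    driver.execute_script("arguments[0].scrollIntoView({block: 'center'});", button)

    # Wait until the button is visible and enabled, then click it.
    wait.until(EC.element_to_be_clickable(locator)).click()


# Usage with an already created driver (page 1 is open, so start at 2):
# for page in range(2, 10):
#     click_page_button(driver, page)
```

Because nothing is swallowed by a bare except, a wrong locator surfaces as a TimeoutException instead of the loop silently doing nothing.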