How to use @FindAll and @FindBy in Selenium by XPath for Web Scraping

Question

I am scraping elements with this method:

name = driver.find_elements(By.XPATH, '//div[@class="p-name p-name-type-2"]/a/em/font[3]/font')

But when I want the inner product details, I have to navigate to that item's page (the single-product page).

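Then I can only access that one item's data, but I want to scrape the data of all items. It gives me the data for 1 item, but I want the data for all of them.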

The outer details of all products are marked with arrows (I know how to scrape these). What I don't know is how to scrape the inner details of every item, shown in picture 2 (next link).

I want to scrape the details indicated by the red arrows, using XPath.

Answer 1

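To scrape a product's inner data, you have to click the products one by one. Each product page opens in a new tab, so you must switch to that new tab before you can scrape it.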

Code:

# Assumes `driver` is an already-started WebDriver (e.g. webdriver.Chrome())
driver.maximize_window()
wait = WebDriverWait(driver, 20)

driver.get("https://search.jd.com/Search?keyword=两件套套装裙&enc=utf-8&wq=两件套套装裙&pvid=c35452079d6240b3a5fab6c585b53856")

# Product thumbnails in the search results
product_xpath = "//img[@data-img and not(@data-url) and @height='220']"
all_products = wait.until(EC.presence_of_all_elements_located((By.XPATH, product_xpath)))
print(len(all_products))

main_handle = driver.current_window_handle
for i in range(1, len(all_products) + 1):
    # Re-locate the i-th product on every pass (XPath indices start at 1)
    prd = wait.until(EC.visibility_of_element_located((By.XPATH, f"({product_xpath})[{i}]")))
    driver.execute_script("arguments[0].scrollIntoView(true);", prd)
    prd.click()
    # The product page opens in a new tab: wait for it, then switch to it
    wait.until(EC.number_of_windows_to_be(2))
    product_handle = next(h for h in driver.window_handles if h != main_handle)
    driver.switch_to.window(product_handle)
    print(wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.sku-name"))).get_attribute('innerText'))
    print(wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, "span.p-price"))).text)
    # Close the product tab and go back to the search results
    driver.close()
    driver.switch_to.window(main_handle)

Imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

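The website responds very slowly for me, so I could not run the loop to completion. However, the Selenium code above should work fine from your region.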

Also, Stack Overflow would not let me post the output because it contains some special characters. Please see the comments for the output.
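Why the loop wraps the XPath in parentheses before indexing: in XPath, a positional predicate written directly after a step, as in `//img[...][2]`, is evaluated per parent ("the 2nd matching img inside its own parent"), whereas `(//img[...])[2]` takes the 2nd match in document order across the whole page. If each product sits in its own container with a single thumbnail, as is common on search-result pages, the unparenthesized form matches nothing for any index above 1. The sketch below shows this offline with Python's standard-library `xml.etree.ElementTree`, using hypothetical, simplified markup in place of the real search results. ElementTree supports only a subset of XPath and cannot evaluate the parenthesized form, so the sketch emulates it by indexing the Python list of all matches.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: each product <li> holds exactly one thumbnail
html = """
<ul>
  <li><img height="220" src="a.jpg"/></li>
  <li><img height="220" src="b.jpg"/></li>
  <li><img height="220" src="c.jpg"/></li>
</ul>
""".strip()
root = ET.fromstring(html)

# Without parentheses, [2] is evaluated per parent: "the 2nd matching img
# inside its own <li>". Every <li> has only one img, so nothing matches.
per_parent = root.findall(".//img[@height='220'][2]")

# Emulates XPath "(//img[@height='220'])[2]": collect all matches in
# document order first, then take the 2nd one (XPath counts from 1).
all_matches = root.findall(".//img[@height='220']")
second_overall = all_matches[2 - 1]

print(len(per_parent))            # → 0
print(second_overall.get("src"))  # → b.jpg
```

In Selenium the expression is evaluated by the browser's own XPath 1.0 engine, so `(//img[...])[i]` works there as written.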

Tags: python, selenium, xpath, web-scraping, selenium-webdriver