Using Selenium in Python to capture the links on a web page

Question

I am trying to capture the links on a web page using Selenium in Python. My initial code is:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
import pandas as pd
import time
from tqdm import tqdm
from selenium.common.exceptions import NoSuchElementException

# Assumption: the driver setup was omitted from the post; Chrome is used here as an example.
driver = webdriver.Chrome()
driver.get('https://www.lovecrave.com/shop/')

Then I located all the products on the page (12 of them) with:

perso_flist = driver.find_elements_by_xpath("//p[@class='excerpt']")

Next, I want to capture the link of each product with:

listOflinks = []
for i in perso_flist:
    link_1=i.find_elements_by_xpath(".//a[@href[1]]")
    listOflinks.append(link_1)
print(listOflinks)

My output is:

print(listOflinks)  # 12 EMPTY VALUES
[[], [], [], [], [], [], [], [], [], [], [], []]

What is wrong with my code? I would appreciate your help.

Answer 1

I'm making some assumptions about the XPath //p[@class='excerpt']. If the code below doesn't work, please add a sample of the elements' HTML.

You can get a list of the link elements by changing the lookup to:

perso_flist = driver.find_elements_by_xpath("//li//a[@class='full-link']")

Then iterate over the list and read each link with element.get_attribute():

listOflinks = []
for i in perso_flist:
    link_1=i.get_attribute("href")
    listOflinks.append(link_1)
print(listOflinks)
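
A note for newer Selenium versions: the find_elements_by_* helpers used above were deprecated in Selenium 4 and removed in later 4.x releases, where the same lookup is written as driver.find_elements(By.XPATH, ...). The following is a minimal sketch and not part of the original answer. The helper name collect_hrefs is hypothetical, and unlike the loop above it also skips elements without an href and drops duplicates.

```python
def collect_hrefs(elements):
    """Return the unique, non-empty href values of the given elements, in order.

    `elements` is any iterable of objects exposing get_attribute(name),
    such as the WebElements returned by driver.find_elements(...).
    """
    seen = set()
    hrefs = []
    for element in elements:
        href = element.get_attribute("href")
        # get_attribute returns None when the attribute is missing
        if href and href not in seen:
            seen.add(href)
            hrefs.append(href)
    return hrefs


# Usage with a live browser session (Selenium 4 style, not executed here):
#   from selenium.webdriver.common.by import By
#   elements = driver.find_elements(By.XPATH, "//li//a[@class='full-link']")
#   print(collect_hrefs(elements))
```

Because the helper only relies on get_attribute, it can be exercised without a browser by passing simple stand-in objects.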

Answer 2

Basically, you iterate over the a tags and get the href attribute of each one.

hrefs=[x.get_attribute("href") for x in driver.find_elements_by_xpath("//p[@class='excerpt']/following-sibling::a[1]")]
print(hrefs)

Or use the XPath //li/a[@class='full-link']

Output

['https://www.lovecrave.com/products/duet-pro/',
 'https://www.lovecrave.com/products/vesper/',
 'https://www.lovecrave.com/products/wink/',
 'https://www.lovecrave.com/products/duet/',
 'https://www.lovecrave.com/products/duet-flex/',
 'https://www.lovecrave.com/products/flex/',
 'https://www.lovecrave.com/products/pocket-vibe/',
 'https://www.lovecrave.com/products/bullet/',
 'https://www.lovecrave.com/products/cuffs/',
 'https://www.lovecrave.com/shop/gift-card/',
 'https://www.lovecrave.com/shop/leather-case/',
 'https://www.lovecrave.com/shop/vesper-replacement-charger/']
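
As for why the original relative search came back empty: both XPaths in this answer imply that the a element sits next to the p under the same li, rather than inside it, so a .//a search started from the p has nothing to find. The sketch below reproduces that with the standard library only. The markup is an assumed, simplified stand-in inferred from those XPaths (not copied from the site), and the excerpt text is made up. Note that xml.etree.ElementTree supports only a subset of XPath and has no following-sibling axis, so the //li/a form is used.

```python
import xml.etree.ElementTree as ET

# Assumed structure: each <li> holds the excerpt <p> and the link <a> as siblings.
SAMPLE = """
<ul>
  <li>
    <p class="excerpt">Example excerpt one</p>
    <a class="full-link" href="https://www.lovecrave.com/products/duet-pro/"></a>
  </li>
  <li>
    <p class="excerpt">Example excerpt two</p>
    <a class="full-link" href="https://www.lovecrave.com/products/vesper/"></a>
  </li>
</ul>
"""

root = ET.fromstring(SAMPLE)

# Searching for <a> *inside* each excerpt paragraph finds nothing,
# which mirrors the list of empty lists in the question.
paragraphs = root.findall(".//p[@class='excerpt']")
links_inside_p = [p.findall(".//a") for p in paragraphs]

# Searching for <a class="full-link"> directly under each <li> finds the links.
sibling_links = [a.get("href") for a in root.findall(".//li/a[@class='full-link']")]

print(links_inside_p)
print(sibling_links)
```

If the real page nests the link differently, the XPath has to be adjusted to match, which is why inspecting the element's HTML first is worthwhile.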
