How to collect all hrefs using XPath? Selenium - Python

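In this example, I am trying to collect all five social media links from an artist. Currently, my output is only the last (fifth) social media link. I am using Selenium; I know it is not the best option for collecting this data, but it is all I know at the moment. Please note that I have only included the code relevant to my problem. Thanks in advance for any help or insight.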

    from selenium import webdriver
    from selenium.webdriver.support.wait import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.chrome.options import Options
    import time
    from random import randint
    import pandas as pd

    chrome_options = webdriver.ChromeOptions()
    chrome_options.add_argument('disable-infobars')
    chrome_options.add_argument('--disable-extensions')
    chrome_options.add_experimental_option('excludeSwitches', ['enable-automation'])
    driver = webdriver.Chrome(chrome_options=chrome_options)




    for url in urls:
        driver.get("https://soundcloud.com/flux-pavilion")
        time.sleep(randint(3,4))

        try:
            links = driver.find_elements_by_xpath('//*[@id="content"]/div/div[4]/div[2]/div/article[1]/div[2]/ul/li//a[@href]')
            for elem in links:
                socialmedia = (elem.get_attribute("href"))
        except:
            links = "none"

        artist = {
            'socialmedia': socialmedia,
        }

        print(artist)

The problem is not your XPath expression but the (non-existent) list handling in your output code.

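Your code outputs only the last item of the list returned by the XPath query: the variable socialmedia is reassigned on every pass through the loop, so only the final value survives. That is why you get a single link (the last one).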
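The effect can be reproduced without a browser. The following is only a minimal sketch: FakeElement is a hypothetical stand-in for a Selenium element, and the example.com URLs are placeholders, not the artist's real links.

```python
class FakeElement:
    """Stand-in for a Selenium WebElement (hypothetical, for illustration only)."""

    def __init__(self, href):
        self._href = href

    def get_attribute(self, name):
        return self._href if name == "href" else None


# Placeholder URLs, not the artist's real links.
links = [FakeElement(f"https://example.com/{n}") for n in range(1, 6)]

# Pattern from the question: the variable is overwritten on every iteration.
for elem in links:
    socialmedia = elem.get_attribute("href")

# Collecting instead: every value is kept.
all_links = [elem.get_attribute("href") for elem in links]

print(socialmedia)     # only the fifth link
print(len(all_links))  # 5
```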

So change the output part of your code to

[...]

    driver.get("https://soundcloud.com/flux-pavilion")  # get() returns None, so there is nothing to assign
    time.sleep(randint(3,4))
    artist = []  # collects every href instead of keeping only the last one

    try:
        links = driver.find_elements_by_xpath('//*[@id="content"]/div/div[4]/div[2]/div/article[1]/div[2]/ul/li//a[@href]')
        for elem in links:
            artist.append(elem.get_attribute("href"))
    except:
        links = "none"

    for link in artist:
        print(link)
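If you would rather keep the dictionary from the question, the same idea can be wrapped in a small helper. This is a sketch under assumptions: collect_social_links is a hypothetical name, and it calls find_elements with a locator strategy because, as far as I know, newer Selenium 4 releases no longer provide find_elements_by_xpath.

```python
def collect_social_links(driver, xpath):
    """Return {'socialmedia': [...]} with every href matched by xpath."""
    # "xpath" is the string value of Selenium's By.XPATH constant,
    # so By.XPATH can be passed here instead.
    elements = driver.find_elements("xpath", xpath)
    hrefs = [elem.get_attribute("href") for elem in elements]
    # Drop elements whose href attribute came back empty.
    return {"socialmedia": [h for h in hrefs if h]}
```

It would be called with the existing driver and the XPath from the question, for example artist = collect_social_links(driver, xpath), after which artist['socialmedia'] holds the whole list of links.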

The output will then contain all the values (links) you want:

https://gate.sc/?url=https%3A%2F%2Ftwitter.com%2FFluxpavilion&token=da4a8d-1-1653430570528
https://gate.sc/?url=https%3A%2F%2Finstagram.com%2FFluxpavilion&token=277ea0-1-1653430570529
https://gate.sc/?url=https%3A%2F%2Ffacebook.com%2FFluxpavilion&token=4c773c-1-1653430570530
https://gate.sc/?url=https%3A%2F%2Fyoutube.com%2FFluxpavilion&token=1353f7-1-1653430570531
https://gate.sc/?url=https%3A%2F%2Fopen.spotify.com%2Fartist%2F7muzHifhMdnfN1xncRLOqk%3Fsi%3DbK9XeoW5RxyMlA-W9uVwPw&token=bc2936-1-1653430570532
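The links above are wrapped in a gate.sc redirect. Judging only from the output shown, the destination appears to be percent-encoded in the url query parameter, so it could be recovered with the standard library. This is a sketch based on that assumption; unwrap_redirect is a hypothetical helper name.

```python
from urllib.parse import urlparse, parse_qs


def unwrap_redirect(href):
    """Return the value of the 'url' query parameter, or href unchanged."""
    query = parse_qs(urlparse(href).query)
    # parse_qs percent-decodes values and returns a list per parameter.
    return query.get("url", [href])[0]


wrapped = "https://gate.sc/?url=https%3A%2F%2Ftwitter.com%2FFluxpavilion&token=da4a8d-1-1653430570528"
print(unwrap_redirect(wrapped))  # https://twitter.com/Fluxpavilion
```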