How to pull the href information from specific class using Selenium and Python

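I am currently using Python and Selenium for some web scraping, and I can't seem to pull the link out of the href of the anchor tag in a specific class. For reference, the page is on Zillow (specifically, this URL: https://www.zillow.com/homes/for_rent/San-Francisco,-CA_rb/).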

To select the listed anchor tags, I have tried a few different options, but none of them returns the information I need:

links = driver.find_elements(By.CLASS_NAME, "list-card-info")
print(links[0].get_attribute('href'))
 -- returns 
None

I also tried:

links = driver.find_elements(By.CLASS_NAME, "list-card-top")
print(links[0].get_attribute('href'))
 -- returns 
None

links = driver.find_elements(By.CLASS_NAME, "list-card-link list-card-link-top-margin")
print(links[0].get_attribute('href'))
 -- returns 
None

And finally:

links = driver.find_elements(By.CSS_SELECTOR, "list-card-info.a")
print(links[0].get_attribute('href'))

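I know I could pull all of the anchor tags, but surely I am missing a step here to get the nested anchor tag's value? Or am I pulling the wrong class? I'm not sure where I'm going wrong.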

You can use XPath to find the link (the a tag) and then call get_attribute('href') on that tag to get the link.

Like this:

href = driver.find_element(By.XPATH, '//div[@class="list-card-top"]/a').get_attribute('href')
print(href)

Another example:

href = driver.find_element(By.XPATH, '//div[@class="list-card-info"]/a').get_attribute('href')
print(href)

If you want to use By.CLASS_NAME, you can do it like this:

link = driver.find_element(By.CLASS_NAME, "list-card-top")
href = link.find_element(By.TAG_NAME, 'a').get_attribute('href')
print(href)
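
The snippet above reads the link from the first matching card only. As a minimal sketch of the same nested lookup applied to every card, the hypothetical helper below (hrefs_from_cards is not part of Selenium) loops over the containers and skips any that have no anchor. It assumes the page still uses the list-card-top class from the question, and it passes the plain strings "class name" and "tag name", which are the values behind By.CLASS_NAME and By.TAG_NAME, so the block itself needs no imports.

```python
def hrefs_from_cards(driver, card_class="list-card-top"):
    """Collect the href of the first <a> nested in each element with card_class.

    `driver` is any object exposing Selenium's find_elements(by, value).
    The default class name is taken from the question and may have changed.
    """
    hrefs = []
    # "class name" / "tag name" are the string values of By.CLASS_NAME / By.TAG_NAME.
    for card in driver.find_elements("class name", card_class):
        # find_elements returns an empty list instead of raising when nothing matches.
        anchors = card.find_elements("tag name", "a")
        if not anchors:
            continue  # this card has no nested link
        href = anchors[0].get_attribute("href")
        if href:
            hrefs.append(href)
    return hrefs
```

With a live session, print(hrefs_from_cards(driver)) would print one link per card that contains an anchor.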

In your case:

links = driver.find_elements(By.CLASS_NAME, "list-card-info")
print(links[0].get_attribute('href'))

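You are trying to read an attribute named 'href' from the div element that has the class list-card-info. A div has no href, so you get None. What we actually want is the 'href' of the a tag inside that div.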
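
The last two attempts in the question likely fail for a related reason, namely the selector itself. As far as I know, By.CLASS_NAME expects a single class name, so a value containing a space such as "list-card-link list-card-link-top-margin" does not match. And as a CSS selector, "list-card-info.a" reads as an element whose tag is list-card-info and whose class is a. The sketch below uses a hypothetical helper, classes_to_css, to build the intended CSS selector from a class attribute value. It is plain string handling and does not touch Selenium.

```python
def classes_to_css(class_attr, tag="", child=""):
    """Turn a space-separated class attribute into a CSS selector.

    tag   -- optional tag name to prefix, e.g. "a"
    child -- optional descendant selector to append, e.g. "a"
    """
    names = class_attr.split()
    if not names:
        raise ValueError("class_attr must contain at least one class name")
    # Each class gets its own leading dot; chaining them means "has all of these".
    selector = tag + "".join("." + name for name in names)
    if child:
        # A space is the descendant combinator: child anywhere inside the match.
        selector += " " + child
    return selector


print(classes_to_css("list-card-link list-card-link-top-margin", tag="a"))
print(classes_to_css("list-card-info", child="a"))
```

The two printed selectors, a.list-card-link.list-card-link-top-margin and .list-card-info a, can then be passed to driver.find_elements(By.CSS_SELECTOR, ...).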

To print the values of the href attribute, you have to use WebDriverWait with visibility_of_all_elements_located(), and you can use either of the following locator strategies:

  • Using CSS_SELECTOR:

    driver.get('https://www.zillow.com/san-francisco-ca/rentals/?searchQueryState=%7B%22pagination%22%3A%7B%7D%2C%22usersSearchTerm%22%3A%22San%20Francisco%2C%20CA%22%2C%22mapBounds%22%3A%7B%22west%22%3A-122.62421695117187%2C%22east%22%3A-122.24244204882812%2C%22south%22%3A37.70334422496088%2C%22north%22%3A37.84716973355808%7D%2C%22regionSelection%22%3A%5B%7B%22regionId%22%3A20330%2C%22regionType%22%3A6%7D%5D%2C%22isMapVisible%22%3Atrue%2C%22filterState%22%3A%7B%22fsba%22%3A%7B%22value%22%3Afalse%7D%2C%22nc%22%3A%7B%22value%22%3Afalse%7D%2C%22fore%22%3A%7B%22value%22%3Afalse%7D%2C%22cmsn%22%3A%7B%22value%22%3Afalse%7D%2C%22fr%22%3A%7B%22value%22%3Atrue%7D%2C%22ah%22%3A%7B%22value%22%3Atrue%7D%7D%2C%22isListVisible%22%3Atrue%2C%22mapZoom%22%3A11%7D')
    print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "div[class='list-card-top'] > a[href]")))])
    
  • Using XPATH in a single line:

    driver.get('https://www.zillow.com/san-francisco-ca/rentals/?searchQueryState=%7B%22pagination%22%3A%7B%7D%2C%22usersSearchTerm%22%3A%22San%20Francisco%2C%20CA%22%2C%22mapBounds%22%3A%7B%22west%22%3A-122.62421695117187%2C%22east%22%3A-122.24244204882812%2C%22south%22%3A37.70334422496088%2C%22north%22%3A37.84716973355808%7D%2C%22regionSelection%22%3A%5B%7B%22regionId%22%3A20330%2C%22regionType%22%3A6%7D%5D%2C%22isMapVisible%22%3Atrue%2C%22filterState%22%3A%7B%22fsba%22%3A%7B%22value%22%3Afalse%7D%2C%22nc%22%3A%7B%22value%22%3Afalse%7D%2C%22fore%22%3A%7B%22value%22%3Afalse%7D%2C%22cmsn%22%3A%7B%22value%22%3Afalse%7D%2C%22fr%22%3A%7B%22value%22%3Atrue%7D%2C%22ah%22%3A%7B%22value%22%3Atrue%7D%7D%2C%22isListVisible%22%3Atrue%2C%22mapZoom%22%3A11%7D')
    print([my_elem.get_attribute("href") for my_elem in WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='list-card-top']/a[@href]")))])
    
  • Console output:

    ['https://www.zillow.com/homedetails/San-Francisco-CA-94134/15166498_zpid/', 'https://www.zillow.com/b/avery-450-san-francisco-ca-BTfktx/', 'https://www.zillow.com/b/solaire-san-francisco-ca-65g7KK/', 'https://www.zillow.com/homedetails/117-Saint-Charles-Ave-San-Francisco-CA-94132/15195262_zpid/', 'https://www.zillow.com/homedetails/433-40th-Ave-San-Francisco-CA-94121/15092586_zpid/', 'https://www.zillow.com/homedetails/123-Carl-St-San-Francisco-CA-94117/2078490576_zpid/', 'https://www.zillow.com/b/fifteen-fifty-san-francisco-ca-BdnYPc/', 'https://www.zillow.com/b/l-seven-san-francisco-ca-9NJtD7/', 'https://www.zillow.com/homedetails/4642-18th-St-San-Francisco-CA-94114/332858409_zpid/']
    
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC