如何在 selenium 中使用 xpath 获取元素的父元素 (python)
How to get parents of an element with xpath in selenium (python)
我有一个脚本可以在网页上获取我想要的所有图像,然后我必须获取包含 image
.
的 link
我实际上点击了每张图片,选择了当前页面 link 然后我返回并继续工作。这很慢,但我有一个 a
标签“拥抱”我的 image
,我不知道如何检索该标签。使用标签,它可以更容易和更快。我附上 html 代码和我的 python 代码!
HTML代码
<div class="col-xl col-lg col-md-4 col-sm-6 col-6">
<a href="URL I WANT TO GET ">
<article>
<span class="year">2017</span>
<span class="quality">4K</span>
<span class="imdb">6.7</span>
<img width="190" height="279" src="THE IMAGE URL" class="img-full wp-post-image" alt="" loading="lazy"> <h2>TITLE</h2>
</article>
</a></div>
<div class="col-xl col-lg col-md-4 col-sm-6 col-6">
<a href="URL I WANT TO GET 2">
<article>
<span class="year">2019</span>
<span class="quality">4K</span>
<span class="imdb">8.0</span>
<img width="190" height="279" src="THE IMAGE URL 2" class="img-full wp-post-image" alt="" loading="lazy"> <h2>TITLE</h2>
</article>
</a></div>
Python代码
self.driver.get(category_url)
WebDriverWait(self.driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, 'div.archivePaging'))) # a div to see if page is loaded
movies_buttons = self.driver.find_elements_by_css_selector('img.img-full.wp-post-image')
print("Getting all the links!")
for movie in movies_buttons:
self.driver.execute_script("arguments[0].scrollIntoView();", movie)
movie.click()
WebDriverWait(self.driver, 10).until(EC.visibility_of_element_located((By.CLASS_NAME, 'infoFilmSingle')))
print(self.driver.current_url)
self.driver.back()
WebDriverWait(self.driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, 'div.archivePaging')))
请注意,此代码现在不起作用,因为我正在调用旧页面的电影对象,但这不是问题,因为如果我只看到 link,则不需要更改页面等会话不会改变。
根据我的理解你想做的一个例子 - 你想获得父 a
标签 href
例子
from selenium import webdriver
driver = webdriver.Chrome(executable_path=r'C:\Program Files\ChromeDriver\chromedriver.exe')
html_content = """
<div class="col-xl col-lg col-md-4 col-sm-6 col-6">
<a href="https://www.link1.de">
<article>
<span class="year">2017</span>
<span class="quality">4K</span>
<span class="imdb">6.7</span>
<img width="190" height="279" src="THE IMAGE URL" class="img-full wp-post-image" alt="" loading="lazy"> <h2>TITLE</h2>
</article>
</a>
</div>
<div class="col-xl col-lg col-md-4 col-sm-6 col-6">
<a href="https://www.link2.de">
<article>
<span class="year">2019</span>
<span class="quality">4K</span>
<span class="imdb">8.0</span>
<img width="190" height="279" src="THE IMAGE URL 2" class="img-full wp-post-image" alt="" loading="lazy"> <h2>TITLE</h2>
</article>
</a>
</div>
"""
driver.get("data:text/html;charset=utf-8,{html_content}".format(html_content=html_content))
找到 image elements
及其 class
并向上移动带有 ..
的元素结构,在本例中为 /../..
driver.get("data:text/html;charset=utf-8,{html_content}".format(html_content=html_content))
aTags = driver.find_elements_by_xpath("//img[contains(@class,'img-full wp-post-image')]/../..")
for ele in aTags:
x=ele.get_attribute('href')
print(x)
driver.close()
输出
https://www.link1.de/
https://www.link2.de/
我有一个脚本可以在网页上获取我想要的所有图像,然后我必须获取包含 image
.
我实际上点击了每张图片,选择了当前页面 link 然后我返回并继续工作。这很慢,但我有一个 a
标签“拥抱”我的 image
,我不知道如何检索该标签。使用标签,它可以更容易和更快。我附上 html 代码和我的 python 代码!
HTML代码
<div class="col-xl col-lg col-md-4 col-sm-6 col-6">
<a href="URL I WANT TO GET ">
<article>
<span class="year">2017</span>
<span class="quality">4K</span>
<span class="imdb">6.7</span>
<img width="190" height="279" src="THE IMAGE URL" class="img-full wp-post-image" alt="" loading="lazy"> <h2>TITLE</h2>
</article>
</a></div>
<div class="col-xl col-lg col-md-4 col-sm-6 col-6">
<a href="URL I WANT TO GET 2">
<article>
<span class="year">2019</span>
<span class="quality">4K</span>
<span class="imdb">8.0</span>
<img width="190" height="279" src="THE IMAGE URL 2" class="img-full wp-post-image" alt="" loading="lazy"> <h2>TITLE</h2>
</article>
</a></div>
Python代码
self.driver.get(category_url)
WebDriverWait(self.driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, 'div.archivePaging'))) # a div to see if page is loaded
movies_buttons = self.driver.find_elements_by_css_selector('img.img-full.wp-post-image')
print("Getting all the links!")
for movie in movies_buttons:
self.driver.execute_script("arguments[0].scrollIntoView();", movie)
movie.click()
WebDriverWait(self.driver, 10).until(EC.visibility_of_element_located((By.CLASS_NAME, 'infoFilmSingle')))
print(self.driver.current_url)
self.driver.back()
WebDriverWait(self.driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, 'div.archivePaging')))
请注意,此代码现在不起作用,因为我正在调用旧页面的电影对象,但这不是问题,因为如果我只看到 link,则不需要更改页面等会话不会改变。
根据我的理解你想做的一个例子 - 你想获得父 a
标签 href
例子
from selenium import webdriver
driver = webdriver.Chrome(executable_path=r'C:\Program Files\ChromeDriver\chromedriver.exe')
html_content = """
<div class="col-xl col-lg col-md-4 col-sm-6 col-6">
<a href="https://www.link1.de">
<article>
<span class="year">2017</span>
<span class="quality">4K</span>
<span class="imdb">6.7</span>
<img width="190" height="279" src="THE IMAGE URL" class="img-full wp-post-image" alt="" loading="lazy"> <h2>TITLE</h2>
</article>
</a>
</div>
<div class="col-xl col-lg col-md-4 col-sm-6 col-6">
<a href="https://www.link2.de">
<article>
<span class="year">2019</span>
<span class="quality">4K</span>
<span class="imdb">8.0</span>
<img width="190" height="279" src="THE IMAGE URL 2" class="img-full wp-post-image" alt="" loading="lazy"> <h2>TITLE</h2>
</article>
</a>
</div>
"""
driver.get("data:text/html;charset=utf-8,{html_content}".format(html_content=html_content))
找到 image elements
及其 class
并向上移动带有 ..
的元素结构,在本例中为 /../..
driver.get("data:text/html;charset=utf-8,{html_content}".format(html_content=html_content))
aTags = driver.find_elements_by_xpath("//img[contains(@class,'img-full wp-post-image')]/../..")
for ele in aTags:
x=ele.get_attribute('href')
print(x)
driver.close()
输出
https://www.link1.de/
https://www.link2.de/