Locating a specific instance of a class using XPath with Selenium
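I am trying to use Selenium to click the PDF icon (shown in screenshot 2) for each element (each container shown in screenshot 1).
The problem is that the PDF icons have very few identifiers, so I can only locate them by class with an XPath expression. On each iteration of the for elem in issues_numb: statement, the script clicks the first PDF icon it finds on the page, because that is the first element matching the XPath given to the script.
Is there a way to create a nested loop so that each instance of one class (the article title) clicks the instance of another class (the PDF icon) associated with it? So for the first article, click the first PDF icon, and so on.
HTML code: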
<section aria-label="Metadata for Whole-genome sequence and broad-spectrum antibacterial activity of Chryseobacterium cucumeris strain MW-6 isolated from the Arabian Sea" class="article-list-item-content-block ">
<div class="title " data-ember-action="" data-ember-action-1069="1069">
<div id="ember1070" class="ember-view"><a target="_blank" href="/libraries/1374/articles/504204400" id="ember1071" class="ember-view" tabindex="0"> Whole-genome sequence and broad-spectrum antibacterial activity of Chryseobacterium cucumeris strain MW-6 isolated from the Arabian Sea
</a>
</div>
</div>
<!---->
<div class="metadata">
<!---->
<span tabindex="0" class="pages ">
p. 489
</span>
<!---->
<span class="authors" data-ember-action="" data-ember-action-1082="1082">
<span tabindex="0" class="preview tabindex">
Iqbal, Sajid; Vohra, Muhammad Sufyan; Janjua, Hussnain Ahmed
</span>
</span>
<div class="abstract" data-ember-action="" data-ember-action-1083="1083">
<div tabindex="0" class="preview tabindex">
<div id="ember1088" class="ember-view">
<span class="lt-line-clamp__line">In the current study, strain MW-6 isolated from Arabian seawater exhibited broad-spectrum antibacterial activity</span>
<span class="lt-line-clamp__line">against indicator bacterial pathogens. The partially extracted antibacterial metabolites with ethyl acetate revealed</span>
<span class="lt-line-clamp__line lt-line-clamp__line--last">
promising activity against, and. The minimum inhibitory concentrations (MICs) were determined against indicator stra<span class="lt-line-clamp__ellipsis"><div class="lt-line-clamp__dummy-element">…</div>
<!---->
</span></span>
<!----><span class="lt-line-clamp__ellipsis lt-line-clamp__ellipsis--dummy">…</span></div>
</div>
</div>
</div>
<!---->
<div class="content-overflow " data-ember-action="" data-ember-action-1089="1089">
<span class="chevron icon flaticon solid down-2"></span>
</div>
<div class="tools ">
<div class="buttons noselect">
<div class="button invisible download-pdf" data-ember-action="" data-ember-action-1090="1090">
<div id="ember1091" class="ember-view"><a aria-label="Download PDF" target="_blank" href="/libraries/1374/articles/504204400/pdf" id="ember1092" class="tooltip ember-view" tabindex="0">
<span aria-hidden="true" class="icon fal fa-file-pdf"></span>
<span class="aria-hidden">Download PDF - Whole-genome sequence and broad-spectrum antibacterial activity of Chryseobacterium cucumeris strain MW-6 isolated from the Arabian Sea</span>
</a>
</div>
</div>
<div class="button invisible read-full-text" data-ember-action="" data-ember-action-1097="1097">
<div id="ember1098" class="ember-view"><a aria-label="Link to Article" target="_blank" href="/libraries/1374/articles/504204400" id="ember1099" class="tooltip ember-view" tabindex="0">
<span aria-hidden="true" class="icon fal fa-link"></span>
<span class="aria-hidden">Link to Article - Whole-genome sequence and broad-spectrum antibacterial activity of Chryseobacterium cucumeris strain MW-6 isolated from the Arabian Sea</span>
</a>
</div>
</div>
<div class="button invisible add-to-my-articles" data-ember-action="" data-ember-action-1100="1100">
<a aria-label="Save to My Articles" class="tabindex tooltip" tabindex="0">
<span aria-hidden="true" class="icon fal fa-folder"></span>
<span class="aria-hidden">Save to My Articles - Whole-genome sequence and broad-spectrum antibacterial activity of Chryseobacterium cucumeris strain MW-6 isolated from the Arabian Sea</span>
</a>
</div>
<div class="button invisible citation-services" data-ember-action="" data-ember-action-2165="2165">
<a tabindex="0" aria-label="Export Citation" class="tabindex tooltip">
<span aria-hidden="true" class="icon fal fa-graduation-cap"></span>
<span class="aria-hidden">Export Citation - Whole-genome sequence and broad-spectrum antibacterial activity of Chryseobacterium cucumeris strain MW-6 isolated from the Arabian Sea</span>
</a>
</div>
<div class="button invisible social-media-services" data-ember-action="" data-ember-action-2166="2166">
<a tabindex="0" aria-label="Share" class="tabindex tooltip">
<span aria-hidden="true" class="icon fal fa-share-alt"></span>
<span class="aria-hidden">Share - Whole-genome sequence and broad-spectrum antibacterial activity of Chryseobacterium cucumeris strain MW-6 isolated from the Arabian Sea</span>
</a>
</div>
</div>
</div>
</section>
My code:
issues_numb = driver.find_elements(By.XPATH, "//section[@class='article-list-item-content-block ']")
parent_tab = driver.current_window_handle
for elem in issues_numb:
    title_article = elem.get_attribute("aria-label")
    print(title_article[13:])
    try:
        check_buttons = driver.find_element(By.XPATH, ".//span[@class='icon fal fa-file-pdf']")
        print("pdf object found for", str(elem))
        checking_size_buttons = len(str(check_buttons))
        if checking_size_buttons > 0:
            pdf_icon = driver.find_element(By.XPATH, ".//span[@class='icon fal fa-file-pdf']")
            click_pdf = ActionChains(driver).move_to_element(pdf_icon).click(pdf_icon).perform()
            WebDriverWait(driver, timeout).until(element_present)
            check_need_to_sign_in()
            driver.switch_to.window(parent_tab)
        else:
            print("No PDF available")
    except NoSuchElementException:
        get_article_name()
The issues_numb variable refers to this element: [screenshot]
The tools_box variable refers to this element: [screenshot]
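When you start an XPath expression with a double slash (//), the engine searches the entire document, starting from the root.
So you should change the XPath expressions inside the loop by adding a . in front of the //. This tells the engine to search from the current context node instead of the root. In Selenium, the context node is the element you call find_element on, so the call has to be made on the loop element (elem.find_element(...)) rather than on driver.
Just to give you an idea, your code should look something like this.
By the way: it is better to share the actual HTML, so that your code and your question are easier to understand.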
import time

from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait

# driver, timeout and check_need_to_sign_in() are assumed to be defined earlier in the script.
PDF_ICON = ".//span[@class='icon fal fa-file-pdf']"

issues_numb = driver.find_elements(By.XPATH, "//div[@class='issue ember-view']")
parent_tab = driver.current_window_handle
for elem in issues_numb:
    ActionChains(driver).move_to_element(elem).click(elem).perform()
    # find_elements returns an empty list (instead of raising) when there is no PDF icon.
    if elem.find_elements(By.XPATH, PDF_ICON):
        # Called on elem, so ".//" is resolved relative to this issue only.
        tools_box = elem.find_elements(By.XPATH, ".//div[@class='buttons noselect']")
        for box in tools_box:
            # Wait until this box contains a PDF icon, then click that icon.
            pdf_icon = WebDriverWait(driver, timeout).until(
                lambda d: box.find_element(By.XPATH, PDF_ICON)
            )
            ActionChains(driver).move_to_element(pdf_icon).click(pdf_icon).perform()
            time.sleep(10)
            print(driver.current_url)
            check_need_to_sign_in()
            driver.switch_to.window(parent_tab)
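The difference between searching from the root and searching from the current element can be reproduced without a browser. The sketch below uses Python's standard xml.etree.ElementTree as a stand-in for Selenium, on a made-up, heavily trimmed version of the markup from the question (the two articles and their hrefs are hypothetical). It only illustrates the scoping rule; it is not Selenium code.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down markup: two article blocks, each with its own PDF link.
PAGE = """
<div>
  <section aria-label="Metadata for Article A">
    <div class="tools">
      <a aria-label="Download PDF" href="/articles/1/pdf"><span class="icon fal fa-file-pdf"/></a>
    </div>
  </section>
  <section aria-label="Metadata for Article B">
    <div class="tools">
      <a aria-label="Download PDF" href="/articles/2/pdf"><span class="icon fal fa-file-pdf"/></a>
    </div>
  </section>
</div>
"""

root = ET.fromstring(PAGE)
sections = root.findall(".//section")
PDF_LINK = ".//a[@aria-label='Download PDF']"

# Searching from the root on every iteration always yields the first link on the page.
from_root = [root.find(PDF_LINK).get("href") for _ in sections]

# Searching from each section yields the link that belongs to that section.
from_section = [
    (section.get("aria-label")[len("Metadata for "):], section.find(PDF_LINK).get("href"))
    for section in sections
]

print(from_root)     # same href twice
print(from_section)  # one (title, href) pair per article
```

In Selenium the same idea is expressed by calling find_element on the element returned by the outer find_elements, with an XPath that starts with .// .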
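The way to handle this situation, where the only identifier available is shared by several elements (in my case, a class name shared by multiple PDF icons), is to specify the context to search in.
That way, the driver only looks at the HTML belonging to the specific area you are searching. There is more about this here, but the correct Selenium syntax has changed since then. Here is the updated syntax: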
elements = driver.find_elements(By.XPATH, "//tag[@class='targeted_context']")
for elem in elements:
    # Call find_element on elem (not driver) so that ".//" searches inside elem only.
    targeted_element = elem.find_element(By.XPATH, ".//tag[@class='targeted_class']")
(Answered in the comments by @AbdulAzizBarkat.)
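One related detail: the exact-match predicates used above, such as @class='article-list-item-content-block ', only work while the class attribute is byte-for-byte identical, including the trailing space. A common XPath idiom is to match a single class token instead. The helper below is a hypothetical convenience function (has_class is not part of Selenium) that just builds such an expression as a string.

```python
def has_class(tag: str, class_name: str, relative: bool = True) -> str:
    """Build an XPath matching `tag` elements whose class list contains `class_name`.

    The class attribute is whitespace-normalized and padded with spaces so that
    only whole class tokens match, whatever their order or surrounding spacing.
    """
    prefix = ".//" if relative else "//"
    return (
        f"{prefix}{tag}[contains(concat(' ', normalize-space(@class), ' '), "
        f"' {class_name} ')]"
    )


# Document-wide expression for the article containers.
section_xpath = has_class("section", "article-list-item-content-block", relative=False)
# Context-relative expression for the PDF icon inside one container.
pdf_icon_xpath = has_class("span", "fa-file-pdf")

print(section_xpath)
print(pdf_icon_xpath)
```

Assuming the page structure from the question, section_xpath would be passed to driver.find_elements(By.XPATH, ...) and pdf_icon_xpath to elem.find_element(By.XPATH, ...) inside the loop.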