Correctly selecting a web element through xpath with Python and Selenium

Question

I am trying to scrape a web page that has this structure.

<div>
    <div class="class1"></div>
    <div class="class2"></div>
    <div class="class3"></div>
    <div style="clear: both;"></div>
</div>
<div>
    <div class="class1"></div>
    <div class="class2"></div>
    <div class="class3"></div>
    <div style="clear: both;"></div>
</div>
<div>
    <div class="class1"></div>
    <div class="class2"></div>
    <div class="class3"></div>
    <div style="clear: both;"></div>
</div>

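Each section holds different information. I want to search class1 for a specific word and, if the word is present, print the information. That part works; the problem comes afterwards, when I want to get the information in class3 of the same section. For example, if class1 in the first section contains "this word", I want the class3 information of that first section.

My code looks like this: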

cs1 = driver.find_elements_by_class_name("class1")
for i in cs1:
    information = i.text
    if "this word" in information:
        print(information)
        infclass3 = i.find_element_by_xpath('//following-sibling::div[@class = "class3"]')
        print(infclass3.text)

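The problem is this: I get the class1 information that contains "this word", but I do not get the class3 information from the same section. It always prints the class3 of the first section. For example, if "this word" is in the second and third sections, I get this result: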

information of class1 - Section 2
information of class3 - Section 1
information of class1 - Section 3
information of class3 - Section 1

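So the information on lines 1 and 3 is correct, but lines 2 and 4 are wrong, for two reasons: (1) the same result is repeated, and (2) section 1 does not contain "this word".

Thanks for your help.

Hope you have a nice day :)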

Answer 1

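The problem with your code is that you are trying to get the class3 element from the context of the class1 element, which means it only looks for children of the class1 element currently assigned to i. With this in mind, the selector for the class3 element you want is: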

infclass3 = i.find_element_by_xpath('../div[@class="class3"]')
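
For readers on a current Selenium release, here is a minimal sketch of the whole loop with this selector. Two facts are worth stating: in XPath, a leading `//` starts the search at the document root even when the call is made on an element, which is why the original expression kept returning the class3 of the first section; and the `find_element_by_*` helpers were removed in Selenium 4.3, so the sketch uses `find_element(by, value)` instead. The function name `class3_for_matching_sections` is hypothetical, and `driver` is assumed to be a WebDriver with the page already loaded. The locator strategies are passed as plain strings so the block has no import-time dependency; they are the same values as `By.CLASS_NAME` and `By.XPATH`.

```python
# Locator strategies as plain strings; these are the values of
# selenium.webdriver.common.by.By.CLASS_NAME and By.XPATH.
BY_CLASS_NAME = "class name"
BY_XPATH = "xpath"


def class3_for_matching_sections(driver, word):
    """Return (class1 text, class3 text) for each section whose class1 contains word.

    Hypothetical helper: `driver` is assumed to be a Selenium 4 WebDriver
    (or any object with the same find_elements/find_element interface).
    """
    results = []
    for class1 in driver.find_elements(BY_CLASS_NAME, "class1"):
        information = class1.text
        if word in information:
            # Relative path: step up to the section's outer <div>, then down
            # to its class3 child. There is no leading "//", so the lookup
            # stays inside the current section.
            class3 = class1.find_element(BY_XPATH, '../div[@class="class3"]')
            results.append((information, class3.text))
    return results
```

A call would look like `class3_for_matching_sections(driver, "this word")` once the page has been opened with `driver.get(...)`; each returned pair comes from a single section.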

Answer 2

Thanks everyone for the help.

In the end I solved it like this:

infclass3 = i.find_element_by_xpath('following-sibling::*[2]')

I take the class1 element and then, with 'following-sibling::*[2]', select its second following sibling, which is the position that corresponds to class3.
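
To see which element position 2 refers to without opening a browser, here is a small offline sketch using Python's standard library. `xml.etree.ElementTree` supports only a subset of XPath and has no `following-sibling` axis, so the helper `nth_following_sibling` below emulates it with list slicing; that helper, `class3_by_position`, and the sample texts in `PAGE` are all hypothetical and only mirror the structure from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical page fragment with the structure from the question.
PAGE = """
<body>
  <div>
    <div class="class1">other text - Section 1</div>
    <div class="class2"></div>
    <div class="class3">class3 info - Section 1</div>
    <div style="clear: both;"></div>
  </div>
  <div>
    <div class="class1">this word - Section 2</div>
    <div class="class2"></div>
    <div class="class3">class3 info - Section 2</div>
    <div style="clear: both;"></div>
  </div>
  <div>
    <div class="class1">this word - Section 3</div>
    <div class="class2"></div>
    <div class="class3">class3 info - Section 3</div>
    <div style="clear: both;"></div>
  </div>
</body>
"""


def nth_following_sibling(parent, element, n):
    """Emulate XPath following-sibling::*[n] (1-based) for a child of parent."""
    children = list(parent)
    following = children[children.index(element) + 1:]
    return following[n - 1]


def class3_by_position(page, word):
    """Pair each matching class1 text with its second following sibling's text."""
    root = ET.fromstring(page)
    pairs = []
    for section in root.findall("div"):
        class1 = section.find("div[@class='class1']")
        if class1 is not None and class1.text and word in class1.text:
            # Siblings after class1 are: class2, class3, clear-div.
            class3 = nth_following_sibling(section, class1, 2)
            pairs.append((class1.text, class3.text))
    return pairs


print(class3_by_position(PAGE, "this word"))
```

Selecting by position depends on class3 staying exactly two siblings after class1. In Selenium, a class-based relative expression such as 'following-sibling::div[@class="class3"]' (with no leading slashes) would not depend on that ordering.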

Thanks for your attention.
