Selenium

Question

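I am trying to retrieve text from a set of web pages, but some of the text I want is not wrapped in any tag. I can retrieve everything else easily, but every page has one piece of text that is enclosed only in double quotes and nothing else. At the moment I can locate the element that contains it, but that element holds a lot of other content as well. Is it possible to write an XPath that goes into the element and retrieves only the text enclosed in double quotes?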

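Edit: Below is what I want to retrieve, namely the two lines of text beneath the h1 tag. The element contains more content, but none of it is relevant. So the XPath I am looking for is something like "find any unenclosed text within the article element with class "widget-content"".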

<article class="widget-content">
    
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<script src="/Modules/Orchard.jQuery/scripts/jquery-1.9.1.js" type="text/javascript"></script>


    <h1>Placeholder title</h1>
Placeholder text Placeholder text Placeholder text Placeholder text Placeholder text Placeholder text Placeholder text Placeholder text Placeholder text Placeholder text Placeholder text Placeholder text Placeholder text    <br />
    <br />
Placeholder: Another placeholder    <br />
    <br />

Answer 1

It should look like this:

xpath=//article[contains(@class, 'widget-content')]/article[1]

Answer 2

Your XPath should look like this:

//article/text()

It will output only the text that sits outside of any tag.

Hope this helps!

Answer 3

Q: So the XPath I am looking for is something like "find any unenclosed text within the article element with class "widget-content"".
That would be:

//article[@class='widget-content']/text()

But this will include a lot of empty (whitespace-only) text nodes. To avoid them, try:

//article[@class='widget-content']/text()[normalize-space() !='']
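
If you want to see the difference between the two expressions outside the browser, here is a minimal sketch using the JDK's built-in javax.xml.xpath API. It assumes a well-formed XML stand-in for the page (the real page is HTML, so this is only an approximation), and the class name BareTextNodes and the helper select are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class BareTextNodes {
    // Assumption: a shortened, well-formed copy of the markup from the question.
    static final String SAMPLE =
        "<article class=\"widget-content\">\n"
      + "    <h1>Placeholder title</h1>\n"
      + "Placeholder text Placeholder text    <br />\n"
      + "    <br />\n"
      + "Placeholder: Another placeholder    <br />\n"
      + "    <br />\n"
      + "</article>";

    // Evaluates an XPath expression that selects text nodes and returns their trimmed values.
    static List<String> select(String xml, String expression) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xpath.evaluate(
            expression, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        List<String> values = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            values.add(nodes.item(i).getNodeValue().trim());
        }
        return values;
    }

    public static void main(String[] args) throws Exception {
        // Without the predicate, whitespace-only nodes between the tags are selected too.
        List<String> all = select(SAMPLE, "//article[@class='widget-content']/text()");
        // With the predicate, only nodes that contain visible text remain.
        List<String> nonEmpty = select(SAMPLE,
            "//article[@class='widget-content']/text()[normalize-space() != '']");
        System.out.println("all text nodes: " + all.size());
        System.out.println("non-empty text nodes: " + nonEmpty);
    }
}
```

The values are trimmed in Java because normalize-space() in the predicate only filters the nodes; it does not change the text they return.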

Q: Below is what I want to retrieve, namely the two lines of text beneath the h1 tag.

That would be (/h1/following-sibling::text()), or in full:

"//article[@class='widget-content']/h1/following-sibling::text()[normalize-space() !='']"

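As a rough illustration of the following-sibling variant, the sketch below joins the text nodes that come after the h1 into one string. It again uses the JDK's javax.xml.xpath API on a well-formed stand-in for the page. The class name TextAfterHeading, the helper textAfterHeading and the extra "text before the heading" in the sample are hypothetical and only there to show that text ahead of the h1 is skipped. Note that, as far as I know, Selenium's By.xpath locators have to resolve to elements rather than text nodes, so in WebDriver you would probably locate the article element and read the text nodes from it in some other way.

```java
import java.io.StringReader;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TextAfterHeading {
    static final String EXPRESSION =
        "//article[@class='widget-content']/h1/following-sibling::text()[normalize-space() != '']";

    // Returns the non-empty text nodes after the h1, trimmed and joined with newlines.
    static String textAfterHeading(String xml) throws Exception {
        NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
            .evaluate(EXPRESSION, new InputSource(new StringReader(xml)), XPathConstants.NODESET);
        StringBuilder joined = new StringBuilder();
        for (int i = 0; i < nodes.getLength(); i++) {
            if (joined.length() > 0) {
                joined.append('\n');
            }
            joined.append(nodes.item(i).getNodeValue().trim());
        }
        return joined.toString();
    }

    public static void main(String[] args) throws Exception {
        // Assumption: simplified, well-formed markup; the leading text is hypothetical.
        String xml = "<article class=\"widget-content\">"
            + "Hypothetical text before the heading"
            + "<h1>Placeholder title</h1>\n"
            + "Placeholder text    <br />\n"
            + "Placeholder: Another placeholder    <br />\n"
            + "</article>";
        System.out.println(textAfterHeading(xml));
    }
}
```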
Selenium - Find text only enclosed by double quotes

java

html

xpath

selenium-chromedriver