How to select only this text node using BeautifulSoup and Python?
I have this HTML structure (the number and position of the <br> tags vary):
<div class="foo">
<h3>Title</h3>
<br>Some text I want to retrieve. <br><br> This text too.
<br> And this one too.
<div class="subfoo">Some other text I don't want.</div>
</div>
In my Python script, I wrote:
import bs4

# res holds the downloaded page; res.text is its HTML source
exampleSoup = bs4.BeautifulSoup(res.text, "html.parser")
elems = exampleSoup.select('.foo')
print(elems[0].get_text())
As expected, I get the full text:
Title
Some text I want to retrieve.
Some other text I don't want.
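This happens because get_text() concatenates every string in the element's subtree, including the text inside <h3> and inside the nested div.subfoo. A minimal sketch reproducing it, assuming the question's markup (with the note about <br> tags left out) is held in a string instead of being fetched through res:

```python
import bs4

# The question's markup, hard-coded here instead of downloaded.
html = """
<div class="foo">
<h3>Title</h3>
<br>Some text I want to retrieve. <br><br> This text too.
<br> And this one too.
<div class="subfoo">Some other text I don't want.</div>
</div>
"""

soup = bs4.BeautifulSoup(html, "html.parser")
elem = soup.select_one(".foo")

# get_text() walks all descendants, so the <h3> title and the nested
# div.subfoo text end up in the result alongside the wanted strings.
full_text = elem.get_text()
print(full_text)
```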
How can I get only the strings that sit directly in the div, outside any tag, i.e. "Some text I want to retrieve. This text too. And this one too."?
Thanks for your help.
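You can use .next_sibling to get the node that immediately follows a tag at the same level of the tree (here, the text right after <h3>).
Example: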
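>>> from bs4 import BeautifulSoup
>>> html = '''<div class="foo"><h3>Title</h3>Some text I want to retrieve.<div class="subfoo">Some other text I don't want.</div></div>'''
>>> soup = BeautifulSoup(html, "lxml")
>>> print(soup.prettify())
<html>
 <body>
  <div class="foo">
   <h3>
    Title
   </h3>
   Some text I want to retrieve.
   <div class="subfoo">
    Some other text I don't want.
   </div>
  </div>
 </body>
</html>
>>> print(soup.find('div', {'class': 'foo'}).h3.next_sibling.strip())
Some text I want to retrieve.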
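Note that .next_sibling returns only the single node directly after <h3>. In the question's full markup, that node is the whitespace-only string before the first <br>, so .strip() yields an empty string and the three sentences are not collected. One alternative, not part of the original answer, is to request only the div's direct string children with find_all(string=True, recursive=False). A sketch assuming html.parser and the question's markup (minus the <br> note):

```python
import bs4

html = """
<div class="foo">
<h3>Title</h3>
<br>Some text I want to retrieve. <br><br> This text too.
<br> And this one too.
<div class="subfoo">Some other text I don't want.</div>
</div>
"""

soup = bs4.BeautifulSoup(html, "html.parser")
div = soup.select_one("div.foo")

# recursive=False limits the search to the div's direct children, so the
# strings inside <h3> and div.subfoo are not returned; string=True matches
# text nodes only, so the <br> tags themselves are skipped.
pieces = div.find_all(string=True, recursive=False)

# Strip each piece and drop the whitespace-only strings between tags.
text = " ".join(s.strip() for s in pieces if s.strip())
print(text)  # → Some text I want to retrieve. This text too. And this one too.
```

Because this does not depend on the <h3> or on how many <br> tags appear, it also works when the number and position of the <br> tags vary.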