How to check if a BeautifulSoup tag is a certain tag?

Question

Suppose I use BeautifulSoup to find a certain tag:

styling = paragraphs.find_all('w:rpr')

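I then look at the next tag, and I only want to use it if it is a <w:t> tag. How do I check what type of tag the next one is?

I tried element.find_next_sibling().startswith('<w:t') on the element, but it says NoneType object is not callable. I also tried element.find_next_sibling().find_all('<w:t>'), but it doesn't return anything.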

for element in styling:
    next_tag = element.find_next_sibling()
    if ...:  # placeholder: next_tag is a <w:t> tag
        ...

I am using BeautifulSoup and would like to stick with it, if possible without adding eTree or another parser alongside bs4.

Answer 1

You can get a tag's name with item.name.

The problem is that a NavigableString between tags (for example, the whitespace between them) also counts as a sibling, and its name is None.

You will have to skip these elements, or you can get all the siblings and use a for loop to find the first <w:t>, then use break to exit the loop.

from bs4 import BeautifulSoup as BS

text = '''<div>
  <w:rpr></w:rpr>
  <w:t>A</w:t>
</div>'''

soup = BS(text, 'html.parser')

all_wrpr = soup.find_all('w:rpr')
for wrpr in all_wrpr:

    next_tag = wrpr.next_sibling
    print('name:', next_tag.name) # None

    next_tag = wrpr.next_sibling.next_sibling
    #next_tag = next_tag.next_sibling
    print('name:', next_tag.name) # w:t
    print('text:', next_tag.text) # A

#name: None
#name: w:t
#text: A

print('---')

# wrpr still refers to the last <w:rpr> from the loop above
all_siblings = wrpr.next_siblings
for item in all_siblings:
    if item.name == 'w:t':
        print('name:', item.name) # w:t
        print('text:', item.text) # A
        break # exit after first <w:t>

#name: w:t
#text: A

EDIT: If you test the code with slightly differently formatted HTML

text = '''<div>
  <w:rpr></w:rpr><w:t>A</w:t>
</div>'''

then there is no NavigableString between the tags, so the first method fails but the second still works.
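
If you only want the <w:t> when it is the very next tag after <w:rpr> (as the question asks), one possible approach is to skip NavigableString siblings with an isinstance check and then compare .name. This is only a sketch: next_tag_sibling and texts_after_rpr are hypothetical helper names, and the <w:tab> element is sample data added for illustration. Unlike the loop over next_siblings above, it does not accept a <w:t> that comes after some other tag.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag


def next_tag_sibling(element):
    """Return the first following sibling that is a Tag, or None."""
    for sibling in element.next_siblings:
        if isinstance(sibling, Tag):  # skips NavigableString (whitespace, text)
            return sibling
    return None


def texts_after_rpr(markup):
    """Collect the text of each <w:t> that directly follows a <w:rpr>."""
    soup = BeautifulSoup(markup, 'html.parser')
    found = []
    for rpr in soup.find_all('w:rpr'):
        nxt = next_tag_sibling(rpr)
        if nxt is not None and nxt.name == 'w:t':
            found.append(nxt.text)
    return found


# Same content in both layouts: with and without whitespace between tags
spaced = '''<div>
  <w:rpr></w:rpr>
  <w:t>A</w:t>
  <w:rpr></w:rpr>
  <w:tab></w:tab>
  <w:t>B</w:t>
</div>'''
compact = ('<div><w:rpr></w:rpr><w:t>A</w:t>'
           '<w:rpr></w:rpr><w:tab></w:tab><w:t>B</w:t></div>')

print(texts_after_rpr(spaced))   # ['A']
print(texts_after_rpr(compact))  # ['A']
```

The second <w:rpr> is followed by <w:tab>, so its <w:t> is not collected. As for the error in the question: attribute access on a Tag is treated as a search for a child tag with that name, so element.find_next_sibling().startswith looks for a <startswith> child, gets None, and calling None raises "NoneType object is not callable". Comparing .name avoids that.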

python

xml

beautifulsoup

wordprocessingml