Python RegEx with Beautifulsoup 4 not working

Question

I want to find all div tags whose class name follows a specific pattern, but my code does not work as expected.

Here is the code snippet:

soup = BeautifulSoup(html_doc, 'html.parser')

all_findings = soup.findAll('div',attrs={'class':re.compile(r'common text .*')})

where html_doc is a string containing the following HTML:

<div class="common text sighting_4619012">

  <div class="hide-c">
    <div class="icon location"></div>
    <p class="reason"></p>
    <p class="small">These will not appear</p>
    <span class="button secondary ">wait</span>
  </div>

  <div class="show-c">
  </div>

</div>

However, all_findings comes back as an empty list, whereas it should have found one item.

It does work in the case of an exact match:

all_findings = soup.findAll('div',attrs={'class':re.compile(r'hide-c')})

I am using bs4.

Answer 1

Instead of using a regular expression, put the classes you are looking for in a list:

all_findings = soup.findAll('div',attrs={'class':['common', 'text']})

Sample code:

from bs4 import BeautifulSoup

html_doc = """<div class="common text sighting_4619012">

  <div class="hide-c">
    <div class="icon location"></div>
    <p class="reason"></p>
    <p class="small">These will not appear</p>
    <span class="button secondary ">wait</span>
  </div>

  <div class="show-c">
  </div>

</div>"""
soup = BeautifulSoup(html_doc, 'html.parser')
all_findings = soup.findAll('div',attrs={'class':['common', 'text']})
print(all_findings)

This outputs:

[<div class="common text sighting_4619012">
<div class="hide-c">
<div class="icon location"></div>
<p class="reason"></p>
<p class="small">These will not appear</p>
<span class="button secondary ">wait</span>
</div>
<div class="show-c">
</div>
</div>]
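
As for why the regular expression in the question finds nothing: with html.parser, Beautiful Soup treats class as a multi-valued attribute and stores it as a list of separate class names. The sketch below uses only the re module to imitate what happens when a pattern is tried against each class name in turn. It is an illustration of the idea, not Beautiful Soup's actual code, and newer bs4 releases may additionally try the whole attribute string, so the behaviour can depend on your version.

```python
import re

# The class attribute value from the question, split into separate
# class names the way a multi-valued attribute is stored.
class_value = "common text sighting_4619012"
tokens = class_value.split()

pattern = re.compile(r"common text .*")

# Tried name by name, the pattern never matches: no single class
# name contains the text "common text ".
per_token = [bool(pattern.search(token)) for token in tokens]
print(per_token)  # [False, False, False]

# Tried against the whole attribute string, it does match.
whole_value = bool(pattern.search(class_value))
print(whole_value)  # True
```

This also explains why re.compile(r'hide-c') works: that pattern fits inside a single class name.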

Answer 2

To expand on @Andy's answer, you can pass a list of class names together with a compiled regular expression:

soup.find_all('div', {'class': ["common", "text", re.compile(r'sighting_\d{5}')]})

Note that in this case you will get div elements that have any one of the specified classes or patterns; in other words, the class is common, or text, or sighting_ followed by five digits.
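
A small demonstration of this "or" behaviour may help. The markup here is hypothetical and was made up for the illustration; only the first div carries both classes, yet the second one is returned as well.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, not taken from the question.
html_doc = """
<div class="common text sighting_4619012">both</div>
<div class="text">text only</div>
<div class="other">neither</div>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# A list of class names matches a tag that has ANY of them.
matches = soup.find_all("div", attrs={"class": ["common", "text"]})
matched_text = [div.get_text() for div in matches]
print(matched_text)  # ['both', 'text only']
```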

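If you want them joined with "and", one option is to turn off the special handling of the "class" attribute by parsing the document as "xml" (this parser requires lxml to be installed):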

soup = BeautifulSoup(html_doc, 'xml')
all_findings = soup.find_all('div', class_=re.compile(r'common text sighting_\d{5}'))
print(all_findings)
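
Another way to get "and" semantics, assuming you would rather keep html.parser, is to pass a function to find_all and check the class list yourself. This is only a sketch: the helper name is_sighting_div, the sample markup, and the pattern sighting_ followed by one or more digits are all assumptions made for the example.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: only the first div has all three kinds of class.
html_doc = """
<div class="common text sighting_4619012">wanted</div>
<div class="common text">no sighting class</div>
<div class="text sighting_4619012">no common class</div>
"""

# Assumed pattern: "sighting_" followed by one or more digits.
sighting_re = re.compile(r"^sighting_\d+$")


def is_sighting_div(tag):
    """Return True for a div with 'common', 'text' and a sighting_<digits> class."""
    if tag.name != "div":
        return False
    # With html.parser, class is a list of names (or missing entirely).
    classes = tag.get("class") or []
    return (
        "common" in classes
        and "text" in classes
        and any(sighting_re.match(name) for name in classes)
    )


soup = BeautifulSoup(html_doc, "html.parser")
all_findings = soup.find_all(is_sighting_div)
found_text = [div.get_text() for div in all_findings]
print(found_text)  # ['wanted']
```

Because the check runs on the list of class names, it does not depend on the order in which the classes appear in the attribute.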

python

regex

beautifulsoup

python-3.x