How to write a Python regex to find a GUID in HTML?

Question

How can I find the GUIDs in the following piece of HTML?

HTML sample:

<td>xxxxxxx</td>
<td style="display: none">e3aa8247-354b-e311-b6eb-005056b42341</td>
<td>yyyyyy</td>
<td style="display: none">e3aa8247-354b-e311-b6eb-005056b42342</td>
<td>zzzz</td>

Answer 1

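Use an HTML parser, such as the "beautiful" and transparent BeautifulSoup package.

The idea is to locate the td elements containing the text xxxxxxx and yyyyyy, then read the text of the td sibling that follows each one (assuming xxxxxxx and yyyyyy are labels you know beforehand):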

from bs4 import BeautifulSoup

data = """
<tr>
    <td>xxxxxxx</td>
    <td style="display: none">e3aa8247-354b-e311-b6eb-005056b42341</td>
    <td>yyyyyy</td>
    <td style="display: none">e3aa8247-354b-e311-b6eb-005056b42342</td>
    <td>zzzz</td>
</tr>
"""

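# Name the parser explicitly; "html.parser" ships with Python.
soup = BeautifulSoup(data, "html.parser")

# Find the label cell, then read the text of the <td> that follows it.
# (string= is the current name of the older text= argument.)
print(soup.find("td", string="xxxxxxx").find_next_sibling("td").text)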

Prints:

e3aa8247-354b-e311-b6eb-005056b42341
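
If the labels are not known in advance, one variant is to walk every td cell and keep the ones whose text parses as a UUID. The following is only a rough sketch under that assumption. It swaps BeautifulSoup for the standard library's html.parser so it runs without extra packages, and the class name GuidCellParser is made up for illustration:

```python
from html.parser import HTMLParser
import uuid


class GuidCellParser(HTMLParser):
    """Collects the text of every <td> whose content parses as a UUID."""

    def __init__(self):
        super().__init__()
        self.in_td = False
        self.buffer = []
        self.guids = []

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self.in_td = True
            self.buffer = []

    def handle_data(self, data):
        if self.in_td:
            self.buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self.in_td:
            self.in_td = False
            text = "".join(self.buffer).strip()
            try:
                # uuid.UUID raises ValueError for anything that is not a UUID.
                # Note: it also accepts the hyphen-less 32-digit form.
                uuid.UUID(text)
            except ValueError:
                return
            self.guids.append(text)


data = """
<tr>
    <td>xxxxxxx</td>
    <td style="display: none">e3aa8247-354b-e311-b6eb-005056b42341</td>
    <td>yyyyyy</td>
    <td style="display: none">e3aa8247-354b-e311-b6eb-005056b42342</td>
    <td>zzzz</td>
</tr>
"""

parser = GuidCellParser()
parser.feed(data)
parser.close()
guids = parser.guids
print(guids)
```

With the sample row, guids should hold the two hidden values in document order.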

Answer 2

import re
re.findall(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}", the_whole_text)

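This works because a UUID is always written in this format: five groups of hexadecimal digits (8-4-4-4-12) separated by hyphens. Note that the pattern as written matches lowercase hex digits only. In general, though, you should parse HTML/XML with an HTML/XML parser rather than re, because regular expressions handle nested markup poorly.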
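
For completeness, here is a self-contained run of the regex approach against the sample from the question. It is a sketch, not the only way to write it: the word boundaries (\b) and re.IGNORECASE are additions of mine, on the assumption that uppercase GUIDs may also appear and that longer hex runs should not match partially.

```python
import re

# 8-4-4-4-12 hexadecimal groups; \b stops matches inside longer hex runs.
GUID_RE = re.compile(
    r"\b[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}\b",
    re.IGNORECASE,
)

the_whole_text = """
<td>xxxxxxx</td>
<td style="display: none">e3aa8247-354b-e311-b6eb-005056b42341</td>
<td>yyyyyy</td>
<td style="display: none">e3aa8247-354b-e311-b6eb-005056b42342</td>
<td>zzzz</td>
"""

found = GUID_RE.findall(the_whole_text)
print(found)

# Uppercase input is matched as well because of re.IGNORECASE.
upper_found = GUID_RE.findall("ID: E3AA8247-354B-E311-B6EB-005056B42341")
```

Compiling the pattern once is convenient when the same search runs over many pages.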
