如何用 包装特定标签中每个单词的首字母？

Question

我正在尝试将 BeautifulSoup 模块与 Python 一起使用来执行以下操作：

在 div for HTML 中，对于每个段落标记，我想为段落中每个单词的第一个字母添加一个粗体标记。例如：

<div class="body">
    <p>The quick brown fox</p>
</div>

上面写着：敏捷的棕狐

然后会变成

<div class="body">
    <p><b>T</b>he <b>q</b>uick <b>b</b>rown <b>f</b>ox</p>
</div>

那将是：The quick brown fox

使用 bs4 我一直无法找到一个很好的解决方案来解决这个问题，并且愿意接受想法。

Answer 1

我不太了解Python如何详细解析HTML，但我可以为您提供一些想法。

要查找  标签，您可以使用 RegEx <p.*?>.*? 或使用 str.find("") 并一直走到 .

要添加  标签，也许这段代码可以工作：

def add_bold(s: str) -> str:
    ret = ""
    isFirstLet = True
    for i in s:
        if isFirstLet:
            ret += "<b>" + i + "</b>"
            isFirstLet = False
        else:
            ret += i
        if i == " ": isFirstLet = True
    return ret

Answer 2

您可以将 replace_with() 与 list comprehension 结合使用 - 从 tag / bs4 对象中提取 text / string，将其处理为文本，然后再处理用新的 bs4 object:

替换标签

soup.p.replace_with(
    BeautifulSoup(
        ' '.join([s.replace(s[0],f'<b>{s[0]}</b>') for s in soup.p.string.split(' ')]),'html.parser'
    )
)

例子

from bs4 import BeautifulSoup
html = '''
<div class="body">
    <p>The quick brown fox</p>
</div>'''
soup = BeautifulSoup(html,'html.parser')

soup.p.replace_with(
    BeautifulSoup(
        ' '.join([s.replace(s[0],f'<b>{s[0]}</b>') for s in soup.p.string.split(' ')]),'html.parser'
    )
)

soup

输出

<div class="body">
<b>T</b>he <b>q</b>uick <b>b</b>rown <b>f</b>ox
</div>

如何用 <b> 包装特定标签中每个单词的首字母？

How to wrap initial of each word in a specific tag with a <b>?

html

python

formatting

beautifulsoup

web-scraping

例子

输出