Python: How can I save several HTML files using the content of their <title> tag as the file name?

Question

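I have Python code that parses some data from HTML files, and it works well. At the end of the code I need to save each HTML file under a name based on its <title> tag. For example, I have these 3 HTML files, containing these 3 title tags: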

<title>My name is Prince</title>
<title>I love Madonna</title>
<title>Cars and Candies</title>

Each one must be saved like this:

my-name-is-prince.html
i-love-madonna.html
cars-and-candies.html

So I already have some Python code for saving files, but I don't know how to save them using the title tag as the file name.

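import re

try:
    # Grab the whole <title>...</title> element, then the text between > and <
    title = re.search(r'<title.+/title>', html)[0]
    title_content = re.search(r'>(.+)<', title)[1]
except TypeError:  # re.search returned None: the page has no <title>
    pass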


with open("my-words.html", "w") as some_file_handle:
    some_file_handle.write(finalString)

or

with open('page_323.txt', 'w') as f:
    f.write(result.text)

or

with open("somefilename.txt", "w") as some_file_handle:  
    for line in data: 
        some_file_handle.write(line + "\n")

P.S. I have 500 files. The Python code must find the title tag in each HTML file and save each file as a new HTML file named after that title.


Answer 1

Update

Is this what you are looking for?

>>> import re
>>> html = """<title>My name is Prince</title>"""
>>> re.search(r'<title>(?P<title>.+)</title>', html).group('title').replace(' ', '-').lower()
'my-name-is-prince'
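The same idea can be wrapped in a small helper. This is a sketch, not part of the original answer: `title_slug` is a hypothetical name, and it assumes one `<title>` per page. It matches case-insensitively and across line breaks, and it builds the slug from `\w+` runs (the same rule Answer 2 uses), so characters such as `?` or `/` cannot end up in the file name.

```python
import re
from typing import Optional

# Matches <title ...>...</title>, case-insensitive, allowing line breaks inside
TITLE_RE = re.compile(r'<title[^>]*>(.*?)</title>', re.IGNORECASE | re.DOTALL)


def title_slug(html: str) -> Optional[str]:
    """Return a lowercase, hyphen-joined slug of the page title, or None."""
    match = TITLE_RE.search(html)
    if match is None:
        return None  # no <title> element in this page
    words = re.findall(r'\w+', match.group(1).lower())
    return '-'.join(words) or None  # a title with no word characters -> None


print(title_slug('<title>My name is Prince</title>'))  # my-name-is-prince
```

Appending `'.html'` to the result gives the file name; returning `None` lets the caller decide what to do with pages that have no usable title.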

Old answer: if you have already extracted the title from the HTML, you can do this:

title = 'My name is Prince'
filename = f"{title.lower().replace(' ', '-')}.html"

with open(filename, "w") as some_file_handle:
    some_file_handle.write(finalString)
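Because the question mentions 500 files, the steps can be combined into one loop. This is a hedged sketch, not taken from either answer: `save_by_title`, `slug_from_html` and the folder names `pages` and `renamed` are hypothetical, the files are assumed to be UTF-8, and the title is extracted with a regex in the spirit of the update above. Pages without a `<title>` are skipped, and a numeric suffix is added when two pages share a title so that one file does not silently overwrite another.

```python
import re
from pathlib import Path
from typing import List

TITLE_RE = re.compile(r'<title[^>]*>(.*?)</title>', re.IGNORECASE | re.DOTALL)


def slug_from_html(html: str) -> str:
    """Lowercase, hyphen-joined words of the <title> text ('' if none)."""
    match = TITLE_RE.search(html)
    if match is None:
        return ''
    return '-'.join(re.findall(r'\w+', match.group(1).lower()))


def save_by_title(src_dir: str, dst_dir: str) -> List[str]:
    """Copy every *.html file in src_dir to dst_dir, named after its title."""
    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    # sorted() builds the full list first, so newly written files are not revisited
    for path in sorted(Path(src_dir).glob('*.html')):
        html = path.read_text(encoding='utf-8')
        slug = slug_from_html(html)
        if not slug:
            print(f'skipped (no title): {path.name}')
            continue
        target = out / f'{slug}.html'
        n = 2
        while target.exists():  # another page already used this title
            target = out / f'{slug}-{n}.html'
            n += 1
        target.write_text(html, encoding='utf-8')
        written.append(target.name)
    return written
```

Usage, assuming the 500 files sit in a folder named `pages`: `save_by_title('pages', 'renamed')`. Writing to a separate output folder keeps the originals untouched; re-running into the same `renamed` folder will produce `-2` copies, because files from the previous run count as collisions.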

Answer 2

If you want to use BeautifulSoup, check this:

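import re
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')
title = soup.find('title')
# Build the file name from the title text: lowercase words joined by hyphens
new_filename = title.get_text().lower()
words = re.findall(r'\w+', new_filename)
new_filename = '-'.join(words) + '.html'
print(new_filename)
# The linked full code then serializes the soup with a custom formatter
# (UnsortedAttributes is defined there, it is not part of bs4):
# soup = soup.encode(formatter=UnsortedAttributes()).decode('utf-8')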
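If installing BeautifulSoup is not an option, the standard-library `html.parser` module can read the title too. This is a swapped-in technique, not what Answer 2 uses, and `TitleParser` and `title_text` are hypothetical names. Unlike a plain regex, the parser (with its default `convert_charrefs=True`) decodes character references such as `&amp;` before the text reaches `handle_data`.

```python
import re
from html.parser import HTMLParser
from typing import List, Optional


class TitleParser(HTMLParser):
    """Collects the text of the first <title> element."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default: &amp; arrives as '&'
        self.in_title = False
        self.done = False
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase, so <TITLE> also matches
        if tag == 'title' and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == 'title' and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)  # text may arrive in several chunks


def title_text(html: str) -> Optional[str]:
    """Return the stripped title text, or None if the page has no <title>."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return ''.join(parser.parts).strip() if parser.done else None


title = title_text('<html><head><title>Cars &amp; Candies</title></head></html>')
filename = '-'.join(re.findall(r'\w+', title.lower())) + '.html'
print(title, filename)
```

The `&` itself is dropped by the `\w+` rule, so this example yields `cars-candies.html`.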

See the full code here:

https://neculaifantanaru.com/en/python-google-translate-beautifulsoup-library-save-title-tag-as-link.html


Tags: html, python, tags, save-as