How to parse HTML code saved as text?

Question

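I have several .txt files containing HTML code (the HTML of web pages, copied and saved as .txt).

I want to parse these files as HTML. Is there a library with functionality similar to the requests + bs4 combination that can take a text file as input and treat it the same way as a normally fetched web page?

Thanks for your help.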

Answer 1

You are probably looking for Beautiful Soup, which makes it easy to parse HTML and read text out of it:

Beautiful Soup is a Python library for pulling data out of HTML and XML files. It works with your favorite parser to provide idiomatic ways of navigating, searching, and modifying the parse tree. It commonly saves programmers hours or days of work.
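As a rough illustration of the navigating, searching, and modifying mentioned in that description, here is a minimal sketch. The HTML snippet and its URLs are made up for the example, and the built-in 'html.parser' is assumed so that nothing beyond bs4 needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the contents of a saved page
html = """
<html>
  <head><title>Sample page</title></head>
  <body>
    <p class="intro">Hello, <a href="https://example.com/a">first link</a></p>
    <p>Second paragraph with <a href="https://example.com/b">another link</a></p>
  </body>
</html>
"""

# No network request is involved: the parser only needs a string
soup = BeautifulSoup(html, "html.parser")

# Navigating: walk the tree by tag name
page_title = soup.title.string

# Searching: collect every link target, and find one paragraph by CSS class
links = [a["href"] for a in soup.find_all("a")]
intro = soup.find("p", class_="intro")
intro_text = intro.get_text()

# Modifying: replace the text of the first link inside the intro paragraph
intro.a.string = "updated link"
```

The same calls apply unchanged whether the string came from requests or from a file on disk.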

Answer 2

As many of the comments have noted, you can pass the contents of a .txt file to BeautifulSoup():

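from bs4 import BeautifulSoup

path = 'path/to/file.txt'
# Read the saved HTML as plain text; utf-8 is an assumption, adjust it to match your files
with open(path, encoding='utf-8') as f:
    text = f.read()
# 'lxml' needs the lxml package installed; 'html.parser' is the built-in alternative
soup = BeautifulSoup(text, 'lxml')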
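Since the question mentions several files, the same idea can be wrapped in a loop. The following is only a sketch under a few assumptions: the helper name parse_saved_pages is hypothetical, the files are assumed to be UTF-8 and to sit in one directory, and 'html.parser' is used so that lxml is not required.

```python
from pathlib import Path

from bs4 import BeautifulSoup


def parse_saved_pages(directory, parser="html.parser", encoding="utf-8"):
    """Parse every .txt file in `directory` and return {file name: soup}.

    Hypothetical helper: the name, the default parser, and the default
    encoding are illustrative choices, not part of the original answer.
    """
    soups = {}
    for path in sorted(Path(directory).glob("*.txt")):
        # Read the saved HTML as text, then hand the string to the parser
        text = path.read_text(encoding=encoding)
        soups[path.name] = BeautifulSoup(text, parser)
    return soups
```

Each value in the returned dictionary behaves like the soup you would get from a downloaded page, so calls such as soup.find_all('a') or soup.get_text() work as usual.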


html

python

parsing

text-parsing