Is it possible to decode HTML using lynx, in a Python script?

Question

Let the variable html be a string containing the full source code of a web page, e.g.

html = "<!doctype html>\n<html><head><title>My title</title></head>LOTS OF CHARS HERE</html>"

I would like to print this web page in a human-readable format, using lynx if possible. I tried various approaches, such as

print(subprocess.run(['echo', html, '|', 'lynx', '-stdin', '-dump'], capture_output=True, text=True).stdout)

or

p1 = subprocess.Popen(["echo", html], stdout=subprocess.PIPE)
print(subprocess.run(['lynx', '-stdin', '-dump'], stdin=p1.stdout, capture_output=True, text=True).stdout)

but both fail with the following error:

OSError: [Errno 7] Argument list too long: 'echo'

Any idea how to make this work?

Answer 1

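There is no need for echo: pass html as the input to lynx. The error occurs because the whole page is handed to echo as a command-line argument, which exceeds the operating system's limit on argument length. The input parameter sends the string over the child process's standard input instead, where that limit does not apply.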

print(subprocess.run(['lynx', '-stdin', '-dump'], input=html, capture_output=True, text=True).stdout)
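If this is needed in more than one place, the same call can be wrapped in a small helper. The following is only a sketch: the name render_html and its cmd parameter are hypothetical, introduced here for illustration, and it assumes lynx is installed and on PATH. It also sets check=True so that a non-zero exit status from the command raises an exception instead of silently returning empty output.

```python
import shutil
import subprocess
import sys


def render_html(html, cmd=("lynx", "-stdin", "-dump")):
    """Send `html` to `cmd` on standard input and return what it prints.

    `render_html` and `cmd` are illustrative names, not part of any library.
    """
    # Fail early with a clear message if the program is not installed.
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    result = subprocess.run(
        list(cmd),
        input=html,           # written to the child's stdin, not passed as an argument
        capture_output=True,
        text=True,
        check=True,           # raise CalledProcessError on a non-zero exit status
    )
    return result.stdout


if __name__ == "__main__":
    html = "<!doctype html>\n<html><head><title>My title</title></head><body><p>Hello</p></body></html>"
    try:
        print(render_html(html))
    except FileNotFoundError as exc:
        sys.exit(str(exc))
```

Because the page travels over a pipe, its size is no longer bounded by the argument-length limit that triggered Errno 7.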

python

subprocess

lynx