Can't figure out how to invoke html5tidy from Python 3
This is for Python 3.5.
Can someone point me to some documentation on using html5tidy with Python 3? To my surprise, multiple searches returned nothing.
Under Python 3, the docstring in html5tidy.py states:
"""
HTML5Tidy
=========
Simple wrapper around html5lib & lxml.etree to "tidy" html in the wild to
well-formed xml/html
Usage
-----
>>> from html5tidy import tidy
>>> tidy('some text')
'<html><head/><body>some text</body></html>'
Dependencies
------------
* [html5lib](http://code.google.com/p/html5lib/)
* [lxml](http://lxml.de/)
OK, so I have all the pieces:
>>> import html5lib
>>> dir(html5lib)
['HTMLParser', '__all__', '__builtins__', '__cached__', [and so on]]
>>>
>>> import lxml
>>> dir(lxml)
['__builtins__', '__cached__', '__doc__', '__file__', [and so on]]
But I notice that dir(tidy) returns only double-underscore names:
>>> from html5tidy import tidy
>>> dir(tidy)
['__annotations__', '__call__', '__class__', [and so on...]'__subclasshook__']
So I open a file containing HTML and read it into untidiedHTML.
>>> print(untidiedHTML)
<!DOCTYPE html>
<html id="ng-app" lang="en" ng-app="TH" style="" xmlns:ng="http://angularjs.org">
<head ng-controller="DZHeadController">
<meta content="text/html; charset=utf-8" http-equiv="content-type"/>
<title ng-bind="service.title">
What the Heck Is OAuth? - DZone Security
</title>
<link href="WhatIsOAuth0200_files/tranquility.css" rel="stylesheet" type="text/css"/>
</head>
<body class="tranquility" >
... and so on...
Then, following the html5tidy documentation, I try:
from html5tidy import tidy
tidiedHTML = tidy(untidiedHTML)
which produces:
Traceback (most recent call last):
File "[path to my Python source file].py", line 50, in <module>
tidiedHTML = tidy(untidiedHTML)
File "/usr/local/lib/python3.5/dist-packages/html5tidy.py", line 61, in tidy
parts = [parser.parse(src, encoding=encoding, parseMeta=parseMeta, useChardet=useChardet)]
File "/usr/local/lib/python3.5/dist-packages/html5lib/html5parser.py", line 289, in parse
self._parse(stream, False, None, *args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/html5lib/html5parser.py", line 130, in _parse
self.tokenizer = _tokenizer.HTMLTokenizer(stream, parser=self, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/html5lib/_tokenizer.py", line 36, in __init__
self.stream = HTMLInputStream(stream, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/html5lib/_inputstream.py", line 149, in HTMLInputStream
return HTMLUnicodeInputStream(source, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'parseMeta'
I don't know what to do next. I searched for documentation explaining how to call html5tidy from Python 3, but came up empty...
The library is broken and/or does not work with Python 3.5. I installed it and ran into an error related to html5lib.HTMLParser: https://github.com/aleray/html5tidy/blob/master/html5tidy.py#L57
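The traceback in the question suggests the mechanism: `tidy` passes `parseMeta` to `parser.parse`, and each frame forwards its keyword arguments until they reach a constructor that does not accept that name. A minimal, stdlib-only sketch of the same failure pattern follows. The names `UnicodeStream`, `make_stream`, `parse`, and `try_parse` are hypothetical stand-ins, not html5lib's real classes:

```python
class UnicodeStream:
    """Hypothetical stand-in for a stream class that accepts only the source."""

    def __init__(self, source):
        self.source = source


def make_stream(source, **kwargs):
    # Forwards every keyword argument unchanged, like the frames in the traceback.
    return UnicodeStream(source, **kwargs)


def parse(source, **kwargs):
    # Hypothetical stand-in for the parser entry point.
    return make_stream(source, **kwargs)


def try_parse(source, **kwargs):
    """Return the error message if the call fails, else None."""
    try:
        parse(source, **kwargs)
    except TypeError as exc:
        return str(exc)
    return None


print(try_parse("some text"))                  # no extra keywords: call succeeds
print(try_parse("some text", parseMeta=True))  # unknown keyword: TypeError message
```

If that reading is right, the fix on the html5tidy side would be to stop passing keyword arguments that the installed html5lib no longer accepts.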
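The package has a single contributor and has not been updated in 6 years.
Your options are:
- Fork the repo, fix the problem, and submit a pull request
- Extract the code you need and write your own
- Find another library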
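For the second option, here is a minimal sketch of a replacement, assuming html5lib 1.x and lxml are both installed. `tidy_html` is a hypothetical name, not part of html5tidy, and it covers only the basic parse-and-serialize step rather than everything the original module offers:

```python
import html5lib
from lxml import etree


def tidy_html(src):
    """Parse possibly malformed HTML text and return well-formed markup."""
    # treebuilder="lxml" asks html5lib to build an lxml tree;
    # namespaceHTMLElements=False keeps tag names free of the XHTML namespace.
    tree = html5lib.parse(src, treebuilder="lxml", namespaceHTMLElements=False)
    # encoding="unicode" makes lxml return str instead of bytes.
    return etree.tostring(tree, encoding="unicode")


if __name__ == "__main__":
    print(tidy_html("<p>some <b>text"))
```

Because the input is already a `str`, no encoding-related keyword arguments are passed, which sidesteps the `parseMeta` error from the traceback.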