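Usage of python-readability
(https://github.com/buriy/python-readability)
I am struggling to use this library and cannot find any documentation for it. (Is there any?)
Calling help(Document) gives a few usable fragments, but I still have a problem.
My code so far: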
from readability.readability import Document
import requests
url = 'http://www.somepage.com'
html = requests.get(url, verify=False).content
readable_article = Document(html, negative_keywords='test_keyword').summary()
with open('test.html', 'w', encoding='utf-8') as test_file:
    test_file.write(readable_article)
According to the help(Document) output, it should be possible to pass a list as the input for negative_keywords:
readable_article = Document(html, negative_keywords=['test_keyword1', 'test-keyword2']).summary()
This gives me a bunch of errors I don't understand:
Traceback (most recent call last):
  File "/usr/lib/python3.4/site-packages/readability/readability.py", line 163, in summary
    candidates = self.score_paragraphs()
  File "/usr/lib/python3.4/site-packages/readability/readability.py", line 300, in score_paragraphs
    candidates[parent_node] = self.score_node(parent_node)
  File "/usr/lib/python3.4/site-packages/readability/readability.py", line 360, in score_node
    content_score = self.class_weight(elem)
  File "/usr/lib/python3.4/site-packages/readability/readability.py", line 348, in class_weight
    if self.negative_keywords and self.negative_keywords.search(feature):
AttributeError: 'list' object has no attribute 'search'
Can anyone give me a hint about what the error is or how to deal with it?
There is a bug in the library code. If you look at compile_pattern:
def compile_pattern(elements):
    if not elements:
        return None
    elif isinstance(elements, (list, tuple)):
        return list(elements)
    elif isinstance(elements, regexp_type):
        return elements
    else:
        # assume string or string like object
        elements = elements.split(',')
        return re.compile(u'|'.join([re.escape(x.lower()) for x in elements]), re.U)
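It only compiles a regex when elements is not None, not a list or tuple, and not already a regular expression. A list or tuple is returned as a plain list, and an existing regex is returned unchanged.
Later on, though, the code assumes that self.negative_keywords is a regex and calls .search on it, which is what fails in your traceback. So I suggest passing your list as a string of the form "test_keyword1,test_keyword2". That ensures compile_pattern returns a regex, which should fix the error.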
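The workaround above can be sketched as follows. This is only an illustration under stated assumptions: keywords_to_string and keywords_to_regex are hypothetical helpers, not part of python-readability, and the pattern is built the same way as the string branch of the compile_pattern code quoted above. The Document calls are left as comments because they need the library and a fetched page.

```python
import re


def keywords_to_string(keywords):
    """Hypothetical helper: turn a list/tuple of keywords into 'a,b,c'."""
    if isinstance(keywords, str):
        return keywords
    return ",".join(keywords)


def keywords_to_regex(keywords):
    """Hypothetical helper: build a pattern like compile_pattern's string branch."""
    parts = keywords_to_string(keywords).split(",")
    return re.compile("|".join(re.escape(p.lower()) for p in parts), re.U)


negative = ["test_keyword1", "test-keyword2"]
as_string = keywords_to_string(negative)
pattern = keywords_to_regex(negative)

# Both forms avoid the list branch of compile_pattern shown above:
# Document(html, negative_keywords=as_string).summary()
# Document(html, negative_keywords=pattern).summary()
```

Note that the comma-separated form cannot express a keyword that itself contains a comma. In that case, passing a precompiled regex looks like the safer option, since the quoted compile_pattern returns an existing regex unchanged.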