Scrapy plaintext error

Question

I am using Python Scrapy. I want to extract the text of a web page without the HTML tags. Below is my code (the idea comes from this page: How can I get all the plain text from a website with Scrapy?)

sel = Selector(response)
item = DeletespiderItem()
item['url'] = response.url
description = sel.select("//body").extract()
tree = lxml.html.fromstring(description)
item['description'] = tree.text_content().strip()
yield item

But I get the following error:

File "C:\Python27\lib\site-packages\lxml\html\__init__.py", line 722, in fromstring
        is_full_html = _looks_like_full_html_unicode(html)
    exceptions.TypeError: expected string or buffer

What is wrong with my code? How can I get the plain text out of it?

Can anyone help me? Thanks.

Update:

Scrapy shell

sel.select("//body").extract()[0].strip()

Output: \r\n \r\n \r\n \r\n \r\n \r\n 聊天\r\n ]

Why is it adding extra \r\n?

Answer 1

extract() returns a list, not a string, which is why lxml.html.fromstring raises the TypeError. Take the first element:

description = sel.select("//body").extract()[0]
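
The extra \r\n in the update is most likely just the line breaks between tags in the page source, which are preserved in the extracted text. A minimal sketch of one way to clean that up, using only the standard library: the helper name normalize_whitespace and the sample fragments list are made up for illustration and stand in for the real result of sel.select("//body").extract().

```python
def normalize_whitespace(text):
    """Collapse every run of whitespace (including \\r\\n) into one space."""
    # str.split() with no arguments splits on any whitespace run and drops
    # empty strings, so joining with " " leaves only single spaces.
    return " ".join(text.split())


# Hypothetical stand-in for what extract() returns: a list of strings,
# one per matched node (sample data, not taken from the real page).
fragments = ["\r\n \r\n  Chat\r\n  Home \r\n"]

# Index the list first; passing the list itself to lxml.html.fromstring
# is what triggers "TypeError: expected string or buffer".
body_text = fragments[0]

print(normalize_whitespace(body_text))
```

In the question's code, the same helper could be applied to the result of tree.text_content() in place of the plain .strip() call, which only trims the ends of the string.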


python

scrapy

web-scraping