AttributeError: 'Response' object has no attribute 'body_as_unicode' scrapy for python

Question

I am processing responses in Scrapy and keep getting this message.

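I have included only the snippet where the error occurs. I am trying to go through several web pages and need to get the number of pages for each one. So I create a Response object and take the href of the Next button from it, but I keep getting AttributeError: 'Response' object has no attribute 'body_as_unicode'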

Code used:

from scrapy.spiders import Spider
from scrapy.selector import Selector
from scrapy.http import Request
from scrapingtest.items import ScrapingTestingItem
from collections import OrderedDict
import json
from scrapy.selector.lxmlsel import HtmlXPathSelector
import csv
import scrapy
from scrapy.http import Response

class scrapingtestspider(Spider):
    name = "scrapytesting"
    allowed_domains = ["tripadvisor.in"]
 #   base_uri = ["tripadvisor.in"]

    def start_requests(self):
        site_array=["http://www.tripadvisor.in/Hotel_Review-g3581633-d2290190-Reviews-Corbett_Treetop_Riverview-Marchula_Jim_Corbett_National_Park_Uttarakhand.html",
                    "http://www.tripadvisor.in/Hotel_Review-g297600-d8029162-Reviews-Daman_Casa_Tesoro-Daman_Daman_and_Diu.html",
                    "http://www.tripadvisor.in/Hotel_Review-g304557-d2519662-Reviews-Darjeeling_Khushalaya_Sterling_Holidays_Resort-Darjeeling_West_Bengal.html",
                    "http://www.tripadvisor.in/Hotel_Review-g319724-d3795261-Reviews-Dharamshala_The_Sanctuary_A_Sterling_Holidays_Resort-Dharamsala_Himachal_Pradesh.html",
                    "http://www.tripadvisor.in/Hotel_Review-g1544623-d8029274-Reviews-Dindi_By_The_Godavari-Nalgonda_Andhra_Pradesh.html"]

        for i in range(len(site_array)):
            response = Response(url=site_array[i])
            sites = Selector(response).xpath('//a[contains(text(), "Next")]/@href').extract()
 #           sites = response.selector.xpath('//a[contains(text(), "Next")]/@href').extract()
            for site in sites:
                yield Request(site_array[i],self.parse)


Answer 1

In this case, the line where the error occurs needs a TextResponse object rather than a plain Response. Try creating a TextResponse instead of a plain Response to fix the error.

The missing method is documented here.

More specifically, use HtmlResponse, because your response will be HTML rather than plain text. HtmlResponse is a subclass of TextResponse, so it inherits the missing method.

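One more thing: where do you set the body of your Response? Without a body, your XPath query will return nothing. In the example in your question you set only the URL and no body, which is why your XPath returns nothing.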

Answer 2

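This doesn't really answer the question, but it can help you find out what is wrong with the returned response object. I'm adding it as an answer in case it helps someone debug a similar problem.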

I ran into a similar error, AttributeError: 'HtmlResponse' object has no attribute 'text', when I ran:

scrapy shell 'http://example.com'
>>> response.text

To find out what was wrong, I inspected the attributes present on the returned response object using:

response.__dict__

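However, __dict__ only contains the attributes stored on the instance itself; it does not include methods or properties the object gets from its class or parent classes.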
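The difference between __dict__ and dir() can be sketched with plain Python. BaseResponse and PageResponse below are hypothetical stand-ins, not Scrapy classes; they only mimic an instance attribute (_body) plus members defined on a class and its parent.

```python
class BaseResponse:
    """Hypothetical parent class that defines a property."""

    def __init__(self, body):
        self._body = body  # stored on the instance, so it appears in __dict__

    @property
    def body(self):
        return self._body


class PageResponse(BaseResponse):
    """Hypothetical subclass that adds a method at class level."""

    def text(self):
        return self._body.decode("utf-8")


page_response = PageResponse(b"<html></html>")
print(page_response.__dict__)            # {'_body': b'<html></html>'}
print("text" in page_response.__dict__)  # False: methods live on the class, not the instance
print("text" in dir(page_response))      # True: dir() also walks the class and its bases
print("body" in dir(page_response))      # True: property inherited from BaseResponse
```

For debugging an unexpected AttributeError, dir(response) (or type(response).__mro__ to see the class hierarchy) is therefore usually more informative than response.__dict__.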

The response object I received had a _body attribute containing the page's HTML.
