AttributeError: 'NoneType' object has no attribute 'css'. Trying to scrape old reddit but getting this error

I'm trying to scrape old reddit, but every time I get this error:

>>> response.css('div')
Traceback (most recent call last):
  File "<console>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'css'

Am I doing something wrong, or is it just not possible to scrape old reddit?

Here is the log:

[scrapy.core.engine] DEBUG: Crawled (200) <GET https://old.reddit.com/robots.txt> (referer: None)
2020-11-02 14:56:09 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://old.reddit.com/> from <GET http://old.reddit.com>
2020-11-02 14:56:09 [scrapy.downloadermiddlewares.robotstxt] DEBUG: Forbidden by robots.txt: <GET https://old.reddit.com/>

Here is my scrapy shell output; I hope it helps.

(scrapy_env) rana@rana-desktop:~/Documents/allproject/scrapy_projt/tutorial$ scrapy shell https://old.reddit.com/

In [2]: response.status
Out[2]: 200

In [3]: response.css('div')
Out[3]: 
[<Selector xpath='descendant-or-self::div' data='<div class="GoogleAd HomeAds InArticl...'>,
 <Selector xpath='descendant-or-self::div' data='<div id="header" role="banner"><a tab...'>,
 <Selector xpath='descendant-or-self::div' data='<div id="sr-header-area"><div class="...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="width-clip"><div class="d...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="dropdown srdrop" onclick=...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="drop-choices srdrop"><a h...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="sr-list"><ul class="flat-...'>,
 <Selector xpath='descendant-or-self::div' data='<div id="header-bottom-left"><a href=...'>,
 <Selector xpath='descendant-or-self::div' data='<div id="header-bottom-right"><span c...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="side"><div class="spacer"...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="spacer"><form action="htt...'>,
 <Selector xpath='descendant-or-self::div' data='<div id="searchexpando" class="infoba...'>,
 <Selector xpath='descendant-or-self::div' data='<div id="moresearchinfo"><p>use the f...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="spacer"><form method="pos...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="g-recaptcha" data-sitekey...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="status"></div>'>,
 <Selector xpath='descendant-or-self::div' data='<div id="remember-me"><input type="ch...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="submit"><span class="thro...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="clear"></div>'>,
 <Selector xpath='descendant-or-self::div' data='<div class="spacer"><div class="sideb...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="sidebox submit submit-lin...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="morelink"><a href="https:...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="nub"></div>'>,
 <Selector xpath='descendant-or-self::div' data='<div class="spacer"><div class="sideb...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="sidebox submit submit-tex...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="morelink"><a href="https:...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="nub"></div>'>,
 <Selector xpath='descendant-or-self::div' data='<div class="spacer"><a href="/premium...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="premium-banner__logo"></div>'>,
 <Selector xpath='descendant-or-self::div' data='<div class="premium-banner__title">Ge...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="content" role="main"><sec...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="listingsignupbar__cta-con...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="spacer"><style type="text...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="happening-now-wrap"><div ...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="happening-now"><div><p cl...'>,
 <Selector xpath='descendant-or-self::div' data='<div><p class="icon"><img src="//www....'>,
 <Selector xpath='descendant-or-self::div' data='<div class="close-button">x</div>'>,
 <Selector xpath='descendant-or-self::div' data='<div class="spacer"><style>body >.con...'>,
 <Selector xpath='descendant-or-self::div' data='<div id="siteTable" class="sitetable ...'>,
 <Selector xpath='descendant-or-self::div' data='<div class=" thing id-t3_jmlqpj odd  ...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="midcol unvoted"><div clas...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="arrow up login-required a...'>,

 <Selector xpath='descendant-or-self::div' data='<div class="clearleft"></div>'>,
 <Selector xpath='descendant-or-self::div' data='<div class="nav-buttons"><span class=...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="footer-parent"><div by-ze...'>,
 <Selector xpath='descendant-or-self::div' data='<div by-zero class="footer rounded"><...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="col"><ul class="flat-vert...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="col"><ul class="flat-vert...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="col"><ul class="flat-vert...'>,
 <Selector xpath='descendant-or-self::div' data='<div class="col"><ul class="flat-vert...'>]

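You get this error because you received no response at all: response is None, so you are calling the .css() method on None. You get None instead of the expected Response object because your spider filtered the request before it was downloaded; specifically, Scrapy's robots.txt downloader middleware dropped it.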
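A minimal plain-Python sketch (no Scrapy needed) of what is going on: once the request is filtered, response is None, and any attribute access on None raises exactly this AttributeError. The helper css_or_explain is hypothetical, written only to illustrate guarding against a missing response; it is not part of Scrapy.

```python
def css_or_explain(response, query):
    """Call response.css(query), but fail with a clearer message if there is no response.

    Hypothetical helper for illustration only; it is not a Scrapy API.
    """
    if response is None:
        # In scrapy shell, response stays None when the request was dropped,
        # for example by the robots.txt middleware.
        raise RuntimeError(
            "response is None: the request was probably filtered; check the log"
        )
    return response.css(query)


# Reproduce the original error: calling .css() on None.
response = None
try:
    response.css("div")
except AttributeError as exc:
    print(exc)  # 'NoneType' object has no attribute 'css'
```

The guard does not fix anything by itself; it only turns the confusing AttributeError into a message that points you back to the log.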

You can see this in the following line of the execution log:

2020-11-02 14:56:09 [scrapy.downloadermiddlewares.robotstxt] DEBUG: Forbidden by robots.txt: <GET https://old.reddit.com/>

The site's robots.txt does not allow the requested URL. You can disable this filter through the ROBOTSTXT_OBEY setting in your project's settings.py. To disable it, use:

ROBOTSTXT_OBEY = False

This makes your spider ignore robots.txt for all requests (see the ROBOTSTXT_OBEY entry in the Scrapy settings documentation).

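However, obeying robots.txt rules is considered good practice in web scraping (arguably even the ethical thing to do). For more details, see the robots.txt standard (the Robots Exclusion Protocol).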
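If you would rather keep obeying robots.txt, you can check whether a URL is allowed before requesting it. The sketch below uses Python's standard-library urllib.robotparser as a stand-in (Scrapy has its own robots.txt parser backend), and the robots.txt content, including the "FriendlyBot" user agent, is a made-up example, not Reddit's real file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only; it is NOT Reddit's real file.
SAMPLE_ROBOTS_TXT = """\
User-agent: FriendlyBot
Allow: /

User-agent: *
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse from a string instead of downloading
    return parser.can_fetch(user_agent, url)


# A generic crawler falls under "User-agent: *", so the front page is disallowed.
print(is_allowed(SAMPLE_ROBOTS_TXT, "Scrapy", "https://old.reddit.com/"))  # False
# The hypothetical FriendlyBot has its own entry that allows everything.
print(is_allowed(SAMPLE_ROBOTS_TXT, "FriendlyBot", "https://old.reddit.com/"))  # True
```

This is the same kind of decision that produces the "Forbidden by robots.txt" line in the log: the URL matches a Disallow rule for the user agent making the request.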