AttributeError: 'NoneType' object has no attribute 'css'. Trying to scrape old reddit but getting this error
I am trying to scrape old reddit, but I get this error every time:
>>> response.css('div')
Traceback (most recent call last):
File "<console>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute 'css'
Am I doing something wrong, or is it not possible to scrape old reddit?
Here is the log:
[scrapy.core.engine] DEBUG: Crawled (200) <GET https://old.reddit.com/robots.txt> (referer: None)
2020-11-02 14:56:09 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (301) to <GET https://old.reddit.com/> from <GET http://old.reddit.com>
2020-11-02 14:56:09 [scrapy.downloadermiddlewares.robotstxt] DEBUG: Forbidden by robots.txt: <GET https://old.reddit.com/>
Here is my scrapy shell output, in case it helps.
(scrapy_env) rana@rana-desktop:~/Documents/allproject/scrapy_projt/tutorial$
$ scrapy shell https://old.reddit.com/
In [2]: response.status
Out[2]: 200
In [3]: response.css('div')
Out[3]:
[<Selector xpath='descendant-or-self::div' data='<div class="GoogleAd HomeAds InArticl...'>,
<Selector xpath='descendant-or-self::div' data='<div id="header" role="banner"><a tab...'>,
<Selector xpath='descendant-or-self::div' data='<div id="sr-header-area"><div class="...'>,
<Selector xpath='descendant-or-self::div' data='<div class="width-clip"><div class="d...'>,
<Selector xpath='descendant-or-self::div' data='<div class="dropdown srdrop" onclick=...'>,
<Selector xpath='descendant-or-self::div' data='<div class="drop-choices srdrop"><a h...'>,
<Selector xpath='descendant-or-self::div' data='<div class="sr-list"><ul class="flat-...'>,
<Selector xpath='descendant-or-self::div' data='<div id="header-bottom-left"><a href=...'>,
<Selector xpath='descendant-or-self::div' data='<div id="header-bottom-right"><span c...'>,
<Selector xpath='descendant-or-self::div' data='<div class="side"><div class="spacer"...'>,
<Selector xpath='descendant-or-self::div' data='<div class="spacer"><form action="htt...'>,
<Selector xpath='descendant-or-self::div' data='<div id="searchexpando" class="infoba...'>,
<Selector xpath='descendant-or-self::div' data='<div id="moresearchinfo"><p>use the f...'>,
<Selector xpath='descendant-or-self::div' data='<div class="spacer"><form method="pos...'>,
<Selector xpath='descendant-or-self::div' data='<div class="g-recaptcha" data-sitekey...'>,
<Selector xpath='descendant-or-self::div' data='<div class="status"></div>'>,
<Selector xpath='descendant-or-self::div' data='<div id="remember-me"><input type="ch...'>,
<Selector xpath='descendant-or-self::div' data='<div class="submit"><span class="thro...'>,
<Selector xpath='descendant-or-self::div' data='<div class="clear"></div>'>,
<Selector xpath='descendant-or-self::div' data='<div class="spacer"><div class="sideb...'>,
<Selector xpath='descendant-or-self::div' data='<div class="sidebox submit submit-lin...'>,
<Selector xpath='descendant-or-self::div' data='<div class="morelink"><a href="https:...'>,
<Selector xpath='descendant-or-self::div' data='<div class="nub"></div>'>,
<Selector xpath='descendant-or-self::div' data='<div class="spacer"><div class="sideb...'>,
<Selector xpath='descendant-or-self::div' data='<div class="sidebox submit submit-tex...'>,
<Selector xpath='descendant-or-self::div' data='<div class="morelink"><a href="https:...'>,
<Selector xpath='descendant-or-self::div' data='<div class="nub"></div>'>,
<Selector xpath='descendant-or-self::div' data='<div class="spacer"><a href="/premium...'>,
<Selector xpath='descendant-or-self::div' data='<div class="premium-banner__logo"></div>'>,
<Selector xpath='descendant-or-self::div' data='<div class="premium-banner__title">Ge...'>,
<Selector xpath='descendant-or-self::div' data='<div class="content" role="main"><sec...'>,
<Selector xpath='descendant-or-self::div' data='<div class="listingsignupbar__cta-con...'>,
<Selector xpath='descendant-or-self::div' data='<div class="spacer"><style type="text...'>,
<Selector xpath='descendant-or-self::div' data='<div class="happening-now-wrap"><div ...'>,
<Selector xpath='descendant-or-self::div' data='<div class="happening-now"><div><p cl...'>,
<Selector xpath='descendant-or-self::div' data='<div><p class="icon"><img src="//www....'>,
<Selector xpath='descendant-or-self::div' data='<div class="close-button">x</div>'>,
<Selector xpath='descendant-or-self::div' data='<div class="spacer"><style>body >.con...'>,
<Selector xpath='descendant-or-self::div' data='<div id="siteTable" class="sitetable ...'>,
<Selector xpath='descendant-or-self::div' data='<div class=" thing id-t3_jmlqpj odd ...'>,
<Selector xpath='descendant-or-self::div' data='<div class="midcol unvoted"><div clas...'>,
<Selector xpath='descendant-or-self::div' data='<div class="arrow up login-required a...'>,
<Selector xpath='descendant-or-self::div' data='<div class="clearleft"></div>'>,
<Selector xpath='descendant-or-self::div' data='<div class="nav-buttons"><span class=...'>,
<Selector xpath='descendant-or-self::div' data='<div class="footer-parent"><div by-ze...'>,
<Selector xpath='descendant-or-self::div' data='<div by-zero class="footer rounded"><...'>,
<Selector xpath='descendant-or-self::div' data='<div class="col"><ul class="flat-vert...'>,
<Selector xpath='descendant-or-self::div' data='<div class="col"><ul class="flat-vert...'>,
<Selector xpath='descendant-or-self::div' data='<div class="col"><ul class="flat-vert...'>,
<Selector xpath='descendant-or-self::div' data='<div class="col"><ul class="flat-vert...'>]
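You are getting this error because the response is empty (None), so you are calling the .css() method on a variable that holds no response object. You received None instead of the expected response because the request was filtered out before it was downloaded.
You can see this in the following line of the execution log: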
2020-11-02 14:56:09 [scrapy.downloadermiddlewares.robotstxt] DEBUG: Forbidden by robots.txt: <GET https://old.reddit.com/>
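The failure itself can be reproduced without Scrapy. The snippet below is a minimal sketch, assuming only that `response` ends up as `None` when the request is filtered; the helper `select_divs` is a hypothetical name introduced for illustration and is not part of Scrapy.

```python
def select_divs(response):
    """Return response.css('div'), or an empty list when there is no response."""
    if response is None:
        # A filtered request leaves no response object to query.
        return []
    return response.css("div")


# Reproduce the error from the question with a plain None value.
response = None
try:
    response.css("div")
except AttributeError as exc:
    error_message = str(exc)

print(error_message)
print(select_divs(response))
```

Checking for `None` only avoids the traceback; it does not make the page available, which is what the rest of this answer deals with.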
The site's robots.txt disallows the requested URL. You can turn this filter off by changing the ROBOTSTXT_OBEY line in your project's settings.py. To disable it, use:
ROBOTSTXT_OBEY = False
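This makes your spider ignore robots.txt for every request (see the ROBOTSTXT_OBEY setting in the Scrapy documentation).
However, obeying robots.txt rules is considered good practice in web scraping (arguably even an ethical obligation). The robots.txt standard has more details.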
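Before disabling the check, it may help to see which URLs a robots.txt file actually permits. The following is a rough sketch using Python's standard `urllib.robotparser`; the robots.txt content, the `my-bot` user agent, and the `example.com` URLs are made up for illustration and are not what old.reddit.com serves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for illustration;
# it is not the real file served by old.reddit.com.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file content as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(SAMPLE_ROBOTS_TXT, "my-bot", "https://example.com/public/page"))
print(is_allowed(SAMPLE_ROBOTS_TXT, "my-bot", "https://example.com/private/page"))
```

To test a live site, the same parser can load the file over the network with `set_url()` followed by `read()` instead of `parse()`.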