Python: Why does my Scrapy CrawlSpider not print or do anything?

Question

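I'm new to Scrapy and can't get it to do anything. Eventually, I want to scrape all HTML comments from a website by following its internal links.

For now, I'm just trying to collect the internal links and add them to a list.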

import scrapy
from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class comment_spider(CrawlSpider):
    name = 'test'
    allowed_domains = ['https://www.andnowuknow.com/']
    start_urls = ["https://www.andnowuknow.com/"]

    rules = (Rule(LinkExtractor(), callback='parse_start_url', follow=True),)

    def parse_start_url(self, response):
        return self.parse_item(response)

    def parse_item(self, response):
        urls = []
        for link in LinkExtractor(allow=(),).extract_links(response):
            urls.append(link)
            print(urls)

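At this point I just want it to print something, and nothing I've tried so far has worked.

It finishes with exit code 0 but prints nothing, so I can't tell what is happening.

What am I missing?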

Answer 1

Your log output should give us some hints, but I can see that your allowed_domains contains a URL rather than a domain. You should set it like this:

allowed_domains = ["andnowuknow.com"]
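
To make the difference between a URL and a domain concrete, here is a small sketch using only the standard library. The helper name domain_from_url is hypothetical and is not part of Scrapy; it just shows which part of the start URL counts as the host name.

```python
from urllib.parse import urlparse


def domain_from_url(url):
    """Return the bare host name of a URL, without scheme, port, or path."""
    return urlparse(url).hostname


start_url = "https://www.andnowuknow.com/"
print(domain_from_url(start_url))  # www.andnowuknow.com
```

Listing the parent domain andnowuknow.com, as above, also covers subdomains such as www.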

(See allowed_domains in the official Scrapy documentation.)

Hope that helps.


python

comments

scrapy