My Scrapy web scraper is returning only the last quotes

Question

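I wrote this Scrapy web scraper to scrape all the quotes on the first 10 pages of this website (https://www.goodreads.com/quotes). After running the code, I found that it only returns the last quote from some of the pages. I need advice on how to make the Scrapy spider return all the quotes on every page. Here is my code: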

import scrapy
from google_quotes.items import GoogleQuotesItem

start_urls=['https://www.goodreads.com/quotes']
for number in range(1,11):
    page_append='?page={}'.format(str(number))
    start_urls.append('https://www.goodreads.com/quotes{}'.format(page_append))

class quotes(scrapy.Spider):
    name='goodreads_quotes'
    def start_requests(self):
        urls=start_urls
        for url in urls:
            yield scrapy.Request(url=url,callback=self.parse)
        
    def parse(self,response):
        g_quotes=GoogleQuotesItem()
        quotes=response.css('div .quoteText::text').extract()
        for quote in quotes:
            if len(quote)>10:
                g_quotes['quote']=quote
        return g_quotes

The spider moves through all the pages as I intended, but it only returns the last quote.
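A side note on the URL list: the question's `start_urls` ends up with 11 entries, the bare listing URL plus `?page=1` through `?page=10`. Scrapy's duplicate filter compares request URLs, so both the bare URL and `?page=1` get fetched. Assuming (not verified here) that they serve the same page, the first page's quotes would be scraped twice. A minimal sketch that builds exactly one URL per page; `page_urls` is a hypothetical helper, not part of the original code:

```python
# Base listing URL taken from the question's code.
BASE_URL = "https://www.goodreads.com/quotes"


def page_urls(first: int = 1, last: int = 10) -> list[str]:
    """Return one listing URL per page number, first..last inclusive.

    Assumes ?page=1 is the first page, so the bare BASE_URL is not added separately.
    """
    return [f"{BASE_URL}?page={n}" for n in range(first, last + 1)]


start_urls = page_urls()
print(len(start_urls))  # 10
print(start_urls[0])    # https://www.goodreads.com/quotes?page=1
```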

Answer 1

As @flaxon mentioned, you need to yield the results. You also need to pay attention to your indentation.
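Why only the last quote appears: in the question's `parse`, the loop overwrites the same `'quote'` field on every pass, and `return` runs only once, after the loop has finished. The plain-Python sketch below uses dicts in place of `GoogleQuotesItem` and made-up sample strings (no Scrapy involved) to contrast that with yielding inside the loop:

```python
def parse_with_return(quotes):
    item = {}
    for quote in quotes:
        item["quote"] = quote  # overwritten on every iteration
    return item                # runs once, after the loop: only the last value survives


def parse_with_yield(quotes):
    for quote in quotes:
        yield {"quote": quote}  # one new dict handed out per quote


texts = ["first quote", "second quote", "third quote"]
print(parse_with_return(texts))       # {'quote': 'third quote'}
print(list(parse_with_yield(texts)))  # [{'quote': 'first quote'}, {'quote': 'second quote'}, {'quote': 'third quote'}]
```

Scrapy accepts an iterable of items from a callback, so a generator like `parse_with_yield` is the usual shape for a `parse` method that emits several items per page.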

You are also checking whether the `quote` variable (which is a string, not a list) is longer than 10 characters, and I'm not sure why.
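On that length check: `len()` on a string counts characters. A plausible intent (an assumption; the question does not say) is to skip the whitespace-only text nodes that a `::text` selector also returns, but a fixed threshold drops short real text too and keeps the surrounding whitespace. A small sketch with made-up nodes comparing the two filters:

```python
# Hypothetical text nodes: a ::text selector returns every text node, including
# whitespace-only runs that sit between child tags.
nodes = ["\n  A sample quote.\n  ", "\n    ", "Short."]

# The question's filter: len() on a str counts characters, not list elements.
by_length = [n for n in nodes if len(n) > 10]

# An alternative: drop only the nodes that are blank once trimmed.
by_strip = [n.strip() for n in nodes if n.strip()]

print(by_length)  # ['\n  A sample quote.\n  ']
print(by_strip)   # ['A sample quote.', 'Short.']
```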

def parse(self, response):
    quotes = response.css('div .quoteText::text').extract()
    for quote in quotes:
        g_quotes = GoogleQuotesItem()  # a fresh item for every quote
        g_quotes['quote'] = quote
        yield g_quotes  # Notice the indentation: yield inside the loop, once per quote

Try this and let me know.
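A related caveat about yielding: if `g_quotes` is created once, before the loop, every `yield` hands out the same object, and the next iteration overwrites its field. Whether that shows up depends on whether anything downstream (for example, an item pipeline) holds on to the object instead of processing it right away, so creating a fresh item per quote is the safer habit. A plain-Python sketch, with dicts standing in for `GoogleQuotesItem` and made-up strings:

```python
def parse_reusing_one_item(quotes):
    item = {}  # a single mutable object shared by every yield
    for quote in quotes:
        item["quote"] = quote
        yield item


def parse_fresh_item(quotes):
    for quote in quotes:
        item = {"quote": quote}  # a new object per quote
        yield item


texts = ["first quote", "second quote"]

# A consumer that keeps references to the yielded objects instead of
# serializing each one immediately.
kept = list(parse_reusing_one_item(texts))
print(kept)                           # [{'quote': 'second quote'}, {'quote': 'second quote'}]
print(list(parse_fresh_item(texts)))  # [{'quote': 'first quote'}, {'quote': 'second quote'}]
```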


Tags: data-mining, scrapy, web-scraping, python-3.x