Scrapy: saving URL titles in a text file

Question

Hello, I have the following Scrapy code. I want to save the titles of all the given URLs in one file, but it only saves the last title ("url3").

    from scrapy.spider import BaseSpider
    from scrapy.selector import Selector
    from scrapy.http import HtmlResponse
    from kirt.items import KirtItem 

    class KirtSpider(BaseSpider):

        name = "spider-name"

        allowed_domains = ["url1","url2","url3"]

        start_urls = ["url1","url2","url3"]


        def parse(self,response):

            sel = Selector(response)
            title = str(sel.xpath('//title/text()').extract())

            with open('alltitles.txt','w') as f:
                f.seek(0)
                f.write(title)

Answer 1

The problem is here, in two different ways:

    with open('alltitles.txt','w') as f:
        f.seek(0)
        f.write(title)

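Opening a file with mode 'w' does not just open it: if a file with that name already exists, its contents are discarded first. Since parse runs once per response, each call wipes out the title written by the previous one. Open the file with mode 'a' instead, which appends new lines to the existing file if there is one.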

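Beyond that, you are also calling f.seek(0), which rewinds the file position to the beginning of the file. In a mode that neither truncates nor appends (such as 'r+'), that would make each write overwrite the existing contents, so the call should be dropped. The code should look more like this: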

    with open('alltitles.txt','a') as f:
        # write out the title and add a newline.
        f.write(title + "\n")
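
To see the difference between the two modes in isolation, here is a minimal sketch that does not need Scrapy. The helper names `save_titles` and `read_lines`, the file names, and the placeholder titles are all assumptions made up for illustration. Reopening the file once per title stands in for Scrapy calling parse once per response.

```python
import os
import tempfile


def save_titles(path, titles, mode):
    """Write each title on its own line, reopening the file for every title.

    Reopening per title mimics a spider whose parse() runs once per response.
    """
    for title in titles:
        with open(path, mode) as f:
            f.write(title + "\n")


def read_lines(path):
    """Return the file's contents as a list of lines without newlines."""
    with open(path) as f:
        return f.read().splitlines()


# Placeholder titles (hypothetical), standing in for the three page titles.
titles = ["Title 1", "Title 2", "Title 3"]

with tempfile.TemporaryDirectory() as tmp:
    w_path = os.path.join(tmp, "w_mode.txt")
    a_path = os.path.join(tmp, "a_mode.txt")

    save_titles(w_path, titles, "w")  # each open() truncates the file
    save_titles(a_path, titles, "a")  # each open() appends to the file

    w_result = read_lines(w_path)
    a_result = read_lines(a_path)

print(w_result)  # ['Title 3']
print(a_result)  # ['Title 1', 'Title 2', 'Title 3']
```

One consequence of append mode is that the file also keeps whatever an earlier crawl wrote, so you may want to delete alltitles.txt before each run if you need a fresh list.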


python

scrapy