Creating RSS with Scrapy
I added a pipeline that I found in an answer to a sample project on Stack Overflow. It is:
import csv
from craiglist_sample import settings

def write_to_csv(item):
    writer = csv.writer(open(settings.csv_file_path, 'a'), lineterminator='\n')
    writer.writerow([item[key] for key in item.keys()])

class WriteToCsv(object):
    def process_item(self, item, spider):
        write_to_csv(item)
        return item
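For the pipeline to be invoked at all, it also has to be enabled in the project's settings.py. A minimal sketch of that registration follows; the module path craiglist_sample.pipelines.WriteToCsv and the csv_file_path value are assumptions based on the project name, so adjust them to the actual layout:

# settings.py -- sketch; the module path and file name below are assumed
ITEM_PIPELINES = {
    'craiglist_sample.pipelines.WriteToCsv': 300,
}
csv_file_path = 'items.csv'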
It writes the csv file correctly. Then I changed it to this:
import csv
import sys
from craiglist_sample import settings
import datetime
import PyRSS2Gen

def write_to_csv(item):
    rss = PyRSS2Gen.RSS2(
        title = "Andrew's PyRSS2Gen feed",
        link = "http://www.dalkescientific.com/Python/PyRSS2Gen.html",
        description = "The latest news about PyRSS2Gen, a "
                      "Python library for generating RSS2 feeds",
        lastBuildDate = datetime.datetime.now(),
        items = [
            PyRSS2Gen.RSSItem(
                title = str(item['title']),
                link = str(item['link']),
                description = "Dalke Scientific today announced PyRSS2Gen-0.0, "
                              "a library for generating RSS feeds for Python. ",
                guid = PyRSS2Gen.Guid("http://www.dalkescientific.com/news/"
                                      "030906-PyRSS2Gen.html"),
                pubDate = datetime.datetime(2003, 9, 6, 21, 31)),
        ])
    rss.write_xml(open("pyrss2gen.xml", "w"))

class WriteToCsv(object):
    def process_item(self, item, spider):
        write_to_csv(item)
        return item
The problem is that it only writes the last entry to the xml file. How can I fix this? Do I need to add a new entry for each item?
items.py is:

import scrapy
from scrapy import Field

class CraiglistSampleItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    title = Field()
    link = Field()
Use 'a' to append; opening with 'w' every time overwrites the file, so you only get the last piece of data:

rss.write_xml(open("pyrss2gen.xml", "a"))

If you look at your original code, it also uses 'a' rather than 'w'.

You probably want to use with when opening the file, or at least close it afterwards.
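Applied to the original CSV helper, that suggestion would look roughly like this (a sketch only, reusing the settings.csv_file_path name from the question; with closes the file for you, and 'a' keeps appending rows instead of replacing the file):

def write_to_csv(item):
    # 'a' appends to the existing file; 'w' would truncate it for every item.
    # The with block guarantees the file handle is closed afterwards.
    with open(settings.csv_file_path, 'a') as csv_file:
        writer = csv.writer(csv_file, lineterminator='\n')
        writer.writerow([item[key] for key in item.keys()])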