Creating an RSS feed with Scrapy

I added a pipeline, based on an answer for a sample project that I found on Stack Overflow. It is:

import csv
from craiglist_sample import settings


def write_to_csv(item):
    writer = csv.writer(open(settings.csv_file_path, 'a'), lineterminator='\n')
    writer.writerow([item[key] for key in item.keys()])



class WriteToCsv(object):
    def process_item(self, item, spider):
        write_to_csv(item)
        return item
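For reference, Scrapy only runs a pipeline that is registered through the ITEM_PIPELINES setting in settings.py. Assuming the class above lives in craiglist_sample/pipelines.py (the module path is an assumption), the entry would look something like:

```python
# settings.py -- module path is an assumption; adjust to where WriteToCsv lives
ITEM_PIPELINES = {
    'craiglist_sample.pipelines.WriteToCsv': 300,  # lower number = runs earlier
}
```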

It writes to the csv file correctly. Then I changed it to this:

import csv
import sys
from craiglist_sample import settings
import datetime
import PyRSS2Gen

def write_to_csv(item):

    rss = PyRSS2Gen.RSS2(
        title = "Andrew's PyRSS2Gen feed",
        link = "http://www.dalkescientific.com/Python/PyRSS2Gen.html",
        description = "The latest news about PyRSS2Gen, a "
                      "Python library for generating RSS2 feeds",

        lastBuildDate = datetime.datetime.now(),

        items = [
           PyRSS2Gen.RSSItem(
             title = str(item['title']),
             link = str(item['link']),
             description = "Dalke Scientific today announced PyRSS2Gen-0.0, "
                           "a library for generating RSS feeds for Python.  ",
             guid = PyRSS2Gen.Guid("http://www.dalkescientific.com/news/"
                              "030906-PyRSS2Gen.html"),
             pubDate = datetime.datetime(2003, 9, 6, 21, 31)),

        ])

    rss.write_xml(open("pyrss2gen.xml", "w"))

class WriteToCsv(object):
    def process_item(self, item, spider):
        write_to_csv(item)
        return item

But the problem is that it only writes the last entry to the xml file. How can I fix this? Do I need to append a new line for each entry?

items.py is:

import scrapy
from scrapy import Field


class CraiglistSampleItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    title = Field()
    link = Field()

Use 'a' to append; with 'w' you overwrite the file every time, so you only end up with the last piece of data:

rss.write_xml(open("pyrss2gen.xml", "a"))

If you look at the original code, you'll see it also uses 'a' rather than 'w'.

You may also want to use `with` when opening the file, or at least close it when you are done.
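The difference between the two modes can be shown with a minimal stand-alone sketch (plain text files and a temporary path here, purely for illustration): repeated writes with 'w' keep only the last entry, while 'a' accumulates them, and `with` guarantees the file is closed after each write.

```python
import os
import tempfile

# Hypothetical scratch file, stand-in for pyrss2gen.xml
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

# Overwrite mode: each open('w') truncates the file first.
for entry in ["first", "second", "third"]:
    with open(path, "w") as f:   # 'with' also closes the file for us
        f.write(entry + "\n")
with open(path) as f:
    print(f.read())  # only "third" survives

# Append mode: every entry written is kept.
for entry in ["first", "second", "third"]:
    with open(path, "a") as f:
        f.write(entry + "\n")
with open(path) as f:
    print(f.read())  # "third" from before, then first/second/third
```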