Creating RSS with Scrapy
I added a pipeline that I found in an answer to a sample project on Stack Overflow. It is:
import csv
from craiglist_sample import settings

def write_to_csv(item):
    writer = csv.writer(open(settings.csv_file_path, 'a'), lineterminator='\n')
    writer.writerow([item[key] for key in item.keys()])

class WriteToCsv(object):
    def process_item(self, item, spider):
        write_to_csv(item)
        return item
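For the pipeline to be invoked at all, it also has to be enabled in the project's settings.py. A minimal sketch of that registration follows; the module path craiglist_sample.pipelines.WriteToCsv and the csv_file_path value are assumptions based on the project name, so adjust them to the actual layout:

# settings.py -- sketch; the module path and file name below are assumed
ITEM_PIPELINES = {
    'craiglist_sample.pipelines.WriteToCsv': 300,
}
csv_file_path = 'items.csv'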
It writes the csv file correctly. Then I changed it to this:
import csv
import sys
from craiglist_sample import settings
import datetime
import PyRSS2Gen

def write_to_csv(item):
    rss = PyRSS2Gen.RSS2(
        title = "Andrew's PyRSS2Gen feed",
        link = "http://www.dalkescientific.com/Python/PyRSS2Gen.html",
        description = "The latest news about PyRSS2Gen, a "
                      "Python library for generating RSS2 feeds",
        lastBuildDate = datetime.datetime.now(),
        items = [
            PyRSS2Gen.RSSItem(
                title = str(item['title']),
                link = str(item['link']),
                description = "Dalke Scientific today announced PyRSS2Gen-0.0, "
                              "a library for generating RSS feeds for Python. ",
                guid = PyRSS2Gen.Guid("http://www.dalkescientific.com/news/"
                                      "030906-PyRSS2Gen.html"),
                pubDate = datetime.datetime(2003, 9, 6, 21, 31)),
        ])
    rss.write_xml(open("pyrss2gen.xml", "w"))

class WriteToCsv(object):
    def process_item(self, item, spider):
        write_to_csv(item)
        return item
The problem is that it only writes the last entry to the xml file. How can I fix this? Do I need to add a new entry for each item?
items.py is:

import scrapy
from scrapy import Field

class CraiglistSampleItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    title = Field()
    link = Field()
Use 'a' to append; opening with 'w' every time overwrites the file, so you only get the last piece of data:

rss.write_xml(open("pyrss2gen.xml", "a"))

If you look at your original code, it also uses 'a' rather than 'w'.

You probably want to use with when opening the file, or at least close it afterwards.
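Applied to the original CSV helper, that suggestion would look roughly like this (a sketch only, reusing the settings.csv_file_path name from the question; with closes the file for you, and 'a' keeps appending rows instead of replacing the file):

def write_to_csv(item):
    # 'a' appends to the existing file; 'w' would truncate it for every item.
    # The with block guarantees the file handle is closed afterwards.
    with open(settings.csv_file_path, 'a') as csv_file:
        writer = csv.writer(csv_file, lineterminator='\n')
        writer.writerow([item[key] for key in item.keys()])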