Python: Scrapy CSV exports incorrectly?

Question

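I'm just trying to write to a CSV. However, I have two separate for-statements, so the data from each for-statement is exported independently and the row order breaks. Any suggestions?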

def parse(self, response):
    hxs = HtmlXPathSelector(response)
    titles = hxs.select('//td[@class="title"]')
    subtext = hxs.select('//td[@class="subtext"]')
    items = []
    for title in titles:
        item = HackernewsItem()
        item["title"] = title.select("a/text()").extract()
        item["url"] = title.select("a/@href").extract()
        items.append(item)
    for score in subtext:
        item = HackernewsItem()
        item["score"] = score.select("span/text()").extract()
        items.append(item)
    return items

As you can see in the attached image, the output of the second for-statement is printed below the other rows, rather than "among" them as the headers are.

Attached CSV image:

And the GitHub link to the full file: https://github.com/nchlswtsn/scrapy/blob/master/items.csv

Answer 1

The csv module in Python 2.7 does not support Unicode, so I suggest using unicodecsv instead.

$ pip install unicodecsv

unicodecsv is a drop-in replacement for Python 2's csv module that supports Unicode strings without hassle.

Then use this in place of import csv:

import unicodecsv as csv
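
As a rough illustration of that swap, here is a minimal sketch that writes Unicode rows through unicodecsv when it is installed. The helper name rows_to_csv_bytes and the sample rows are made up for this example, and the fallback to the built-in csv module is an assumption that only holds on Python 3, where csv already accepts Unicode text.

```python
import io


def rows_to_csv_bytes(rows):
    """Encode rows of text strings as UTF-8 CSV bytes (hypothetical helper)."""
    try:
        import unicodecsv  # third-party package suggested in this answer
    except ImportError:
        unicodecsv = None

    if unicodecsv is not None:
        # unicodecsv emits encoded bytes, so it needs a binary buffer.
        buf = io.BytesIO()
        unicodecsv.writer(buf, encoding="utf-8").writerows(rows)
        return buf.getvalue()

    # Assumed fallback: Python 3's built-in csv module handles Unicode text.
    import csv
    buf = io.StringIO(newline="")
    csv.writer(buf).writerows(rows)
    return buf.getvalue().encode("utf-8")


# Sample rows are invented for illustration only.
data = rows_to_csv_bytes([[u"title", u"url"],
                          [u"Caf\u00e9 news", u"http://example.com"]])
```

Either branch produces the same UTF-8 bytes, so the calling code does not need to know which module did the writing.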

Answer 2

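The order in which you export the elements matches the order you find in the CSV file: first all the titles are exported, then all the subtext elements.
I guess you are trying to scrape HN articles; here is my suggestion: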

def parse(self, response):
    hxs = HtmlXPathSelector(response)
    titles = hxs.select('//td[@class="title"]')
    items = []
    for title in titles:
        item = HackernewsItem()
        item["title"] = title.select("a/text()").extract()
        item["url"] = title.select("a/@href").extract()
        item["score"] = title.select('../td[@class="subtext"]/span/text()').extract()
        items.append(item)
    return items

I haven't tested it, but it should give you an idea.
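
To make the ordering point concrete without running a spider, here is a small plain-Python sketch. The lists titles and scores are invented stand-ins for what the two XPath queries might return, and to_csv is a hypothetical helper, not part of Scrapy. It compares appending partial items in two loops with building one complete item per story.

```python
import csv
import io

# Invented stand-ins for the results of the two XPath queries.
titles = [("Story A", "http://a.example"), ("Story B", "http://b.example")]
scores = ["10 points", "25 points"]

FIELDS = ["score", "title", "url"]


def to_csv(items):
    """Serialize dict items to CSV text; missing fields are left empty."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(items)
    return buf.getvalue()


# Two loops: every append creates its own partial item, so the scores
# end up in separate rows below all of the title rows.
separate = [{"title": t, "url": u} for t, u in titles]
separate += [{"score": s} for s in scores]

# One loop: zip pairs each title with its score, one full item per story.
paired = [{"title": t, "url": u, "score": s}
          for (t, u), s in zip(titles, scores)]

print(to_csv(separate))
print(to_csv(paired))
```

Note that zip assumes the two lists line up one-to-one, which may not hold on the real page. Selecting the score relative to each title node, as in the parse method above, avoids depending on that.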

python

csv

export

scrapy