Scrape result export problem

Question

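I wrote a simple spider to scrape details from a website. When I run it, I get output on the console, but if I send it to a file with -o filename.json, the file contains nothing but a single [ character. What should I do?

My spider looks like this: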

import scrapy
from tutorial.items import TutorialItem

class ChillumSpider(scrapy.Spider):
    name = "chillum"
    allowed_domains = ["flipkart.com"]
    start_urls = ["http://www.flipkart.com/search?q=brown+jacket&as=offas-show=off&otracker=start"]

    def parse(self, response):
        title = response.xpath('//a[@class="fk-display-block"]/text()').extract()
        print title

My console output looks like this:

[u"\n Asst JKT8810 Full Sleeve Self Design Men's Cotton ", u' ', u"\n Justanned Full Sleeve Solid Men's Bomber ", u' ', u"\n Pepe Sleeveless Solid Men's ", u' ', u"\n Platinum Studio Sleeveless Solid Men's Nehru ", u' ', u"\n Yepme Sleevele ss Solid Men's ", u' ', u'\n Love Leather ', u" Full Sleeve Solid Men's Puleather Ja...\n ", u"\n Justanned Full Sleeve Solid Men's Bomber ", u' ', u"\n Oceanic Full Sleeve Self Design Men's ", u' ', u"\n Dooda Full Sleeve Solid Men's ", u' ', u"\n Bare Skin Full Sleeve Self Design Men's ", u' ', u"\n Asst Full Sleeve Solid Women's ", u' ', u"\n Locomotive F ull Sleeve Men's ", u' ', u"\n Justanned Full Sleeve Solid Women's Leather ", u' ', u' ', u"\n Wrangler Sleeveless Solid Men's ", u' ', u"\n TSX Sleeveless Solid Men's Bomber ", u' ']

But when I run scrapy crawl spider_name -o filename.json, I don't get the same output in the file.

Answer 1

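This is because the -o feed export only writes the items that your callback returns or yields. Printing inside parse shows text on the console but gives Scrapy nothing to export, so the file ends up with just the opening [. You need to return (or yield) Item instances: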

import scrapy
from tutorial.items import TutorialItem

class ChillumSpider(scrapy.Spider):
    name = "chillum"
    allowed_domains = ["flipkart.com"]
    start_urls = ["http://www.flipkart.com/search?q=brown+jacket&as=offas-show=off&otracker=start"]

    def parse(self, response):
        titles = response.xpath('//a[@class="fk-display-block"]/text()').extract()
        for title in titles:
            item = TutorialItem()
            item['title'] = title
            yield item
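
The console output in the question contains many whitespace-only strings (u' ') and titles padded with newlines, and with the loop above each of them would be exported as its own item. As a minimal sketch, a small helper could clean the extracted list before items are built. The name clean_titles is hypothetical, not part of Scrapy, and the sample values are copied from the output in the question:

```python
def clean_titles(raw_titles):
    """Strip surrounding whitespace and drop entries that are empty afterwards.

    Hypothetical helper for illustration; it is not part of Scrapy.
    """
    cleaned = []
    for text in raw_titles:
        stripped = text.strip()
        if stripped:  # skip strings that contained only whitespace
            cleaned.append(stripped)
    return cleaned


# A few values copied from the console output shown in the question.
sample = [
    "\n Asst JKT8810 Full Sleeve Self Design Men's Cotton ",
    " ",
    "\n Love Leather ",
    " Full Sleeve Solid Men's Puleather Ja...\n ",
]
print(clean_titles(sample))
```

In parse, the loop would then read for title in clean_titles(titles): instead of iterating over titles directly. This also assumes that TutorialItem in tutorial/items.py declares the field with title = scrapy.Field(); assigning to an undeclared field raises a KeyError.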

python

web-crawler

scrapy