Simple scrapy program running successfully on shell but not exporting data to csv
I have been trying to scrape only the comments from one particular link. When I run the spider in the shell it succeeds, but when I try to export the results to a CSV file I only get comment_user and not comment_data. Why?
from scrapy.contrib.spiders import CrawlSpider, Rule
from scrapy.selector import Selector
from scrapy.contrib.linkextractors.sgml import SgmlLinkExtractor
from urlparse import urljoin
from commen.items import CommenItem

class criticspider(CrawlSpider):
    name = "delh"
    allowed_domains = ["consumercomplaints.in"]
    #start_urls = ["http://www.consumercomplaints.in/?search=delhivery&page=2","http://www.consumercomplaints.in/?search=delhivery&page=3","http://www.consumercomplaints.in/?search=delhivery&page=4","http://www.consumercomplaints.in/?search=delhivery&page=5","http://www.consumercomplaints.in/?search=delhivery&page=6","http://www.consumercomplaints.in/?search=delhivery&page=7","http://www.consumercomplaints.in/?search=delhivery&page=8","http://www.consumercomplaints.in/?search=delhivery&page=9","http://www.consumercomplaints.in/?search=delhivery&page=10","http://www.consumercomplaints.in/?search=delhivery&page=11"]
    start_urls = ["http://www.consumercomplaints.in/movement-delivery/delhivery-courier-service-c783976"]

    def parse(self, response):
        sites = response.xpath('//table[@style="width:100%"]')
        items = []
        for site in sites:
            item = CommenItem()
            item['comment_user'] = site.xpath('.//td[@class="comments"]/div[1]/a/text()').extract()
            item['comment_data'] = site.xpath('.//tr[3]/td/div/text()').extract()
            items.append(item)
        return items
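For context on the symptom: extract() always returns a list of matched text nodes. If the comment_data XPath matches nothing, the field holds an empty list, and the CSV feed exporter writes that as an empty cell while comment_user still exports normally. A minimal stdlib sketch of that effect (the field values are hypothetical, not taken from the site):

```python
import csv
import io

# Simulated Selector.extract() results: the first XPath matched,
# the second matched nothing and returned an empty list.
rows = [
    {"comment_user": ["Some User"], "comment_data": []},  # hypothetical data
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["comment_user", "comment_data"])
writer.writeheader()
for row in rows:
    # Serialize each list the way a CSV exporter typically flattens it:
    # join the values, so an empty list becomes an empty cell.
    writer.writerow({key: ",".join(value) for key, value in row.items()})

print(buf.getvalue())
```

The empty comment_data cell in the output mirrors what the spider above produces: the XPath is wrong, not the export step.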
The logic implemented in the parse() method is slightly incorrect. Here is how I would approach it:
def parse(self, response):
    sites = response.xpath('//td/div[starts-with(@id, "c")]')
    for site in sites:
        item = CommenItem()
        item['comment_user'] = site.xpath('.//td[@class="comments"]/div[1]/a/text()').extract()[0].strip()
        item['comment_data'] = ''.join(site.xpath('.//td[@class="compl-text"]/div//text()').extract()).strip()
        yield item
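A note on the cleanup applied to comment_data above: the complaint text is split across several text nodes, so extract() returns a list of fragments; joining them and stripping the surrounding whitespace yields one clean string, and when nothing matches the result is simply ''. A small pure-Python sketch (the sample fragments are hypothetical):

```python
# Simulated Selector.extract() results (sample text is hypothetical)
matched = ['  Package was never ', 'delivered on time.  ']  # XPath found text nodes
unmatched = []                                              # XPath matched nothing

# The cleanup used in the answer: join all fragments, then strip whitespace
comment_data = ''.join(matched).strip()
empty_data = ''.join(unmatched).strip()

print(repr(comment_data))  # 'Package was never delivered on time.'
print(repr(empty_data))    # ''
```

Because the method now yields items one at a time instead of returning a list, Scrapy's feed export picks them up directly, e.g. when running the spider with an output file such as items.csv.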