Scrapy returning an empty JSON file
I am trying to get data from a website. Everything seems correct, and the XPath expressions were tested in the Scrapy shell.
# -*- coding: utf-8 -*-
from scrapy.contrib.spiders import CrawlSpider


class KabumspiderSpider(CrawlSpider):
    name = "kabumspider"
    allowed_domain = ["www.kabum.com.br"]
    start_urls = ["https://www.kabum.com.br"]


def parse(self, response):
    categorias = response.xpath('//p[@class = "bot-categoria"]/a/text()').extract()
    links = response.xpath('//p[@class = "bot-categoria"]/a/@href').extract()
    for categoria in zip(categorias, links):
        info = {
            'categoria': categoria[0],
            'link': categoria[1],
        }
        yield info
However, the output is just:

[

What is wrong with my code?
I ran the crawler and it worked fine for me. The only problem I found is that your parse method is outside the class:
# -*- coding: utf-8 -*-
from scrapy.contrib.spiders import CrawlSpider  # in newer Scrapy versions: from scrapy.spiders import CrawlSpider


class KabumspiderSpider(CrawlSpider):
    name = "kabumspider"
    allowed_domain = ["www.kabum.com.br"]  # note: Scrapy expects "allowed_domains" (plural)
    start_urls = ["https://www.kabum.com.br"]

    def parse(self, response):  # parse must be indented inside the class
        categorias = response.xpath('//p[@class = "bot-categoria"]/a/text()').extract()
        links = response.xpath('//p[@class = "bot-categoria"]/a/@href').extract()
        for categoria in zip(categorias, links):
            info = {
                'categoria': categoria[0],
                'link': categoria[1],
            }
            yield info
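If you want to sanity-check the pairing logic in parse without running the crawler, the zip-based loop can be exercised on its own. This is a minimal sketch; the category names and links below are made-up placeholders, not real data from kabum.com.br:

```python
# Standalone sketch of the pairing step inside parse():
# zip() combines the two extracted lists into (text, href) pairs,
# which are then turned into one dict per category.
# The sample values are hypothetical.
categorias = ["Hardware", "Perifericos"]
links = ["/hardware", "/perifericos"]

items = [{'categoria': c, 'link': l} for c, l in zip(categorias, links)]
print(items)
```

One caveat of zip(): if the two XPath queries return lists of different lengths (e.g. an anchor with an href but no text), the extra entries are silently dropped, which is worth checking when items go missing.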