Scrapy doesn't save all results to csv
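I'm trying to scrape a specific web page, and although I get all the results in the console, they don't all show up in the output csv. In this case I want the title and the author for a specific search, but I only get the title. If I swap the order of the two I get the author instead, so it only takes the first one. Why?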
import scrapy
QUERY = "q=brilliant+friend&qt=results_page#x0%253Abook-%2C%2528x0%253Abook%2Bx4%253Aprintbook%2529%2C%2528x0%253Abook%2Bx4%253Adigital%2529%2C%2528x0%253Abook%2Bx4%253Alargeprint%2529%2C%2528x0%253Abook%2Bx4%253Amss%2529%2C%2528x0%253Abook%2Bx4%253Athsis%2529%2C%2528x0%253Abook%2Bx4%253Abraille%2529%2C%2528x0%253Abook%2Bx4%253Amic%2529%2Cx0%253Aartchap-%2C%2528x0%253Aartchap%2Bx4%253Achptr%2529%2C%2528x0%253Aartchap%2Bx4%253Adigital%2529format"
class Spider(scrapy.Spider):
    name = 'worldcatspider'
    start_urls = ['https://www.worldcat.org/search?start=%s&%s' % (number, QUERY) for number in range(0, 4400, 10)]

    def parse(self, response):
        for title in response.css('.name a > strong ::text').extract():
            yield {"title:": title}
        for author in response.css('.author ::text').extract():
            yield {"author:": author}
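My suggestion would be to select their parent class or div first, and pull both fields out of it. As far as I know, the CSV exporter takes its columns from the first item it receives, and your code yields every title and every author as a separate item, so only whichever field comes first ends up in the file.
I haven't checked it, but this should work: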
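def parse(self, response):
    # Each .menuElem is assumed to wrap one search result.
    for page in response.css('.menuElem'):
        title = page.css('.name a > strong ::text').extract()
        author = page.css('.author ::text').extract()
        # One item per result, so both fields land in the same CSV row.
        yield {"title": title,
               "author": author}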
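
To see why only the first field reaches the file, here is a small sketch that does not use Scrapy at all. It imitates, with the standard library's csv.DictWriter, what I understand the CSV exporter to do when no export fields are configured: fix the columns from the first item and ignore any key that is not among them. The function name export_like_first_item and the sample title/author values are made up for illustration.

```python
import csv
import io


def export_like_first_item(items):
    """Write dicts to CSV text, taking the columns from the first item only.

    A stdlib imitation of the behaviour described above, not Scrapy code.
    """
    items = list(items)
    buffer = io.StringIO()
    if not items:
        return buffer.getvalue()
    writer = csv.DictWriter(
        buffer,
        fieldnames=list(items[0].keys()),  # columns fixed by the first item
        restval="",                        # missing keys become empty cells
        extrasaction="ignore",             # unknown keys are silently dropped
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(items)
    return buffer.getvalue()


# Hypothetical sample values standing in for scraped data.
separate = [{"title": "My Brilliant Friend"}, {"author": "Elena Ferrante"}]
combined = [{"title": "My Brilliant Friend", "author": "Elena Ferrante"}]

separate_csv = export_like_first_item(separate)
combined_csv = export_like_first_item(combined)

print(separate_csv)  # only a "title" column; the author value is lost
print(combined_csv)  # both columns, one row per result
```

With separate items the author never makes it into the output, which matches what you are seeing. With one item per result both columns are written. If you also want a fixed column order, Scrapy's FEED_EXPORT_FIELDS setting lets you list the columns explicitly.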