Scrapy only returning first result from the page
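I have the following spider. It is quite simple and just tries to parse the product title, URL and price from a single category page. The problem is that the spider only gets the first result from the page and nothing more. I don't understand this behavior; can anyone explain it?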
URL: https://buyeliquidonline.com.au/product-category/geek-vape/ (the website to scrape)
Spider:
import scrapy
from scrapy.crawler import CrawlerProcess


class VapeSpider(scrapy.Spider):
    name = "vape"
    # custom_settings = {
    #     "FEED_FORMAT": "csv",
    #     "FEED_URI": "vape.csv",
    #     "LOG_FILE": "vape.log",
    # }

    def start_requests(self):
        yield scrapy.Request(
            "https://buyeliquidonline.com.au/product-category/geek-vape/",
            callback=self.parse,
        )

    def parse(self, response):
        for prod in response.css("ul.products:nth-child(2)"):
            yield {
                "title": prod.css("h2.woocommerce-loop-product__title")
                .css("a::text")
                .get()
            }


process = CrawlerProcess()
process.crawl(VapeSpider)
process.start()
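The problem is in the CSS selector. ul.products:nth-child(2) matches the whole product list as a single element, so the for loop runs only once, and .get() inside that one element returns only the first matching title. Each product sits in its own li container, so you need to select all of them with ul.products:nth-child(2) li and then iterate over those in the for loop: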
import scrapy
from scrapy.crawler import CrawlerProcess


class VapeSpider(scrapy.Spider):
    name = "vap"
    # custom_settings = {
    #     "FEED_FORMAT": "csv",
    #     "FEED_URI": "vape.csv",
    #     "LOG_FILE": "vape.log",
    # }

    def start_requests(self):
        yield scrapy.Request(
            "https://buyeliquidonline.com.au/product-category/geek-vape",
            callback=self.parse,
        )

    def parse(self, response):
        # One iteration per product <li>, not one for the whole <ul>.
        for prod in response.css("ul.products:nth-child(2) li"):
            yield {
                "title": prod.css("h2.woocommerce-loop-product__title").css("a::text").get()
            }


process = CrawlerProcess()
process.crawl(VapeSpider)
process.start()
Output:
{'title': 'Geekvape Aegis Boost Empty Pod Cartridge 3.7ml'}
2022-03-23 06:04:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://buyeliquidonline.com.au/product-category/geek-vape/>
{'title': 'Geekvape Aegis Boost Pod Kit Luxury Edition 1500mah'}
2022-03-23 06:04:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://buyeliquidonline.com.au/product-category/geek-vape/>
{'title': 'Geekvape Aegis Hero Pod Kit 1200mah 4ml'}
2022-03-23 06:04:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://buyeliquidonline.com.au/product-category/geek-vape/>
{'title': 'Geekvape Aegis Hero Replacement Pod Cartridge'}
2022-03-23 06:04:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://buyeliquidonline.com.au/product-category/geek-vape/>
{'title': 'Geekvape Aegis Legend Kit With Z Sub Ohm Tank 5ml'}
2022-03-23 06:04:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://buyeliquidonline.com.au/product-category/geek-vape/>
{'title': 'Geekvape Aegis Max Starter Kit'}
2022-03-23 06:04:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://buyeliquidonline.com.au/product-category/geek-vape/>
{'title': 'Geekvape Aegis Solo 100W Starter Kit'}
2022-03-23 06:04:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://buyeliquidonline.com.au/product-category/geek-vape/>
{'title': 'Geekvape Aegis X 200w Starter Kit W/ Zeus'}
2022-03-23 06:04:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://buyeliquidonline.com.au/product-category/geek-vape/>
{'title': 'Geekvape Aero 5ml replacement glass'}
2022-03-23 06:04:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://buyeliquidonline.com.au/product-category/geek-vape/>
{'title': 'Geekvape Alpha 4ml Replacement Glass'}
2022-03-23 06:04:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://buyeliquidonline.com.au/product-category/geek-vape/>
{'title': 'Geekvape Boost Replacement Coils'}
2022-03-23 06:04:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://buyeliquidonline.com.au/product-category/geek-vape/>
{'title': 'Geekvape Cerberus 5.5ml replacement glass'}
2022-03-23 06:04:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://buyeliquidonline.com.au/product-category/geek-vape/>
{'title': 'Geekvape G Coil Zeus Tank'}
2022-03-23 06:04:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://buyeliquidonline.com.au/product-category/geek-vape/>
{'title': 'Geekvape Super Mesh Coils'}
2022-03-23 06:04:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://buyeliquidonline.com.au/product-category/geek-vape/>
{'title': 'Geekvape Wenax K1 Pod System'}
2022-03-23 06:04:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://buyeliquidonline.com.au/product-category/geek-vape/>
{'title': 'Geekvape Zeus replacement glass'}
2022-03-23 06:04:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://buyeliquidonline.com.au/product-category/geek-vape/>
{'title': 'Geekvape Zeus sub ohm tank'}
2022-03-23 06:04:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://buyeliquidonline.com.au/product-category/geek-vape/>
{'title': 'Geekvape ZX RTA'}
2022-03-23 06:04:27 [scrapy.core.scraper] DEBUG: Scraped from <200 https://buyeliquidonline.com.au/product-category/geek-vape/>
{'title': 'Wenax replacement pods'}