Can't get Scrapy to follow links and I don't know how to fix it
Basically, my goal is to scrape every product item page, but I think my code is wrong and I don't know how else to approach it.
    import scrapy

    class AdamdentalSpider(scrapy.Spider):
        name = "adamdental"
        start_urls = [ "https://www.adamdental.com.au/search?ProductSearch=%25" ]

        def parse(self, response):
            products = response.css("div[data-role=product]")
            for product in products:
                title_item = products.css("span.widget-productlist-title a")[0]
                url = title_item.attrib['href']
                yield scrapy.Request(
                    url = self.start_urls[0] + url,
                    callback = self.parse_details
                )

        def parse_details(self, response):
            main = response.css("div.product-detail-right")
            yield{
                "title": main.css("h1.widget-product-title::text"),
                "sku": main.css("h4.subtitle::text"),
                "price": main.css("span.item-price"),
                "description": main.css("div.widget-product-field.info-group.widget-product-field-ProductDescription.description-gap"),
            }
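A single request with two responses and two yields is not the right way to extract data with Scrapy. In your spider, `products.css(...)[0]` inside the loop always selects the first product's link instead of the current one's, the request URL is built by appending the href to the full search URL (query string included), and `parse_details` yields selector objects rather than extracted strings because `.get()` is never called.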
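To see why appending the href to the start URL cannot work, here is a minimal sketch using only the standard library. It assumes the listing page uses root-relative hrefs such as the torque wrench path that shows up in the crawl log; the helper names `naive_url` and `resolved_url` are made up for illustration.

```python
from urllib.parse import urljoin

# Search page used as the start URL in the question.
SEARCH_URL = "https://www.adamdental.com.au/search?ProductSearch=%25"

# Root-relative href of the kind found on the listing page
# (assumption: this path is taken from the crawl log in this thread).
HREF = "/anthogyr-torq-control-universal-torque-wrench"


def naive_url(base: str, href: str) -> str:
    """String concatenation, as done in the question's spider."""
    return base + href


def resolved_url(base: str, href: str) -> str:
    """Resolve href against base the way a browser would."""
    return urljoin(base, href)


if __name__ == "__main__":
    # The href ends up inside the query string of the search page.
    print(naive_url(SEARCH_URL, HREF))
    # The href replaces the path and query, giving the product page.
    print(resolved_url(SEARCH_URL, HREF))
```

Inside a spider, `response.urljoin(href)` or `response.follow(href, callback=...)` should give the same resolution without hard-coding the domain. The spider below hard-codes the site root instead, which works just as well as long as every href is root-relative.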
    import scrapy

    class AdamdentalSpider(scrapy.Spider):
        name = "adamdental"
        start_urls = ["https://www.adamdental.com.au/search?ProductSearch=%25"]

        def parse(self, response):
            # Each title span on the listing page wraps one product link.
            for link in response.css('span.widget-productlist-title'):
                rel_url = link.css('a::attr(href)').get()
                abs_url = f'https://www.adamdental.com.au{rel_url}'
                yield scrapy.Request(
                    url=abs_url,
                    callback=self.parse_details
                )

        def parse_details(self, response):
            # .get() and .getall() return strings instead of selector objects.
            yield {
                "title": response.css("h1.widget-product-title::text").get(),
                "sku": response.css("h4.subtitle::text").get(),
                "price": response.css("span.item-price::text").get(),
                "description": ''.join(response.xpath('//*[@class="info-group-content"]//text()').getall()).replace('\r\n', '').strip()
            }
Output:
{'title': 'Disposable Premium Air Water Triplex Syringe Tips 150/pk', 'sku': ' 103100W', 'price': '.00', 'description': '150/packMetal interior, plastic exteriorInterchangeable with most metal tips with no conversionDesign for snug locking fit'}
2022-05-26 18:29:55 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.adamdental.com.au/anthogyr-torq-control-universal-torque-wrench> (referer: https://www.adamdental.com.au/search?ProductSearch=%25)
2022-05-26 18:29:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adamdental.com.au/anthogyr-torq-control-universal-torque-wrench>
{'title': 'Anthogyr Torq Control Universal Torque Wrench', 'sku': ' 15501', 'price': '10.00', 'description': 'Anthogyr products are special order items and therefore cannot be refunded, only exchanged for other Anthogyr products.Universal Torque Wrench Torq ControlThe success of the implant treatment\xa0depends on\xa0the precise tightening\xa0of the parts placed directly on the implant. A pre-stressed tightening of the screw will help avoid any risk of screw loosening. Also, high tightening torques may lead to screw fracture.A calibrated tightening can only be guaranteed through the use of a precision instrument offering a torque control system.The dynamometrical manual wrench *Torq Control®\xa0has been specially designed to meet those requirements.Universal torque wrench, recommended with any type of implantsAutomatic declutching
for optimum securityOptimized access in mouth thanks to the micro-head100° angulated micro-head for easy access in mouth (posterior areas)Perfect control of torque thanks to 7 torques values (10/15/20/25/30/32/35N.cm)Only 135 gr for a better freedom of movementOne piece design with smooth surface to limit bacterial retention'}
2022-05-26 18:29:55 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.adamdental.com.au/infection-control/protective-eyewear/face-shields-and-visors/eye-shield-refills-12pk> (referer: https://www.adamdental.com.au/search?ProductSearch=%25)
2022-05-26 18:29:55 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.adamdental.com.au/infection-control/protective-eyewear/face-shields-and-visors/eye-shield-refills-12pk>
{'title': 'Eye Shield Refills 12pk', 'sku': ' 18110', 'price': '.50', 'description': '12 Disposable Eye Shields'}
2022-05-26 18:29:56 [scrapy.core.engine] DEBUG: Crawled (200)
... and so on
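The scraped values above still carry a leading space in `sku` and non-breaking spaces (`\xa0`) and a stray newline in `description`. If that matters, one option is a small clean-up step before yielding the item. This is only a sketch: `clean_text` is a hypothetical helper, not part of Scrapy, and it collapses all runs of whitespace into single spaces, which may or may not be what you want for the description.

```python
from typing import Optional


def clean_text(value: Optional[str]) -> Optional[str]:
    """Collapse whitespace (including non-breaking spaces) and trim the ends."""
    if value is None:
        # .get() returns None when the selector matches nothing.
        return None
    # Turn non-breaking spaces into plain spaces, then split on any
    # whitespace and rejoin with single spaces.
    return " ".join(value.replace("\xa0", " ").split())


if __name__ == "__main__":
    print(repr(clean_text(" 15501")))
    print(repr(clean_text("treatment\xa0depends on\xa0the")))
```

In `parse_details` it could wrap each field, for example `"sku": clean_text(response.css("h4.subtitle::text").get())`.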