CSS selector returns an empty list
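Hello, I am new to Scrapy and web scraping, and I am having trouble scraping this website: https://www.webuycars.co.za/buy-a-car
My goal is to scrape car data from the page, such as name, price, and so on.
I started with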
scrapy shell "https://www.webuycars.co.za/buy-a-car"
Then I ran
fetch("http://localhost:8050/render.html?url=https://www.webuycars.co.za/buy-a-car")
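I am using Splash with Scrapy because I concluded that the page is built with JavaScript.
I then tried a few queries, but past a certain point in the page's HTML I only get empty results (which I assume is the content created by JavaScript).
For example: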
response.css("div.col-lg-3.col-md-4.col-sm-6.mt-3").getall()
[]
response.css("div.result-item-title").getall()
[]
response.css("div.result-item-title").get()
response.css(".result-item-title").getall()
[]
Getting the title seems to work, but nothing else I have tried does:
response.css("title::text").get()
'WeBuyCars | Sell Cars For Cash | Free Online Vehicle Valuations'
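I have been running these queries in the shell to make sure I get results before writing the spider and integrating it into my program.
I set my user agent in the settings file.
I looked through all the source files to see whether a JSON file contains what I need, but found none.
I am not sure what else I can do. I have been stuck on this problem for quite a while, and any help would be appreciated.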
You can get all the data from the API response:
import json

import scrapy


class CarsSpider(scrapy.Spider):
    name = 'car'
    # Search payload for the API; json.dumps serializes None as JSON null.
    body = {"to":24,"size":24,"type":"All","filter_type":"all","subcategory":None,"q":"","Make":None,"Roadworthy":None,"Auctions":[],"Model":None,"Variant":None,"DealerKey":None,"FuelType":None,"BodyType":None,"Gearbox":None,"AxleConfiguration":None,"Colour":None,"FinanceGrade":None,"Priced_Amount_Gte":0,"Priced_Amount_Lte":0,"MonthlyInstallment_Amount_Gte":0,"MonthlyInstallment_Amount_Lte":0,"auctionDate":None,"auctionEndDate":None,"auctionDurationInSeconds":None,"Kilometers_Gte":0,"Kilometers_Lte":0,"Priced_Amount_Sort":"","Bid_Amount_Sort":"","Kilometers_Sort":"","Year_Sort":"","Auction_Date_Sort":"","Auction_Lot_Sort":"","Year":[],"Price_Update_Date_Sort":"","Online_Auction_Date_Sort":"","Online_Auction_In_Progress":""}

    def start_requests(self):
        # POST the payload to the search endpoint instead of rendering the page.
        yield scrapy.Request(
            url='https://website-elastic-api.webuycars.co.za/api/search',
            callback=self.parse,
            body=json.dumps(self.body),
            method="POST")

    def parse(self, response):
        # The endpoint returns JSON; each entry under 'data' is one listing.
        response = json.loads(response.body)
        for resp in response['data']:
            yield {
                'Title': resp['OnlineDescription']
            }
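To collect more than the title, each entry under `data` can be mapped to an item with several fields. What follows is only a minimal sketch: `data` and `OnlineDescription` come from the spider above, while `Price` is a hypothetical key name used purely for illustration, and `extract_items` is a helper name made up here. Inspect one real record (for example by printing `resp.keys()` in `parse`) to find the actual field names. Using `dict.get` keeps a missing key from raising `KeyError`.

```python
import json

# Maps output field names to keys in each API record.
# 'OnlineDescription' appears in the spider above; 'Price' is a
# hypothetical key used only for illustration.
FIELD_MAP = {
    "Title": "OnlineDescription",
    "Price": "Price",
}


def extract_items(raw_body, field_map=FIELD_MAP):
    """Decode a JSON search response and return one dict per listing."""
    decoded = json.loads(raw_body)  # accepts str or bytes
    items = []
    # Fall back to an empty list if the response has no 'data' key.
    for record in decoded.get("data", []):
        # dict.get returns None instead of raising KeyError for absent keys.
        items.append({out: record.get(key) for out, key in field_map.items()})
    return items


if __name__ == "__main__":
    sample = json.dumps({"data": [{"OnlineDescription": "2020 Datsun GO 1.2 MID"}]})
    print(extract_items(sample))
```

In the spider, `parse` could then be reduced to `yield from extract_items(response.body)`.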
Output:
{'Title': '2022 Citroen C3 Aircross 1.2T Puretech Sine Auto'}
2022-05-01 08:15:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://website-elastic-api.webuycars.co.za/api/search>
{'Title': '2022 Toyota Hilux 2.4 Gd-6 RB Raider Pick Up Double Cab'}
2022-05-01 08:15:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://website-elastic-api.webuycars.co.za/api/search>
{'Title': '2020 Datsun GO 1.2 MID'}
2022-05-01 08:15:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://website-elastic-api.webuycars.co.za/api/search>
{'Title': '2013 Hyundai i10 1.25 Gls/fluid Auto'}
2022-05-01 08:15:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://website-elastic-api.webuycars.co.za/api/search>
{'Title': '2020 Suzuki S-Presso 1.0 GL+ AMT'}
2022-05-01 08:15:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://website-elastic-api.webuycars.co.za/api/search>
{'Title': '2019 SYM Symphony JET 14 200'}
2022-05-01 08:15:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://website-elastic-api.webuycars.co.za/api/search>
{'Title': '2019 Nissan Micra 1.2 Active Visia'}
2022-05-01 08:15:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://website-elastic-api.webuycars.co.za/api/search>
{'Title': '2021 Suzuki Super Carry 1.2i Pick Up Single Cab'}
2022-05-01 08:15:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://website-elastic-api.webuycars.co.za/api/search>
{'Title': '2022 Suzuki AN UB 125 (burgman)'}
2022-05-01 08:15:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://website-elastic-api.webuycars.co.za/api/search>
{'Title': '2022 Honda XRL XR 125l'}
2022-05-01 08:15:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://website-elastic-api.webuycars.co.za/api/search>
{'Title': '2022 Toyota Hilux 2.4 Gd-6 RB Raider Pick Up Double Cab'}
2022-05-01 08:15:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://website-elastic-api.webuycars.co.za/api/search>
{'Title': '2022 Land Rover Defender 110 D300 SE X-Dynamic (221 KW)'}
2022-05-01 08:15:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://website-elastic-api.webuycars.co.za/api/search>
{'Title': '2020 Suzuki S-Presso 1.0 GL'}
2022-05-01 08:15:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://website-elastic-api.webuycars.co.za/api/search>
{'Title': '2022 Big Boy TSR 250'}
2022-05-01 08:15:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://website-elastic-api.webuycars.co.za/api/search>
{'Title': '2022 Hyundai Atos/Atoz 1.1 Motion AMT'}
2022-05-01 08:15:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://website-elastic-api.webuycars.co.za/api/search>
{'Title': '2019 Fiat Panda 900t Lounge'}
2022-05-01 08:15:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://website-elastic-api.webuycars.co.za/api/search>
{'Title': '2017 Chevrolet Spark 1.2 Campus/curve 5-Door'}
2022-05-01 08:15:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://website-elastic-api.webuycars.co.za/api/search>
{'Title': '2020 Crosby Adventure Bike 400cc'}
2022-05-01 08:15:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://website-elastic-api.webuycars.co.za/api/search>
{'Title': '2022 Renault Kwid 1.0 Climber 5-Door'}
2022-05-01 08:15:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://website-elastic-api.webuycars.co.za/api/search>
{'Title': '2019 Suzuki Swift 1.2 GLX'}
2022-05-01 08:15:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://website-elastic-api.webuycars.co.za/api/search>
{'Title': '2022 Volkswagen Polo Classic GP 1.4 Comfortline'}
2022-05-01 08:15:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://website-elastic-api.webuycars.co.za/api/search>
{'Title': '2020 Renault Kwid 1.0 Climber 5-Door Auto'}
2022-05-01 08:15:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://website-elastic-api.webuycars.co.za/api/search>
{'Title': '2022 SYM Crox X-Pro 125'}
2022-05-01 08:15:37 [scrapy.core.scraper] DEBUG: Scraped from <200 https://website-elastic-api.webuycars.co.za/api/search>
{'Title': '2019 Yamaha YZ 450 FX'}
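The output above contains 24 titles, matching `"size":24` in the request body, so fetching more listings presumably takes further requests. The sketch below generates one payload per page under an unverified assumption: that `to` is the end offset of the requested window and grows by `size` for each page. The helper name `page_payloads` is hypothetical; confirm the real paging behaviour in the browser's network tab before relying on it.

```python
import copy
import json


def page_payloads(base_body, pages):
    """Yield JSON-encoded request bodies for the first `pages` pages.

    ASSUMPTION (not confirmed by the source): 'to' is the end offset of the
    result window and increases by 'size' for each successive page.
    """
    size = base_body["size"]
    for page in range(1, pages + 1):
        body = copy.deepcopy(base_body)  # never mutate the shared template
        body["to"] = size * page
        yield json.dumps(body)


if __name__ == "__main__":
    # Trimmed-down version of the body used in the spider, for illustration.
    base = {"to": 24, "size": 24, "type": "All", "q": "", "Make": None}
    for encoded in page_payloads(base, pages=3):
        print(encoded)
```

Each encoded body could be passed as `body=` to its own `scrapy.Request` in `start_requests`, with the same URL, callback, and `method="POST"` as in the spider.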