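Scrapy: Unable to get data from xpath
I am trying to get data with the script below. In the parse function I split the XPath expressions into two parts. The first part holds fixed data that I don't want to loop over, and the second part holds the table that I do want to loop over. When I run the script, it only gives me the data from the second part. I am using Splash to render the HTML.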
import scrapy
from scrapy_splash import SplashRequest


class RaceSpider(scrapy.Spider):
    name = 'race'
    allowed_domains = ['www.racing.com']

    script = '''
        function main(splash, args)
            splash.private_mode_enabled = false
            assert(splash:go(args.url))
            assert(splash:wait(5))
            splash:set_viewport_full()
            return splash:html()
        end
    '''

    def start_requests(self):
        yield SplashRequest(
            url='https://www.racing.com/form/2020-08-08/flemington/race/1/results#/results',
            callback=self.parse,
            endpoint='execute',
            args={'lua_source': self.script},
        )

    def parse(self, response):
        information = response.xpath("//div[@class='race-results-table ng-scope']/table")
        yield {
            # part 1
            'Race Number': response.xpath("(.//span[@class='number-circle xlg'])[1]/text()").get(),
            'Title': response.xpath("(.//div[@class='popup ng-scope']/h1)[1]/text()").get(),
            'Result Distance Thumbnail': response.xpath(".//div[@class='ng-scope']/p/text()").get(),
            'Track Condition': response.xpath(".//div[@class='condition']/div/p/span/text()").get(),
            'Rail': response.xpath("(.//div[@class='rail']/div/p/span)[1]/text()").get(),
        }
        for info in information:
            yield {
                # part 2
                'Position': info.xpath("(.//td[@class='td-position tcenter']/span)[1]/text()").get(),
                'Horse Entry Number': info.xpath("(.//td[@class='horse-name']/h3/a/span)[1]/text()").get(),
                'Horse Full Name': info.xpath("(.//td[@class='horse-name']/h3/a/span)[2]/text()").get(),
                'Horse Barrier Number': info.xpath("(.//td[@class='horse-name']/h3/a/span)[3]/text()").get(),
                'Trainers': info.xpath("(.//td[@class='horse-details']/span/a)[1]/text()").get(),
                'Jockey': info.xpath("(.//td[@class='horse-details']/span/a)[2]/text()").get(),
            }
Output
2021-09-08 22:58:36 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.racing.com/form/2020-08-08/flemington/race/1/results#/results via http://localhost:8050/execute> (referer: None)
2021-09-08 22:58:36 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.racing.com/form/2020-08-08/flemington/race/1/results#/results>
{'Race Number': '1', 'Title': 'Flemington', 'Date': 'Sat, 8th Aug', 'Result Time': '2:05am', 'Result Distance': '2530m\xa0\xa0', 'Race Name': 'TAB Handicap', 'Result Distance Thumbnail': '2530m', 'Track Condition': 'Soft 7', 'Rail': 'Out 10m Entire Circuit\n ', 'Track Record': 'Unavailable', 'Price Money': '5,000'}
2021-09-08 22:58:36 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.racing.com/form/2020-08-08/flemington/race/1/results#/results>
{'Position': None, 'Horse Entry Number': None, 'Horse Full Name': None, 'Horse Barrier Number': None, 'Trainers': None, 'Jockey': None, 'Gear': None, 'WGT': None, 'Price': None, '800m': None, '400m': None, 'Margin': None, 'SP': None}
2021-09-08 22:58:36 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.racing.com/form/2020-08-08/flemington/race/1/results#/results>
{'Position': '1st', 'Horse Entry Number': '5. ', 'Horse Full Name': 'Exemplar (IRE)', 'Horse Barrier Number': ' (7)', 'Trainers': 'C.Maher & D.Eustace', 'Jockey': 'J.Allen', 'Gear': '1', 'WGT': '56.5kg', 'Price': ',250', '800m': '1st', '400m': '1st', 'Margin': '2:45.74', 'SP': '.00'}
2021-09-08 22:58:36 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.racing.com/form/2020-08-08/flemington/race/1/results#/results>
{'Position': '2nd', 'Horse Entry Number': '3. ', 'Horse Full Name': 'Double You Tee', 'Horse Barrier Number': ' (6)', 'Trainers': 'P.Payne', 'Jockey': 'W.J.Egan', 'Gear': '0', 'WGT': '57.5kg', 'Price': ',300', '800m': '6th', '400m': '4th', 'Margin': '1.25L', 'SP': '.80'}
2021-09-08 22:58:36 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.racing.com/form/2020-08-08/flemington/race/1/results#/results>
{'Position': '3rd', 'Horse Entry Number': '6. ', 'Horse Full Name': 'Bertwhistle', 'Horse Barrier Number': ' (4)', 'Trainers': 'D.I.Dodson', 'Jockey': 'L.J.Neindorf', 'Gear': '0', 'WGT': '54kg', 'Price': ',150', '800m': '4th', '400m': '3rd', 'Margin': '4.75L', 'SP': '.00'}
2021-09-08 22:58:36 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.racing.com/form/2020-08-08/flemington/race/1/results#/results>
{'Position': '4th', 'Horse Entry Number': '7. ', 'Horse Full Name': 'Flag Edition (NZ)', 'Horse Barrier Number': ' (2)', 'Trainers': 'M.Payne', 'Jockey': 'M.Payne', 'Gear': '0', 'WGT': '56kg', 'Price': ',750', '800m': '5th', '400m': '6th', 'Margin': '4.85L', 'SP': '.00'}
2021-09-08 22:58:36 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.racing.com/form/2020-08-08/flemington/race/1/results#/results>
{'Position': '5th', 'Horse Entry Number': '8. ', 'Horse Full Name': 'Blandford Lad (NZ)', 'Horse Barrier Number': ' (3)', 'Trainers': 'P.Gelagotis', 'Jockey': 'W.T.Price', 'Gear': '2', 'WGT': '53kg', 'Price': ',050', '800m': '7th', '400m': '7th', 'Margin': '5.6L', 'SP': '.00'}
2021-09-08 22:58:36 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.racing.com/form/2020-08-08/flemington/race/1/results#/results>
{'Position': '6th', 'Horse Entry Number': '4. ', 'Horse Full Name': 'South Pacific (GB)', 'Horse Barrier Number': ' (5)', 'Trainers': 'C.Maher & D.Eustace', 'Jockey': 'D.Oliver', 'Gear': '3', 'WGT': '57.5kg', 'Price': ',700', '800m': '2nd', '400m': '2nd', 'Margin': '5.8L', 'SP': '.95'}
2021-09-08 22:58:36 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.racing.com/form/2020-08-08/flemington/race/1/results#/results>
{'Position': '7th', 'Horse Entry Number': '1. ', 'Horse Full Name': 'Home By Midnight (NZ)', 'Horse Barrier Number': ' (1)', 'Trainers': 'P.Payne', 'Jockey': 'T.J.Hope', 'Gear': '2', 'WGT': '60kg', 'Price': ',700', '800m': '3rd', '400m': '5th', 'Margin': '6.55L', 'SP': '.00'}
2021-09-08 22:58:36 [scrapy.core.scraper] DEBUG: Scraped from <200 https://www.racing.com/form/2020-08-08/flemington/race/1/results#/results>
{'Position': '\n ', 'Horse Entry Number': '2. ', 'Horse Full Name': 'Lord Belvedere (GB)', 'Horse Barrier Number': None, 'Trainers': 'C.Maher & D.Eustace', 'Jockey': 'B.J.Melham', 'Gear': '0', 'WGT': '60kg', 'Price': '–', '800m': None, '400m': None, 'Margin': None, 'SP': None}
2021-09-08 22:58:36 [scrapy.core.engine] INFO: Closing spider (finished)
2021-09-08 22:58:36 [scrapy.extensions.feedexport] INFO: Stored csv feed (10 items) in: data1.csv
2021-09-08 22:58:36 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 839,
'downloader/request_count': 1,
'downloader/request_method_count/POST': 1,
'downloader/response_bytes': 427762,
'downloader/response_count': 1,
'downloader/response_status_count/200': 1,
'elapsed_time_seconds': 23.855061,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2021, 9, 8, 16, 58, 36, 640162),
'item_scraped_count': 10,
'log_count/DEBUG': 86,
'log_count/INFO': 13,
'log_count/WARNING': 3,
'response_received_count': 1,
'scheduler/dequeued': 2,
'scheduler/dequeued/memory': 2,
'scheduler/enqueued': 2,
'scheduler/enqueued/memory': 2,
'splash/execute/request_count': 1,
'splash/execute/response_count/200': 1,
'start_time': datetime.datetime(2021, 9, 8, 16, 58, 12, 785101)}
2021-09-08 22:58:36 [scrapy.core.engine] INFO: Spider closed (finished)
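A Scrapy callback can yield more than once, but each yield produces a separate item, so the fixed race data from the first yield is never attached to the rows from the second.
The data on this page actually comes from the JSON response of an API call. You can request that API directly, which is much easier than rendering the page, and pick out whichever fields you need.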
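If you would rather keep the XPath approach, one option is to collect the fixed fields once and merge them into every row before yielding, so that each item carries both parts. The sketch below is plain Python with stand-in values taken from the log in the question; `merge_race_info` is a hypothetical helper name, and in the spider the two inputs would come from the part 1 and part 2 XPath expressions.

```python
def merge_race_info(race_info, rows):
    """Yield one item per row, each carrying the race-level fields as well."""
    for row in rows:
        # Race-level keys come first; a row key wins if the names collide.
        yield {**race_info, **row}


# Stand-in values copied from the log output in the question.
race_info = {"Race Number": "1", "Title": "Flemington", "Track Condition": "Soft 7"}
rows = [
    {"Position": "1st", "Horse Full Name": "Exemplar (IRE)", "Jockey": "J.Allen"},
    {"Position": "2nd", "Horse Full Name": "Double You Tee", "Jockey": "W.J.Egan"},
]

items = list(merge_race_info(race_info, rows))
for item in items:
    print(item)
```

Because every item then has the same set of keys, an exported CSV gets one consistent header row instead of two differently shaped kinds of record.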
Here is a working example that requests the API directly:
Code:
import scrapy
import json


class RaceSpider(scrapy.Spider):
    name = 'race'

    # Headers copied from the browser's request to the API
    headers = {
        'accept': 'application/json, text/plain, */*',
        'accept-encoding': 'gzip, deflate, br',
        'accept-language': 'en-US,en;q=0.9,bn;q=0.8,es;q=0.7,ar;q=0.6',
        'origin': 'https://www.racing.com',
        'referer': 'https://www.racing.com/',
        'sec-ch-ua': '"Google Chrome";v="93", " Not;A Brand";v="99", "Chromium";v="93"',
        'sec-ch-ua-mobile': '?0',
        'sec-ch-ua-platform': '"Windows"',
        'sec-fetch-dest': 'empty',
        'sec-fetch-mode': 'cors',
        'sec-fetch-site': 'same-site',
        'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.63 Safari/537.36',
    }

    def start_requests(self):
        yield scrapy.Request(
            url='https://api.racing.com/v1/en-au/meet/details/5162295/',
            callback=self.parse,
            method="GET",
            headers=self.headers,
        )

    def parse(self, response):
        data = json.loads(response.body)
        # One item per horse, with the race-level fields repeated on each
        for resp in data['raceCollection']:
            for res in resp['raceResultsCollection']:
                items = {
                    'Race Number': resp['raceNumber'],
                    'Result Distance Thumbnail': resp['distance'],
                    'Title_name': resp['name'],
                    # Note: barrierNumber is the starting barrier, not the finishing position
                    'Position': res['barrierNumber'],
                    'Horse Full Name': res['horse']['fullName'],
                    'Jockey': res['jockey']['fullName'],
                }
                yield items
Output:
{'Race Number': 1, 'Result Distance Thumbnail': 2530, 'Title_name': 'TAB Handicap', 'Position': 6, 'Horse Full Name': 'Double You Tee', 'Jockey': 'W.J.Egan'}
2021-09-09 23:22:22 [scrapy.core.scraper] DEBUG: Scraped from <200 https://api.racing.com/v1/en-au/meet/details/5162295/>
{'Race Number': 1, 'Result Distance Thumbnail': 2530, 'Title_name': 'TAB Handicap', 'Position': 4, 'Horse Full Name': 'Bertwhistle', 'Jockey': 'L.J.Neindorf'}
2021-09-09 23:22:22 [scrapy.core.scraper] DEBUG: Scraped from <200 https://api.racing.com/v1/en-au/meet/details/5162295/>
{'Race Number': 1, 'Result Distance Thumbnail': 2530, 'Title_name': 'TAB Handicap', 'Position': 2, 'Horse Full Name': 'Flag Edition (NZ)', 'Jockey': 'M.Payne'}
2021-09-09 23:22:22 [scrapy.core.scraper] DEBUG: Scraped from <200 https://api.racing.com/v1/en-au/meet/details/5162295/>
{'Race Number': 9, 'Result Distance Thumbnail': 1410, 'Title_name': 'Rubaroc Handicap', 'Position': 0, 'Horse Full Name': 'Honorable Mention (NZ)', 'Jockey': 'B.Allen'}
2021-09-09 23:22:22 [scrapy.core.scraper] DEBUG: Scraped from <200 https://api.racing.com/v1/en-au/meet/details/5162295/>
{'Race Number': 9, 'Result Distance Thumbnail': 1410, 'Title_name': 'Rubaroc Handicap', 'Position': 0, 'Horse Full Name': 'Copper Fox', 'Jockey': 'G.J.Cartwright'}
2021-09-09 23:22:22 [scrapy.core.scraper] DEBUG: Scraped from <200 https://api.racing.com/v1/en-au/meet/details/5162295/>
{'Race Number': 9, 'Result Distance Thumbnail': 1410, 'Title_name': 'Rubaroc Handicap', 'Position': 0, 'Horse Full Name': 'Muswellbrook', 'Jockey': 'J.Mott'}
2021-09-09 23:22:22 [scrapy.core.engine] INFO: Closing spider (finished)
2021-09-09 23:22:22 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 605,
'downloader/request_count': 1,
'downloader/request_method_count/GET': 1,
'downloader/response_bytes': 19582,
'downloader/response_count': 1,
'downloader/response_status_count/200': 1,
'elapsed_time_seconds': 5.144949,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2021, 9, 9, 17, 22, 22, 334263),
'httpcompression/response_bytes': 205617,
'httpcompression/response_count': 1,
'item_scraped_count': 100,
... and so on
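The nested loops in parse can also be pulled into a small function that is easy to check without running a crawl. This is a sketch, not the site's documented schema: the key names are the ones used in the code above, the sample payload is hand-written (its last entry is invented to show a missing jockey), and `flatten_meet` is a hypothetical name.

```python
import json


def flatten_meet(payload):
    """Yield one flat dict per runner from a meet-details style payload."""
    for race in payload.get("raceCollection", []):
        for result in race.get("raceResultsCollection", []):
            # Treat a missing or null nested object as empty instead of raising.
            horse = result.get("horse") or {}
            jockey = result.get("jockey") or {}
            yield {
                "Race Number": race.get("raceNumber"),
                "Distance": race.get("distance"),
                "Race Name": race.get("name"),
                "Barrier Number": result.get("barrierNumber"),
                "Horse Full Name": horse.get("fullName"),
                "Jockey": jockey.get("fullName"),
            }


# Hand-written sample; the "Example Horse" entry is invented for illustration.
sample = json.loads("""
{
  "raceCollection": [
    {
      "raceNumber": 1,
      "distance": 2530,
      "name": "TAB Handicap",
      "raceResultsCollection": [
        {"barrierNumber": 6, "horse": {"fullName": "Double You Tee"}, "jockey": {"fullName": "W.J.Egan"}},
        {"barrierNumber": 4, "horse": {"fullName": "Bertwhistle"}, "jockey": {"fullName": "L.J.Neindorf"}},
        {"barrierNumber": 0, "horse": {"fullName": "Example Horse"}, "jockey": null}
      ]
    }
  ]
}
""")

rows = list(flatten_meet(sample))
for row in rows:
    print(row)
```

Inside the spider, parse could then be reduced to `yield from flatten_meet(json.loads(response.body))`. Using `.get` means an entry with a missing field produces `None` rather than stopping the crawl with a `KeyError`.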