How to find when a request started and when it ended in Scrapy
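I am trying to measure the throughput of a system with Scrapy, and I want to find out when each HTTP request is fired and when it completes.
Any guidance toward a solution would be much appreciated.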
You can use a custom downloader middleware:
import logging
from datetime import datetime


class MeasureMiddleware:
    # tracked (url, start_time) pairs
    requests = []

    def process_request(self, request, spider):
        # store the time and url of every outgoing request
        self.requests.append((request.url, datetime.now()))

    def process_response(self, request, response, spider):
        # for every response, check whether one of the tracked requests came back;
        # if so, log its start time and the current time
        filtered_requests = []
        # go through tracked requests and check whether any of them match the current url
        for tracked in self.requests:
            url, start_date = tracked
            if url == request.url:
                logging.info(f'request {url} {start_date} - {datetime.now()}')
            else:
                filtered_requests.append(tracked)
        self.requests = filtered_requests
        # process_response must hand the response on, otherwise Scrapy raises an error
        return response
Then activate the downloader middleware in your settings:
DOWNLOADER_MIDDLEWARES = {
    'myproject.middlewares.MeasureMiddleware': 543,
}
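Note that because Scrapy is asynchronous, these timings will not be accurate to the millisecond, but they should be accurate enough to give a general overview.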
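Matching by URL, as the middleware above does, can mix up two in-flight requests for the same URL. A possible variant, sketched here under assumptions, stamps the start time on the request's own `meta` dict instead and adds a rough throughput figure. The class name `TimingMiddleware`, the meta key `measure_start`, and the `throughput` helper are all made up for illustration; none of them is part of Scrapy.

```python
import time


class TimingMiddleware:
    """Hypothetical downloader middleware; the name is not part of Scrapy."""

    def __init__(self):
        # (url, start, end) tuples; values come from time.perf_counter(),
        # so they are only meaningful as differences, not as wall-clock times
        self.timings = []

    def process_request(self, request, spider):
        # stamp the request itself, so duplicate URLs cannot be confused
        # ('measure_start' is an assumed key name)
        request.meta['measure_start'] = time.perf_counter()
        return None  # let Scrapy continue handling the request normally

    def process_response(self, request, response, spider):
        start = request.meta.get('measure_start')
        if start is not None:
            self.timings.append((request.url, start, time.perf_counter()))
        return response  # always pass the response on

    def throughput(self):
        """Completed responses per second over the observed window."""
        if not self.timings:
            return 0.0
        first_start = min(start for _, start, _ in self.timings)
        last_end = max(end for _, _, end in self.timings)
        window = last_end - first_start
        return len(self.timings) / window if window > 0 else 0.0
```

It would be registered in `DOWNLOADER_MIDDLEWARES` the same way as `MeasureMiddleware`. The same caveat about precision applies, and the interval measured here probably includes time the request spends waiting inside the downloader, not just time on the network. If only the network fetch time is of interest, Scrapy's documented `download_latency` key in `response.meta` may be a better fit.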