Make prediction faster after training [Increasing predicted video fps]
I trained a model using mobilenetV3Large that performs segmentation, but at prediction time its processing speed is not great: roughly 3.95 FPS. I would like to reach at least 20 fps. Sample code is attached below. Thanks!
from imutils.video import VideoStream
from imutils.video import FPS
from tensorflow.keras.models import load_model
import tensorflow as tf
import numpy as np
import imutils
import time
import cv2
# loss and dice_coefficient are the custom functions used during training
model = load_model('model.h5', custom_objects={'loss': loss, "dice_coefficient": dice_coefficient}, compile=False)
cap = VideoStream(src=0).start()
# warm up the camera for a couple of seconds
time.sleep(2.0)
# Start the FPS timer
fps = FPS().start()
while True:
    frame = cap.read()
    # Resize each frame
    resized_image = cv2.resize(frame, (256, 256))
    resized_image = tf.image.convert_image_dtype((resized_image/255.0), dtype=tf.float32).numpy()
    mask = model.predict(np.expand_dims(resized_image[:,:,:3], axis=0))[0]
    # show the output frame
    cv2.imshow("Frame", mask)
    key = cv2.waitKey(1) & 0xFF
    # Press 'q' key to break the loop
    if key == ord("q"):
        break
    # update the FPS counter
    fps.update()
# stop the timer
fps.stop()
# Display FPS Information: Total Elapsed time and an approximate FPS over the entire video stream
print("[INFO] Elapsed Time: {:.2f}".format(fps.elapsed()))
print("[INFO] Approximate FPS: {:.2f}".format(fps.fps()))
# Destroy windows and cleanup
cv2.destroyAllWindows()
# Stop the video stream
cap.stop()
EDIT-1
After doing float16 quantization I loaded the model as tflite_model and then fed the input (image) into it. But the result is even slower!! Is this the right approach?
interpreter = tf.lite.Interpreter('tflite_model.tflite')
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
# ....................... process .............
while True:
    # ............. process ............
    interpreter.set_tensor(input_details[0]['index'], np.expand_dims(resized_image[:,:,:3], axis=0))
    interpreter.invoke()
    mask = interpreter.get_tensor(output_details[0]['index'])[0]
    # mask = model.predict(np.expand_dims(resized_image[:,:,:3], axis=0))[0]
    # ............ display part ........
You can make it faster in several different ways:
Model quantization:
TensorFlow Lite supports converting the weights to 16-bit floating point values. This is probably the simplest approach, since it only requires saving the model in tf.float16; alternatively, retraining with float16 or float8 may be faster.
https://www.tensorflow.org/lite/performance/post_training_float16_quant
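As a minimal sketch of the post-training float16 conversion described in that guide (assuming model is your already-trained Keras model; the output filename is only illustrative):

import tensorflow as tf

# Convert the trained Keras model to TFLite, storing weights as float16
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]
tflite_model = converter.convert()

# Write the quantized model to disk
with open('tflite_model.tflite', 'wb') as f:
    f.write(tflite_model)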
Model distillation:
You can train a small model that is trained against the large model's outputs via the loss function, so it learns as much as possible from the large model.
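A rough sketch of one such distillation training step, assuming a hypothetical smaller student model, the large trained teacher model, a binary-crossentropy loss and a 0.5 weighting (all of these are illustrative; adapt them to your own segmentation loss):

import tensorflow as tf

# Hypothetical setup: `teacher` is the large trained model,
# `student` is a smaller model trained to imitate it.
bce = tf.keras.losses.BinaryCrossentropy()

def distillation_loss(true_mask, teacher_mask, student_mask, alpha=0.5):
    # Mix the supervised loss (against ground truth) with a term that pulls
    # the student towards the teacher's soft predictions.
    return alpha * bce(true_mask, student_mask) + (1.0 - alpha) * bce(teacher_mask, student_mask)

@tf.function
def train_step(images, true_masks, teacher, student, optimizer):
    teacher_masks = teacher(images, training=False)
    with tf.GradientTape() as tape:
        student_masks = student(images, training=True)
        loss_value = distillation_loss(true_masks, teacher_masks, student_masks)
    grads = tape.gradient(loss_value, student.trainable_variables)
    optimizer.apply_gradients(zip(grads, student.trainable_variables))
    return loss_value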
Model pruning:
You can compress your model with pruning, which will make it faster. You can also read the TensorFlow documentation on pruning.
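A minimal pruning sketch with the tensorflow_model_optimization package (a separate pip install); the sparsity schedule, end_step and train_ds are placeholders you would tune for your own data:

import tensorflow_model_optimization as tfmot

# Wrap the trained model so low-magnitude weights are gradually zeroed out
# while you fine-tune it for a few epochs.
pruning_schedule = tfmot.sparsity.keras.PolynomialDecay(
    initial_sparsity=0.0, final_sparsity=0.5, begin_step=0, end_step=1000)
pruned_model = tfmot.sparsity.keras.prune_low_magnitude(
    model, pruning_schedule=pruning_schedule)

pruned_model.compile(optimizer='adam', loss=loss, metrics=[dice_coefficient])
# train_ds is a placeholder for your training dataset
# pruned_model.fit(train_ds, epochs=2,
#                  callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])

# Remove the pruning wrappers before saving/converting the final model
final_model = tfmot.sparsity.keras.strip_pruning(pruned_model)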