Bounding box challenges while applying the YOLO project to videos (from Coursera)

I was trying to process a video file instead of an image at the end of the car-detection program from the Coursera CNN course. Unfortunately, the bounding boxes are out of sync with the actual car positions and are offset by several points on both the X and Y axes... It seems to me that the frame width and height get mixed up somewhere when I freeze a 'currentFrame' and feed it to preprocessing (if at all). Any ideas about what could be going wrong here? I don't want to paste the whole project code, so I'm only pasting the part where I replaced the predict function with code that iterates over the video frames.

import cv2
import numpy as np  # needed for np.uint8() below
from tqdm import tqdm

video_out = "nb_images/out1.mp4"
video_reader = cv2.VideoCapture("nb_images/road_video_trim2.mp4")
nb_frames = int(video_reader.get(cv2.CAP_PROP_FRAME_COUNT))
frame_h = int(video_reader.get(cv2.CAP_PROP_FRAME_HEIGHT))
frame_w = int(video_reader.get(cv2.CAP_PROP_FRAME_WIDTH))        
video_writer = cv2.VideoWriter(video_out,
                       cv2.VideoWriter_fourcc(*'MPEG'), 
                       50.0,  # hardcoded fps; video_reader.get(cv2.CAP_PROP_FPS) would match the source
                       (frame_w, frame_h))

batch_size  = 1
images      = []
start_point = 0 #%
show_window = False
for i in tqdm(range(nb_frames)):
    _, image = video_reader.read()
    #blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (frame_w, frame_h), swapRB=True, crop=False)    
    cv2.imwrite("currentFrame.jpg", image)
    image, image_data = preprocess_image("currentFrame.jpg", model_image_size = (608, 608))
    out_scores, out_boxes, out_classes = sess.run([scores, boxes, classes], feed_dict={yolo_model.input:image_data, K.learning_phase():0})

    #out_scores, out_boxes, out_classes, output_image = predict2(sess,"currentFrame.jpg")
    colors = generate_colors(class_names)
    #draw_boxes(img, out_scores, out_boxes, out_classes, class_names, colors)

    draw_boxes(image, out_scores, out_boxes, out_classes, class_names, colors)
    #video_writer.write(images[i]) 
    imshow(image)  # imshow imported from matplotlib.pyplot in the notebook
    video_writer.write(np.uint8(image))

images = []
if show_window: cv2.destroyAllWindows()
video_reader.release()
video_writer.release()       

So, I figured out what was going wrong here... the snippet that does the initialization had a different image shape. First I changed that.

class_names = read_classes("model_data/coco_classes.txt")
anchors = read_anchors("model_data/yolo_anchors.txt")
#image_shape = (720., 1280.)
image_shape = (608., 608.)

Then the YOLO call...

yolo_outputs = yolo_head(yolo_model.output, anchors, len(class_names))
scores, boxes, classes = yolo_eval(yolo_outputs, image_shape)
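For context on why image_shape matters here: in the assignment, yolo_eval rescales the network's normalized box coordinates by image_shape (via a scale_boxes helper), so if image_shape does not match the frames actually fed to the network, every box lands at the wrong pixel position. A minimal NumPy sketch of that scaling (my own simplified version, assuming the (y1, x1, y2, x2) box layout from the assignment):

```python
import numpy as np

def scale_boxes(boxes, image_shape):
    # Mirrors the assignment's scale_boxes helper: multiply normalized
    # (y1, x1, y2, x2) coordinates by the target height/width.
    h, w = image_shape
    return boxes * np.array([h, w, h, w])

# A detection covering the central half of the input, in normalized coords
box = np.array([[0.25, 0.25, 0.75, 0.75]])

# Scaled for the shape the frames are actually resized to:
print(scale_boxes(box, (608., 608.)))   # [[152. 152. 456. 456.]]

# Scaled with a stale (720., 1280.) image_shape while the network sees
# 608x608 frames -- the same box lands at different pixel coordinates:
print(scale_boxes(box, (720., 1280.)))  # [[180. 320. 540. 960.]]
```

With the stale shape, the same normalized box is drawn 28 pixels lower and 168 pixels to the right at its top-left corner, which matches the "offset by several points" symptom.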

Now, in my code, I made these small changes...

for i in tqdm(range(nb_frames)):
    _, image = video_reader.read()
    #blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (frame_w, frame_h), swapRB=True, crop=False)    
    image = cv2.resize(image, (608, 608))
    cv2.imwrite("currentFrame.jpg", image)
    image, image_data = preprocess_image("currentFrame.jpg", model_image_size = (608, 608))
    out_scores, out_boxes, out_classes = sess.run([scores, boxes, classes], feed_dict={yolo_model.input:image_data, K.learning_phase():0})
    colors = generate_colors(class_names)    
    draw_boxes(image, out_scores, out_boxes, out_classes, class_names, colors)
    image = cv2.resize(np.array(image), (frame_w,frame_h))
    video_writer.write(np.uint8(image))
    imshow(image)

I think initializing the shape to (608, 608) together with the resize above is why it works. The final frame came out like this: (screenshot: finalFrame)

Just closing the loop here, logically.

The answer is in the second half of my edited question above. Picking up from where I started: I figured out what was wrong. To be precise, I had to set

class_names = read_classes("model_data/coco_classes.txt")
anchors = read_anchors("model_data/yolo_anchors.txt")
#image_shape = (720., 1280.)
image_shape = (608., 608.)

Then I had to "resize" the frame before sending it for processing, and again before writing it back as part of the updated video. Other snippets I've seen don't actually have this, so I don't really know why I needed this "patch" fix. But this works :)

for i in tqdm(range(nb_frames)):
    _, image = video_reader.read()
    #blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (frame_w, frame_h), swapRB=True, crop=False)    
    image = cv2.resize(image, (608, 608))
    cv2.imwrite("currentFrame.jpg", image)
    image, image_data = preprocess_image("currentFrame.jpg", model_image_size = (608, 608))
    out_scores, out_boxes, out_classes = sess.run([scores, boxes, classes], feed_dict={yolo_model.input:image_data, K.learning_phase():0})
    colors = generate_colors(class_names)    
    draw_boxes(image, out_scores, out_boxes, out_classes, class_names, colors)
    image = cv2.resize(np.array(image), (frame_w,frame_h))
    video_writer.write(np.uint8(image))
    imshow(image)
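As a closing aside on the double resize: rather than drawing on the 608x608 image and blowing it back up (which softens the output), the predicted boxes could be mapped back to the original frame's pixel grid and drawn on the full-resolution frame. A rough sketch, where boxes_to_frame is a hypothetical helper and not part of the assignment code:

```python
import numpy as np

def boxes_to_frame(boxes, frame_h, frame_w, model_size=608):
    # Hypothetical helper: map (y1, x1, y2, x2) boxes predicted on the
    # 608x608 model input back to the original frame's pixel grid, so
    # they can be drawn on the full-resolution frame instead of
    # upscaling an annotated 608x608 image.
    scale = np.array([frame_h, frame_w, frame_h, frame_w]) / model_size
    return boxes * scale

# A detection spanning (152, 152)-(456, 456) on the 608x608 input...
boxes_608 = np.array([[152., 152., 456., 456.]])
# ...lands at the matching position on a 720x1280 frame:
print(boxes_to_frame(boxes_608, 720, 1280))  # [[180. 320. 540. 960.]]
```

The rescaled boxes could then be passed to draw_boxes (or cv2.rectangle) on the raw frame, and the final cv2.resize back to (frame_w, frame_h) would no longer be needed.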