使用对象检测张量流对录制视频进行预测 API

Question

我正在尝试读取视频文件（使用 opencv），使用 tensorflow 的对象检测 API 遍历所有帧以进行预测和边界框，并将预测帧（带框）写入一个新的视频文件。我使用 object_detection_tutorial.ipynb 进行了一些修改来捕获视频帧，并在从冻结图（经过训练）加载的 faster-rcnn-inception-resnet-v2 中对其进行处理。

我在具有 windows 10 和 56GB 内存的云计算机中使用 tesla P100 GPU。同样使用tensorflow-gpu。

当我运行代码时，每帧需要0.5秒。这是特斯拉 P100 的正常速度还是我在代码中做错了什么使其变慢？

此代码只是一个测试，稍后我将不得不在实时视频预测任务中使用它。如果每帧 0.5 秒是使用 tensorflow API 的预期速度，我想我将无法在我的任务中使用它:(

所以，在运行ning 之后，我得到以下运行ning 次

处理帧数1.0

捕获视频帧的时间 0.0

预测时间 0.49225664138793945

在一帧中生成框的时间 0.14833950996398926

在视频文件中写入一帧的时间 0.04687023162841797

循环总时间 0.6874663829803467

如你们所见，使用 CPU (opencv) 的代码速度很快。但是当我使用 GPU 时，仅在预测任务中就需要 almost 0.5 秒（用于 sess.run）。

有什么建议吗？先感谢您。波纹管遵循我的代码

从 distutils.version 导入 StrictVersion 将 numpy 导入为 np 进口 os 导入 six.moves.urllib 作为 urllib 导入系统导入 tarfile 将 tensorflow 导入为 tf 导入压缩文件导入时间

from collections import defaultdict
from io import StringIO
#from matplotlib import pyplot as plt
from PIL import Image

import cv2
from imutils import paths

import re

#This is needed since the code is stored in the object_detection    folder.
sys.path.append("..")
from object_detection.utils import ops as utils_ops

if StrictVersion(tf.__version__) < StrictVersion('1.9.0'):
  raise ImportError('Please upgrade your TensorFlow installation to v1.9.* or later!')


from utils import label_map_util

from utils import visualization_utils as vis_util

#Detection using tensorflow inside write_video function

def write_video():

    filename = 'output/teste_v2.avi'
    codec = cv2.VideoWriter_fourcc('W', 'M', 'V', '2')
    cap = cv2.VideoCapture('pneu_trim2.mp4')
    framerate = round(cap.get(5),2)
    w = int(cap.get(3))
    h = int(cap.get(4))
    resolution = (w, h)

    VideoFileOutput = cv2.VideoWriter(filename, codec, framerate, resolution)    

    ################################
    # # Model preparation 

    # ## Variables
    # 
    # Any model exported using the `export_inference_graph.py` tool can be loaded here simply by changing `PATH_TO_FROZEN_GRAPH` to point to a new .pb file.  
    # 


    # What model to download.
    MODEL_NAME = 'training/pneu_incep_step_24887'
    print("loading model from " + MODEL_NAME)

    # Path to frozen detection graph. This is the actual model that is used for the object detection.
    PATH_TO_FROZEN_GRAPH = MODEL_NAME + '/frozen_inference_graph.pb'

    # List of the strings that is used to add correct label for each box.
    PATH_TO_LABELS = os.path.join('data', 'object-detection.pbtxt')

    NUM_CLASSES = 5


    # ## Load a (frozen) Tensorflow model into memory.

    time_graph = time.time()
    print('loading graphs')
    detection_graph = tf.Graph()
    with detection_graph.as_default():
      od_graph_def = tf.GraphDef()
      with tf.gfile.GFile(PATH_TO_FROZEN_GRAPH, 'rb') as fid:
        serialized_graph = fid.read()
        od_graph_def.ParseFromString(serialized_graph)
        tf.import_graph_def(od_graph_def, name='')
    print("tempo build graph = " + str(time.time() - time_graph))

    # ## Loading label map

    label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
    categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
    category_index = label_map_util.create_category_index(categories)

    ################################

    with tf.Session(graph=detection_graph) as sess:
        with detection_graph.as_default():
            while (cap.isOpened()):
              time_loop = time.time()
              print('processing frame number: ' + str(cap.get(1)))
              time_captureframe = time.time()
              ret, image_np = cap.read()
              print("time to capture video frame = " + str(time.time() - time_captureframe))
              if (ret != True):
                  break
              # the array based representation of the image will be used later in order to prepare the
              # result image with boxes and labels on it.
              #image_np = load_image_into_numpy_array(image)
              # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
              image_np_expanded = np.expand_dims(image_np, axis=0)
              image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
              # Each box represents a part of the image where a particular object was detected.
              boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
              # Each score represent how level of confidence for each of the objects.
              # Score is shown on the result image, together with the class label.
              scores = detection_graph.get_tensor_by_name('detection_scores:0')
              classes = detection_graph.get_tensor_by_name('detection_classes:0')
              num_detections = detection_graph.get_tensor_by_name('num_detections:0')
              # Actual detection.
              time_prediction = time.time()
              (boxes, scores, classes, num_detections) = sess.run(
                  [boxes, scores, classes, num_detections],
                  feed_dict={image_tensor: image_np_expanded})
              print("time to predict = " + str(time.time() - time_prediction))
              # Visualization of the results of a detection.
              time_visualizeboxes = time.time()
              vis_util.visualize_boxes_and_labels_on_image_array(
                  image_np,
                  np.squeeze(boxes),
                  np.squeeze(classes).astype(np.int32),
                  np.squeeze(scores),
                  category_index,
                  use_normalized_coordinates=True,
                  line_thickness=8)
              print("time to generate boxes in a frame = " + str(time.time() - time_visualizeboxes))


              time_writeframe = time.time()
              VideoFileOutput.write(image_np)
              print("time to write a frame in video file = " + str(time.time() - time_writeframe))

              print("total time in the loop = " + str(time.time() - time_loop))

    cap.release()
    VideoFileOutput.release()
    print('done')

Answer 1

实际上问题出在您使用的模型上。 https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md 基本上模型 Faster-rcnn-inception-resnet-v2 将花费更多时间。您可以参考link了解模型

的速度

使用对象检测张量流对录制视频进行预测 API

Predictions in recorded video using object detection tensorflow API

object-detection

tensorflow

object-detection-api