List index out of range error when using mouse event functions with TensorFlow for object detection

I am using TensorFlow for object detection. I trained the neural network successfully, and it can detect the objects I want in a live video stream. It does this by drawing a bounding box around the object.

Now I want to mark a region in the video frame (at the start), so that if an object enters the marked region and is detected (i.e., if a bounding box is drawn inside the marked region), a message is printed to the terminal.
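
The check I have in mind is roughly the sketch below (box_in_region is just an illustrative name; it assumes the detector's normalized [ymin, xmin, ymax, xmax] boxes and a region given by two clicked pixel coordinates):

def box_in_region(box, region, frame_w, frame_h):
  ymin, xmin, ymax, xmax = box  # normalized coordinates from the detector
  left, right = xmin * frame_w, xmax * frame_w
  top, bottom = ymin * frame_h, ymax * frame_h
  (rx1, ry1), (rx2, ry2) = region  # the two clicked corners, in pixels
  r_left, r_right = min(rx1, rx2), max(rx1, rx2)
  r_top, r_bottom = min(ry1, ry2), max(ry1, ry2)
  # overlap on both axes means the box has entered the marked region
  return left < r_right and right > r_left and top < r_bottom and bottom > r_top

# if box_in_region(box, refPt, width, height):
#   print("object detected inside the marked region")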

To do this I am using OpenCV. I found a good tutorial on how to do it with mouse callback functions; the link is below.

https://www.pyimagesearch.com/2015/03/09/capturing-mouse-click-events-with-python-and-opencv/

But when I execute my code I get an error, shown below.

 ---------------------------------------------------------------------------
 IndexError                                Traceback (most recent call last)
<ipython-input-1-10159c26292b> in click_and_crop(event, x, y, flags, params)
    201           refPt.append((x,y))
    202           cropping = False
--> 203           cv2.rectangle(image_np,refPt[0],refPt[1],(0,255,0),2)
    204       ret, image_np = cap.read()
    205       # Expand dimensions since the model expects images to have shape: [1, None, None, 3]

IndexError: list index out of range

My main program is as follows:

import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile

from collections import defaultdict
from io import StringIO
from PIL import Image

import cv2
cap = cv2.VideoCapture(0)


# This is needed since the notebook is stored in the object_detection folder.
sys.path.append("..")

from object_detection.utils import ops as utils_ops

if tf.__version__ < '1.4.0' and tf.__version__ != '1.10.0':
  raise ImportError('Please upgrade your tensorflow installation to v1.4.* or later!')


# ## Env setup

# In[3]:


# This is needed to display the images.
#get_ipython().run_line_magic('matplotlib', 'inline')


# ## Object detection imports
# Here are the imports from the object detection module.

# In[5]:


from utils import label_map_util

from utils import visualization_utils as vis_util


# # Model preparation 

# ## Variables
# 
# Any model exported using the `export_inference_graph.py` tool can be loaded
# here simply by changing `PATH_TO_FROZEN_GRAPH` to point to a new .pb file.
# 


# In[6]:


# What model to download.
MODEL_NAME = 'car_inference_graph'

# Path to frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_CKPT = MODEL_NAME + '/frozen_inference_graph.pb'

# List of the strings that are used to add the correct label for each box.
PATH_TO_LABELS = os.path.join('training', 'object-detection.pbtxt')

NUM_CLASSES = 1


# ## Download Model

# ## Load a (frozen) Tensorflow model into memory.

# In[7]:


detection_graph = tf.Graph()
with detection_graph.as_default():
  od_graph_def = tf.GraphDef()
  with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
    serialized_graph = fid.read()
    od_graph_def.ParseFromString(serialized_graph)
    tf.import_graph_def(od_graph_def, name='')


# ## Loading label map
# Label maps map indices to category names, so that when our convolution
# network predicts `5`, we know that this corresponds to `airplane`. Here we
# use internal utility functions, but anything that returns a dictionary
# mapping integers to appropriate string labels would be fine.

# In[8]:


label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(
    label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)


# ## Helper code

# In[9]:


def load_image_into_numpy_array(image):
  (im_width, im_height) = image.size
  return np.array(image.getdata()).reshape(
      (im_height, im_width, 3)).astype(np.uint8)


# # Detection

# In[10]:


# For the sake of simplicity we will use only 2 images:
# image1.jpg
# image2.jpg
# If you want to test the code with your images, just add path to the images to the TEST_IMAGE_PATHS.
PATH_TO_TEST_IMAGES_DIR = 'test_images'
TEST_IMAGE_PATHS = [os.path.join(PATH_TO_TEST_IMAGES_DIR,
                                 'image{}.jpg'.format(i)) for i in range(1, 44)]

# Size, in inches, of the output images.
IMAGE_SIZE = (12, 8)


# In[11]:


def run_inference_for_single_image(image, graph):
  with graph.as_default():
    with tf.Session() as sess:
      # Get handles to input and output tensors
      ops = tf.get_default_graph().get_operations()
      all_tensor_names = {output.name for op in ops for output in op.outputs}
      tensor_dict = {}
      for key in [
          'num_detections', 'detection_boxes', 'detection_scores',
          'detection_classes', 'detection_masks'
      ]:
        tensor_name = key + ':0'
        if tensor_name in all_tensor_names:
          tensor_dict[key] = tf.get_default_graph().get_tensor_by_name(
              tensor_name)
      if 'detection_masks' in tensor_dict:
        # The following processing is only for single image
        detection_boxes = tf.squeeze(tensor_dict['detection_boxes'], [0])
        detection_masks = tf.squeeze(tensor_dict['detection_masks'], [0])
        # Reframe is required to translate mask from box coordinates to image
        # coordinates and fit the image size.
        real_num_detection = tf.cast(tensor_dict['num_detections'][0], tf.int32)
        detection_boxes = tf.slice(detection_boxes, [0, 0],
                                   [real_num_detection, -1])
        detection_masks = tf.slice(detection_masks, [0, 0, 0],
                                   [real_num_detection, -1, -1])
        detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
            detection_masks, detection_boxes, image.shape[0], image.shape[1])
        detection_masks_reframed = tf.cast(
            tf.greater(detection_masks_reframed, 0.5), tf.uint8)
        # Follow the convention by adding back the batch dimension
        tensor_dict['detection_masks'] = tf.expand_dims(
            detection_masks_reframed, 0)
      image_tensor = tf.get_default_graph().get_tensor_by_name('image_tensor:0')

      # Run inference
      output_dict = sess.run(tensor_dict,
                             feed_dict={image_tensor: np.expand_dims(image, 0)})

      # all outputs are float32 numpy arrays, so convert types as appropriate
      output_dict['num_detections'] = int(output_dict['num_detections'][0])
      output_dict['detection_classes'] = output_dict[
          'detection_classes'][0].astype(np.uint8)
      output_dict['detection_boxes'] = output_dict['detection_boxes'][0]
      output_dict['detection_scores'] = output_dict['detection_scores'][0]
      if 'detection_masks' in output_dict:
        output_dict['detection_masks'] = output_dict['detection_masks'][0]
  return output_dict


# In[12]:



with detection_graph.as_default():
  with tf.Session(graph=detection_graph) as sess:
    while True:
      refPt = [] # ROI code starts from here
      cropping = False

      def click_and_crop(event,x,y,flags,params):
        global refPt,cropping

        if event == cv2.EVENT_LBUTTONDOWN:
          refPt = [(x,y)]
          cropping = True

        elif event == cv2.EVENT_LBUTTONUP:
          refPt.append((x,y))
          cropping = False
          cv2.rectangle(image_np,refPt[0],refPt[1],(0,255,0),2) # ROI code end

      ret, image_np = cap.read()
      # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
      image_np_expanded = np.expand_dims(image_np, axis=0)
      image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
      # Each box represents a part of the image where a particular object was detected.
      boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
      # Each score represents the level of confidence for each of the objects.
      # Score is shown on the result image, together with the class label.
      scores = detection_graph.get_tensor_by_name('detection_scores:0')
      classes = detection_graph.get_tensor_by_name('detection_classes:0')
      num_detections = detection_graph.get_tensor_by_name('num_detections:0')
      # Actual detection.
      (boxes, scores, classes, num_detections) = sess.run(
          [boxes, scores, classes, num_detections],
          feed_dict={image_tensor: image_np_expanded})
      # Visualization of the results of a detection.
      vis_util.visualize_boxes_and_labels_on_image_array(
          image_np,
          np.squeeze(boxes),
          np.squeeze(classes).astype(np.int32),
          np.squeeze(scores),
          category_index,
          use_normalized_coordinates=True,
          line_thickness=8)

      cv2.imshow("object detection", image_np)
      cv2.setMouseCallback("object detection", click_and_crop)

      if cv2.waitKey(25) & 0xFF == ord('q'):
        cv2.destroyAllWindows()
        cap.release()
        break

Using this code, I know the error has something to do with refPt[0] and refPt[1], but I can't figure out where I need to make the changes!

Technical details:

  1. TensorFlow 1.10
  2. OS - Ubuntu 18.04
  3. Python 3.6
  4. OpenCV 3.4.2

Please help.

Thanks :)

Your code has several problems, but basically they can be summed up as issues with the scope of the variables you use.

  1. Avoid creating a function inside a loop... this will redefine the function on every iteration... It is not the problem here, but it is better not to do it.

  2. You have refPt = [] inside the while, which empties the list on every iteration... As in point 1, it should be outside the loop. In any case, the refPt = [(x,y)] inside the function already discards the old values and "cleans" the variable.

  3. Inside the function you have cv2.rectangle(image_np,refPt[0],refPt[1],(0,255,0),2), which changes image_np, but the image is changed locally, not globally.

  4. In the loop you have ret, image_np = cap.read(), which will almost immediately erase any rectangle before it is ever displayed.... You need to draw the rectangle on the new image. Something like:


 ret, image_np = cap.read()
 # if no image was obtained, quit the loop
 if not ret:
   break
 tmpPt = refPt.copy()  # to avoid it being changed in the callback
 if len(tmpPt) == 2:
   cv2.rectangle(image_np, tmpPt[0], tmpPt[1], (0, 255, 0), 2)

  5. It is suggested to call cv2.setMouseCallback("object detection", click_and_crop) outside the loop... you can create the window without an image using cv2.namedWindow("object detection").

Those are the problems I see; once they are corrected you may find more... One more thing: you only draw a rectangle, but I don't see you actually using it to select a ROI (cropping the image to the rectangle's size), so I'm not sure whether that is intentional...

I hope this helps. If you have any doubts, just ask in the comments.


Update

To make myself a little clearer, adding the selection first and the detection part afterwards, the code should look like this:

refPt = [] 
cropping = False

def click_and_crop(event,x,y,flags,params):
  global refPt,cropping

  if event == cv2.EVENT_LBUTTONDOWN:
    refPt = [(x,y)]
    cropping = True

  elif event == cv2.EVENT_LBUTTONUP:
    refPt.append((x,y))
    cropping = False

cv2.namedWindow("object detection")
cv2.setMouseCallback("object detection", click_and_crop)

detect = False
with detection_graph.as_default():
  with tf.Session(graph=detection_graph) as sess:
    while True:
      ret, image_np = cap.read()
      # if no image was obtained, quit the loop
      if not ret:
        break
      tmpPt = refPt.copy()  # to avoid it being changed in the callback
      if len(tmpPt) == 2:
        cv2.rectangle(image_np, tmpPt[0], tmpPt[1], (0, 255, 0), 2)

      if detect:
        # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
        image_np_expanded = np.expand_dims(image_np, axis=0)
        image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
        # Each box represents a part of the image where a particular object was detected.
        boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
        # Each score represents the level of confidence for each of the objects.
        # Score is shown on the result image, together with the class label.
        scores = detection_graph.get_tensor_by_name('detection_scores:0')
        classes = detection_graph.get_tensor_by_name('detection_classes:0')
        num_detections = detection_graph.get_tensor_by_name('num_detections:0')
        # Actual detection.
        (boxes, scores, classes, num_detections) = sess.run(
            [boxes, scores, classes, num_detections],
            feed_dict={image_tensor: image_np_expanded})
        # Visualization of the results of a detection.
        vis_util.visualize_boxes_and_labels_on_image_array(
            image_np,
            np.squeeze(boxes),
            np.squeeze(classes).astype(np.int32),
            np.squeeze(scores),
            category_index,
            use_normalized_coordinates=True,
            line_thickness=8)

      cv2.imshow("object detection", image_np)

      key = cv2.waitKey(25) & 0xFF
      if key == ord('q'):
        cv2.destroyAllWindows()
        cap.release()
        break
      elif key == ord('s'):
        detect = True  # start detecting

Once again, this only draws the rectangle... it does not crop.
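
If you eventually want the crop as well, a minimal sketch under the same assumptions (two clicked corners in refPt, the current frame in image_np) is plain numpy slicing with the corners sorted first:

# Minimal sketch: crop the selected ROI once two corners have been clicked.
# Assumes refPt holds two (x, y) pixel tuples and image_np is the BGR frame.
if len(refPt) == 2:
  (x1, y1), (x2, y2) = refPt
  x_lo, x_hi = sorted((x1, x2))  # sort so the slice works for any drag direction
  y_lo, y_hi = sorted((y1, y2))
  roi = image_np[y_lo:y_hi, x_lo:x_hi]
  if roi.size:  # guard against an empty selection
    cv2.imshow("ROI", roi)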