Combining Object Detection with Text to Speech Code

Question

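I am trying to write object detection + text-to-speech code that detects objects and generates speech output on a Raspberry Pi 4. For now, though, I am trying to write a simple Python script that combines both pieces in a single .py file, preferably as one function. I will then run this script on the Raspberry Pi. I want to give credit to Murtaza's Workshop "Object Detection OpenCV Python | Easy and Fast (2020)" and to the pyttsx3 text-to-speech documentation at https://pypi.org/project/pyttsx3/. I have attached my code below. I have tried running the program, but I keep getting errors from the text-to-speech code (lines 33-36 are commented out for reference). I believe it is some kind of loop error, but I can't seem to get the program to run continuously. For instance, if I run the code without the TTS part, it works fine. Otherwise, it runs for maybe 3-5 seconds and then suddenly stops. I am a beginner but very passionate about computer vision, and any help is much appreciated!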

import cv2
#import pyttsx3

cap = cv2.VideoCapture(0)
cap.set(3, 640)
cap.set(4, 480)

classNames = []
classFile = 'coco.names'
with open(classFile,'rt') as f:
    classNames = [line.rstrip() for line in f]

configPath = 'ssd_mobilenet_v3_large_coco_2020_01_14.pbtxt'
weightsPath = 'frozen_inference_graph.pb'

net = cv2.dnn_DetectionModel(weightsPath, configPath)
net.setInputSize(320, 320)
net.setInputScale(1.0 / 127.5)
net.setInputMean((127.5, 127.5, 127.5))
net.setInputSwapRB(True)

while True:
    success, img = cap.read()
    classIds, confs, bbox = net.detect(img, confThreshold=0.45)
    if len(classIds) != 0:
        for classId, confidence, box in zip(classIds.flatten(), confs.flatten(), bbox):
            className = classNames[classId-1]
            #engine = pyttsx3.init()
            #str1 = str(className)
            #engine.say(str1 + "detected")
            #engine.runAndWait()
            cv2.rectangle(img, box, color=(0, 255, 0), thickness=2)
            cv2.putText(img, classNames[classId-1].upper(), (box[0]+10, box[1]+30),
                cv2.FONT_HERSHEY_COMPLEX, 1, (0, 255, 0), 2)
            cv2.putText(img, str(round(confidence * 100, 2)), (box[0]+200, box[1]+30),
                cv2.FONT_HERSHEY_COMPLEX, 1, (0, 255, 0), 2)
    cv2.imshow('Output', img)
    cv2.waitKey(1)

Here is a screenshot of my code.

Here is a link to the download files needed to run code as well in case

Here is the error:

/Users/venuchannarayappa/PycharmProjects/ObjectDetector/venv/bin/python /Users/venuchannarayappa/PycharmProjects/ObjectDetector/main.py
Traceback (most recent call last):
  File "/Users/venuchannarayappa/PycharmProjects/ObjectDetector/main.py", line 24, in <module>
    classIds, confs, bbox = net.detect(img, confThreshold=0.45)
cv2.error: OpenCV(4.5.4) /Users/runner/work/opencv-python/opencv-python/opencv/modules/imgproc/src/resize.cpp:4051: error: (-215:Assertion failed) !ssize.empty() in function 'resize'

Process finished with exit code 1
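
The `!ssize.empty()` assertion in `resize` usually indicates that the image handed to `net.detect` was empty, which is what happens when `cap.read()` fails and returns `False` together with `None`. A minimal sketch of guarding against that, assuming `cap` and `net` are the objects created in the script above (the helper name `detect_on_next_frame` is hypothetical and not part of OpenCV):

```python
def detect_on_next_frame(cap, net, conf_threshold=0.45):
    """Read one frame and run detection only if the read succeeded.

    `cap` is expected to behave like cv2.VideoCapture and `net` like
    cv2.dnn_DetectionModel; both are passed in rather than created here.
    Returns (img, classIds, confs, bbox), or None when no frame was read.
    """
    success, img = cap.read()
    # cap.read() returns (False, None) when the camera yields no frame;
    # passing that empty image to net.detect() triggers the resize assertion.
    if not success or img is None:
        return None
    classIds, confs, bbox = net.detect(img, confThreshold=conf_threshold)
    return img, classIds, confs, bbox
```

Inside the `while True:` loop, a `None` result could be handled with `continue` to retry on the next iteration, or with `break` to stop cleanly instead of crashing.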

Link to a video of the output, recorded on an iPhone: https://www.icloud.com/iclouddrive/03jGfqy7-A9DKfekcu3wjk0rA#IMG_4932

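Sorry for the long post! I have been debugging my code for the past few hours and I think I have it working. I only changed the main while loop; the rest of the code is the same. The program seems to run continuously for me. If anyone runs into difficulties running it, I would appreciate hearing about it.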

engine = pyttsx3.init()
while True:
    success, img = cap.read()
    #print(success)
    #print(img)
    #print(img.shape)
    classIds, confs, bbox = net.detect(img, confThreshold=0.45)
    if len(classIds) != 0:
        for classId, confidence, box in zip(classIds.flatten(), confs.flatten(), bbox):
            className = classNames[classId - 1]
            #print(len(classIds))
            str1 = str(className)
            #print(str1)
            engine.say(str1 + " detected")
            engine.runAndWait()
            cv2.rectangle(img, box, color=(0, 255, 0), thickness=2)
            cv2.putText(img, classNames[classId-1].upper(), (box[0]+10, box[1]+30),
                cv2.FONT_HERSHEY_COMPLEX, 1, (0, 255, 0), 2)
            cv2.putText(img, str(round(confidence * 100, 2)), (box[0]+200, box[1]+30),
                cv2.FONT_HERSHEY_COMPLEX, 1, (0, 255, 0), 2)
        continue
    cv2.imshow('Output', img)
    cv2.waitKey(1)
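
One thing to watch in the loop above: `engine.runAndWait()` blocks until the queued phrase has been spoken, and it is called once per detection on every frame, so the same object is announced again and again while the video stalls. A possible way to limit that is sketched below, under the assumption that announcing each class at most once every few seconds is acceptable. The `AnnouncementThrottle` class, the `phrase_for` helper, and the 5-second default are illustrative only and are not part of pyttsx3 or OpenCV.

```python
import time


class AnnouncementThrottle:
    """Decide whether a class name is due to be spoken again."""

    def __init__(self, cooldown_seconds=5.0, clock=time.monotonic):
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock  # injectable so the logic can be tested without waiting
        self._last_spoken = {}  # class name -> time it was last announced

    def should_announce(self, class_name):
        now = self.clock()
        last = self._last_spoken.get(class_name)
        if last is not None and now - last < self.cooldown_seconds:
            return False  # announced too recently; stay quiet
        self._last_spoken[class_name] = now
        return True


def phrase_for(class_name):
    # Keep a space before "detected" so the two words do not run together.
    return f"{class_name} detected"
```

With `throttle = AnnouncementThrottle()` created once before `while True:`, the speech calls inside the `for` loop could be wrapped in `if throttle.should_announce(className):` so that `engine.say(phrase_for(className))` and `engine.runAndWait()` only run when the cooldown for that class has passed.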

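I intend to run this code on a Raspberry Pi. I plan to install OpenCV with this command: pip3 install opencv-python. However, I am not sure how to install pyttsx3, since I think I need to install it from source. Please let me know if there is a simple way to install pyttsx3.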

Update: As of December 27, I have installed all the necessary packages and my code now runs correctly.

Answer 1

I installed pyttsx3 on the Raspberry Pi using two commands in the terminal:

sudo apt update && sudo apt install espeak ffmpeg libespeak1
pip install pyttsx3
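
After running the commands, a quick check that the packages are importable can save a debugging round on the Pi. This is only a sketch: the helper name `missing_packages` is made up for illustration, and it checks that Python can locate the modules, not that audio output actually works.

```python
import importlib.util


def missing_packages(module_names):
    """Return the module names that Python cannot locate."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


# cv2 is the module name provided by the opencv-python package.
missing = missing_packages(["cv2", "pyttsx3"])
if missing:
    print("Not installed:", ", ".join(missing))
else:
    print("cv2 and pyttsx3 are both importable")
```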

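I followed the video youtube.com/watch?v=AWhDDl-7Iis&ab_channel=AiPhile to install pyttsx3. My working code should also be listed above. My question should be resolved, but I hope this is useful to anyone who wants to write a similar program. I made a few small adjustments to my code.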


python

text-to-speech

object-detection

pyttsx3