Capture the first image from an H.264 video stream using websocket - Python

Question

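I am trying to capture a single image from an H.264 video stream coming from my Raspberry Pi. The stream is produced with raspivid and sent over a websocket. However, I cannot display a correct image with imshow(). I also tried calling .reshape(), but got ValueError: cannot reshape array of size 3607 into shape (480,640,3)

On the client side, I successfully connect to the video stream and receive the incoming bytes. The server uses raspivid-broadcaster for the video streaming. I guess the first bytes can be decoded into an image? So I run the following code.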

import asyncio
import json

import cv2
import numpy
import websockets

async def get_image_from_h264_streaming():

    uri = "ws://127.0.0.1:8080"
    async with websockets.connect(uri) as websocket:
        frame = json.loads(await websocket.recv())

        print(frame)
        width, height = frame["width"], frame["height"]

        response = await websocket.recv()
        print(response)

        # transform the byte read into a numpy array
        in_frame = (
            numpy
            .frombuffer(response, numpy.uint8)
            # .reshape([height, width, 3])
        )

        # #Display the frame
        cv2.imshow('in_frame', in_frame)

        cv2.waitKey(0)

asyncio.get_event_loop().run_until_complete(get_image_from_h264_streaming())

print(frame) shows

{'action': 'init', 'width': 640, 'height': 480}

print(response) shows

b"\x00\x00\x00\x01'B\x80(\x95\xa0(\x0fh\x0..............xfc\x9f\xff\xf9?\xff\xf2\x7f\xff\xe4\x80"

Any suggestions?

-------------------------------- Edit --------------------------------

Thanks. Here is my updated code.

import asyncio

import av
import cv2
import websockets

def decode(raw_bytes: bytes):
    code_ctx = av.CodecContext.create("h264", "r")
    packets = code_ctx.parse(raw_bytes)
    for i, packet in enumerate(packets):
        frames = code_ctx.decode(packet)
        if frames:
            return frames[0].to_ndarray() 

async def save_img():
    async with websockets.connect("ws://127.0.0.1:8080") as websocket:
        image_init = await websocket.recv()

        count = 0
        combined = b''

        while count < 3:
            response = await websocket.recv()
            combined += response
            count += 1

        frame = decode(combined)
        print(frame)

        cv2.imwrite('test.jpg', frame)

asyncio.get_event_loop().run_until_complete(save_img())

print(frame) shows

[[109 109 109 ... 115  97 236]
 [109 109 109 ... 115  97 236]
 [108 108 108 ... 115  97 236]
 ...
 [111 111 111 ... 101 103 107]
 [110 110 110 ... 101 103 107]
 [112 112 112 ... 104 106 110]]

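Below is the saved image I got. It has the wrong size of 740 (height) x 640 (width); the correct size is 480 (height) x 640 (width). I am also not sure why the image is grayscale rather than color.

-------------------------------- Edit 2 --------------------------------

Below is the raspivid side.

The main method that sends the data

raspivid - index.js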

const {port, ...raspividOptions} = {...options, profile: 'baseline', timeout: 0};
videoStream = raspivid(raspividOptions)
    .pipe(new Splitter(NALSeparator))
    .pipe(new stream.Transform({
        transform: function (chunk, _encoding, callback){
            ...
            callback();
        }
    }));

videoStream.on('data', (data) => {
    wsServer.clients.forEach((socket) => {
        socket.send(data, {binary: true});
    });
});

stream-split - index.js (one line shows that the maximum size is 1Mb)

class Splitter extends Transform {

  constructor(separator, options) {
    ...
    this.bufferSize  = options.bufferSize  || 1024 * 1024 * 1  ; //1Mb
    ...
  }

  _transform(chunk, encoding, next) {

    if (this.offset + chunk.length > this.bufferSize - this.bufferFlush) {
        var minimalLength = this.bufferSize - this.bodyOffset + chunk.length;
        if(this.bufferSize < minimalLength) {
          //console.warn("Increasing buffer size to ", minimalLength);
          this.bufferSize = minimalLength;
        }
          
        var tmp = new Buffer(this.bufferSize);
        this.buffer.copy(tmp, 0, this.bodyOffset);
        this.buffer = tmp;
        this.offset = this.offset - this.bodyOffset;
        this.bodyOffset = 0;
    }
    ...
  }
};

------------ Final answer (thanks to Ann and Christoph for the guidance) ------------

Please see the answers section.

Answer 1

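This is a problem I ran into when trying to send a numpy image (converted to bytes) over a socket. The problem was that the byte string was too long.

So, instead of sending the whole image at once, I sliced the image so that I had to send, say, 10 slices. Once the other end has received all 10 slices, just stack them back together.

Keep in mind that, depending on the size of the image, you may need more or fewer slices to get the best result (efficient and error-free).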
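The slicing idea can be sketched roughly as follows. This is only an illustration, not the code this answer used: the helper names split_into_slices and join_slices are made up here, and the sketch works on plain bytes rather than numpy arrays so that it has no dependencies.

```python
from typing import Iterable, List


def split_into_slices(payload: bytes, n_slices: int) -> List[bytes]:
    """Split payload into at most n_slices consecutive chunks of near-equal size."""
    if n_slices < 1:
        raise ValueError("n_slices must be at least 1")
    # Ceiling division gives the chunk size; never let it drop below 1 byte.
    size = max(-(-len(payload) // n_slices), 1)
    return [payload[i:i + size] for i in range(0, len(payload), size)]


def join_slices(slices: Iterable[bytes]) -> bytes:
    """Reassemble the original payload from slices received in order."""
    return b"".join(slices)


payload = bytes(range(256)) * 4          # stand-in for image bytes
slices = split_into_slices(payload, 10)  # sender side
restored = join_slices(slices)           # receiver side
```

The receiver has to know how many slices to expect and has to get them in order, so a real protocol would likely send the slice count or a total length first.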

Answer 2

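One question: how is the frame/stream transmitted over the websocket? The byte sequence looks like a NAL unit; it could be a PPS or SPS, etc. How do you know that it is an I-frame, for example? I also don't know whether cv2.imshow supports raw H.264. Take a look at PyAV: there you can open raw H.264 bytes and then try to extract a frame from them :) Let me know if you need help with PyAV; this post has an example of how to do it.

Update

Based on your comment, you need a way to parse and decode a raw H.264 stream. Below is a function that should give you an idea; pass the bytes you receive from the websocket to this function. Note that enough data is needed to extract a frame.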
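To check what kind of NAL unit a message carries, the byte after each start code can be inspected: its low five bits are the nal_unit_type. The following is a minimal sketch, assuming the data is in Annex B format (start codes 00 00 01 or 00 00 00 01, as in the bytes printed in the question); the function name nal_unit_types and the lookup table are illustrative only.

```python
# A few common H.264 nal_unit_type values.
NAL_TYPE_NAMES = {
    1: "non-IDR slice",
    5: "IDR slice (keyframe)",
    6: "SEI",
    7: "SPS",
    8: "PPS",
    9: "access unit delimiter",
}


def nal_unit_types(data: bytes):
    """Return the nal_unit_type of every NAL unit found in Annex B data."""
    types = []
    pos = 0
    while True:
        # A 4-byte start code 00 00 00 01 also contains the 3-byte form.
        pos = data.find(b"\x00\x00\x01", pos)
        if pos == -1 or pos + 3 >= len(data):
            break
        # The low 5 bits of the NAL header byte hold the unit type.
        types.append(data[pos + 3] & 0x1F)
        pos += 3
    return types


first_message = b"\x00\x00\x00\x01'B\x80("  # start of the bytes shown in the question
for t in nal_unit_types(first_message):
    print(t, NAL_TYPE_NAMES.get(t, "other"))  # 7 SPS
```

The message printed in the question starts with 0x27 after the start code, which is type 7 (SPS), so on its own it carries stream parameters rather than picture data.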

pip install av

PyAV docs

import av

# Feed in your raw bytes from the socket
def decode(raw_bytes: bytes):
    code_ctx = av.CodecContext.create("h264", "r")
    # parse() splits the byte stream into complete packets
    packets = code_ctx.parse(raw_bytes)
    for packet in packets:
        frames = code_ctx.decode(packet)
        if frames:
            # return the first decoded frame as a numpy array
            return frames[0].to_ndarray()
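Since it is not known in advance how many websocket messages are needed before a frame can be decoded, one option is to keep a single decoder alive and feed it message by message until a frame comes out. The sketch below assumes PyAV's CodecContext.parse() buffers incomplete packets between calls; the class name FirstFrameDecoder and the optional codec_ctx parameter are inventions for this example.

```python
class FirstFrameDecoder:
    """Feed raw H.264 chunks one at a time until a frame can be decoded."""

    def __init__(self, codec_ctx=None):
        if codec_ctx is None:
            # PyAV is imported lazily so the class can be exercised without it.
            import av
            codec_ctx = av.CodecContext.create("h264", "r")
        self.codec_ctx = codec_ctx

    def feed(self, chunk: bytes):
        """Return the first frame decoded from the data so far, or None."""
        # The same context is reused, so partial packets carry over between calls.
        for packet in self.codec_ctx.parse(chunk):
            frames = self.codec_ctx.decode(packet)
            if frames:
                return frames[0]
        return None


# Possible use inside the websocket client coroutine:
#     decoder = FirstFrameDecoder()
#     frame = None
#     while frame is None:
#         frame = decoder.feed(await websocket.recv())
```

This avoids guessing a fixed message count such as 2 or 3, which depends on how the server splits the stream.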

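You can also try reading the stream directly with PyAV using av.open("tcp://127.0.0.1:")

Update 2: Can you test this? The problem you ran into in your edit is strange. I don't think you need the websocket layer; you can read directly from raspivid.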

raspivid -a 12 -t 0 -w 1280 -h 720 -vf -ih -fps 30 -l -o tcp://0.0.0.0:5000

import av
import cv2

def get_first_frame(path):
    stream = av.open(path, 'r')
    for packet in stream.demux():
        frames = packet.decode()
        if frames:
            return frames[0].to_ndarray(format='bgr24')

ff = get_first_frame("tcp://0.0.0.0:5000")
cv2.imshow("Video", ff)
cv2.waitKey(0)

Answer 3

This requires the PyAV and Pillow packages. cv2 is no longer needed. So, install the packages

pip3 install av
pip3 install Pillow

Code

import asyncio
import websockets
import av
import PIL

def decode_image(raw_bytes: bytes):
    code_ctx = av.CodecContext.create("h264", "r")
    packets = code_ctx.parse(raw_bytes)
    for i, packet in enumerate(packets):
        frames = code_ctx.decode(packet)
        if frames:
            return frames[0].to_image()

async def save_img_from_streaming():

    uri = "ws://127.0.0.1:8080"
    async with websockets.connect(uri) as websocket:
        image_init = await websocket.recv()

        count = 0
        combined = b''

        while count < 2:
            response = await websocket.recv()
            combined += response
            count += 1

        img = decode_image(combined)
        img.save("img1.png","PNG")

asyncio.get_event_loop().run_until_complete(save_img_from_streaming())

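Christoph's answer suggests using to_ndarray, but I found that it somehow produces a grayscale image, because it returns a numpy array of the wrong form, like [[...], [...], [...], ...]. A color image should be an array like [[[...], [...], [...], ...], ...]. Then I looked at the PyAV docs and found another method called to_image, which returns the frame as an RGB PIL.Image. So, just using that function gives me what I need.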
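A likely explanation for the grayscale result, assuming the decoder outputs yuv420p frames (common for H.264, but not confirmed in the question): without a format argument, to_ndarray() returns the Y, U and V planes stacked in one 2-D array, which an image writer treats as a taller single-channel picture. For a 480-row frame that array would have 720 rows, close to the height reported in the question. Passing format="bgr24", as in the direct TCP example of Answer 2, should give an H x W x 3 array that cv2 can use. The helper names below are illustrative.

```python
def planar_yuv420p_shape(width: int, height: int):
    """Shape of the 2-D array that stacks the Y, U and V planes of a yuv420p frame."""
    # Y fills `height` rows; U and V together add another height / 2 rows.
    return (height * 3 // 2, width)


def frame_to_bgr(frame):
    """Convert a decoded av.VideoFrame to an H x W x 3 BGR array (OpenCV's channel order)."""
    return frame.to_ndarray(format="bgr24")


print(planar_yuv420p_shape(640, 480))  # (720, 640)
```

With this, the cv2.imwrite call from the question's edit could be kept by replacing frames[0].to_ndarray() with frame_to_bgr(frames[0]), instead of switching to Pillow.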

Note that the responses from await websocket.recv() may differ. It depends on how the server sends the data.
