Optimizing CBOR reading functions to pass data into numpy

Question

I'm trying to read image data from a CBOR file into a NumPy array.

Ideally, I'm looking for a more efficient way to convert the bytes from two's complement to unsigned bytes and then read the image data into a NumPy array.

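I've tried several different ways of converting and reading the bytes, but I haven't been able to improve the speed significantly.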

At first I used a for loop to convert the bytes (1 below), then I switched to NumPy and the modulo operator (2 below), and then to selective addition (3 below).

My full functions are also below.

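1) new_bytes = [x % 256 for x in data]   # pure-Python loop
2) new_bytes = ndarray % 256              # NumPy modulo on the whole array
3) image[image < 0] += 256                # NumPy selective addition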
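All three conversions compute the same mapping from a signed byte in [-128, 127] to an unsigned byte in [0, 255]. A minimal sketch comparing them, assuming the input is a Python list of ints in that range (`data` here is hypothetical sample data), plus a fourth option that is not in the question: storing the values as `int8` once and taking a zero-copy `view` as `uint8`, which reinterprets the bits instead of doing arithmetic.

```python
import numpy as np

# Hypothetical sample input: signed bytes as a two's-complement reader would produce them
data = [0, 1, 127, -128, -1, -2]

# 1) Pure-Python loop with modulo
loop_result = [x % 256 for x in data]

# 2) NumPy modulo on the whole array (NumPy's % on integers follows Python's sign rule,
#    so -128 % 256 == 128)
mod_result = np.array(data) % 256

# 3) Selective addition on the negative entries only
sel_result = np.array(data)
sel_result[sel_result < 0] += 256

# 4) Reinterpret the same memory: store as int8, then view the bits as uint8.
#    view() copies nothing and does no arithmetic; it only changes how the bytes are read.
view_result = np.array(data, dtype=np.int8).view(np.uint8)

print(loop_result)           # [0, 1, 127, 128, 255, 254]
print(view_result.tolist())  # [0, 1, 127, 128, 255, 254]
```

Note that building the array from a Python list is itself a per-element conversion, so the larger win usually comes from never materializing the bytes as a Python list in the first place.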

import os
from cbor2 import dumps, loads, decoder
import numpy as np
import itertools

def decode_image_bytes(image_byte_array):
    """Input: 1-D list of signed (two's complement) bytes, two per 16-bit value
        Operations: Converts the bytes to unsigned and decodes them
        Output: a 1-D list of 16-bit image data"""
    # Convert input to numpy array
    image = np.array(image_byte_array)
    # Convert two's complement bytes to unsigned
    image[image < 0] += 256
    # Split the unsigned bytes into 2-byte segments (// keeps the count an integer)
    bytes_array = np.array_split(image, len(image) // 2)
    holder = list()
    # Convert segments into integer values
    for x in bytes_array:
        holder.append(int.from_bytes(list(x), byteorder='big', signed=False))
    return holder

def decode_image_metadata(image_dimensions_bytes_array):
    """Input: 1-D list of signed (two's complement) bytes, eight per sint64 value
        Operations: Converts bytes to unsigned and decodes them
        Output: Dictionary with possible keys: 'width', 'height', 'channels', 'Z', 'time'"""
    # Convert input to numpy array
    dimensions = np.array(image_dimensions_bytes_array)
    # Convert two's complement bytes to unsigned
    dimensions[dimensions < 0] += 256
    # Split the unsigned bytes into 8-byte segments
    bytes_array = np.array_split(dimensions, len(dimensions) // 8)
    # Convert the segments into signed integer values
    for x in range(0, len(bytes_array)):
        bytes_array[x] = int.from_bytes(list(bytes_array[x]), byteorder='big', signed=True)
    # Put the converted integer values into a dictionary
    end = dict(itertools.zip_longest(['width', 'height', 'channels', 'Z', 'time'], bytes_array, fillvalue=None))
    return end
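A vectorized sketch of `decode_image_bytes`, assuming (as the original's `byteorder='big'` implies) that each pixel is two bytes, most significant first, and that every input value lies in [-128, 127]. `decode_image_bytes_vectorized` is a hypothetical name. It replaces the modulo step and the per-pixel `int.from_bytes` loop with a reinterpretation of the byte buffer as big-endian unsigned 16-bit integers (`'>u2'`).

```python
import numpy as np

def decode_image_bytes_vectorized(image_byte_array):
    """Hypothetical replacement for decode_image_bytes.

    Assumes a flat sequence of signed bytes (-128..127), two per pixel,
    big-endian, with an even total length.
    """
    # Pack the signed bytes into one contiguous int8 buffer (one byte per value)
    signed = np.ascontiguousarray(image_byte_array, dtype=np.int8)
    # Reinterpret each byte pair as a big-endian unsigned 16-bit integer ('>u2').
    # This replaces both the two's-complement fix-up and the int.from_bytes loop.
    pixels = signed.view('>u2')
    # Convert to native byte order for downstream NumPy code (one vectorized copy)
    return pixels.astype(np.uint16)

sample = [0, 1, 127, -128, -1, -2]
print(decode_image_bytes_vectorized(sample).tolist())  # [1, 32640, 65534]
```

Unlike the original, this returns a NumPy array rather than a list; call `.tolist()` if a list is really needed.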

Right now, converting the bytes and returning the NumPy array takes 20-30 seconds. If possible, I'd like to cut that in half.

Right now I'm thinking of using `np.apply_along_axis` to eliminate the for loop. Is there a better way?

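def metadata_values(element):
    # element is one 8-byte row of unsigned byte values (0-255).
    # Pass a list, not the int64 row itself: int.from_bytes would otherwise
    # read the array's raw memory buffer instead of its values.
    return int.from_bytes(element.tolist(), byteorder='big', signed=True)

bytes_array = np.apply_along_axis(metadata_values, 1, bytes_array)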
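`np.apply_along_axis` still calls the Python function once per row, so it removes the visible loop but not the per-value Python overhead. A sketch of a fully vectorized alternative for the metadata, assuming eight big-endian bytes per signed 64-bit value, as in the original. `decode_image_metadata_vectorized` and `FIELDS` are hypothetical names. The dtype `'>i8'` (big-endian signed 8-byte integer) does the work that `int.from_bytes(..., signed=True)` did per value.

```python
import itertools

import numpy as np

# Field names taken from the original decode_image_metadata
FIELDS = ['width', 'height', 'channels', 'Z', 'time']

def decode_image_metadata_vectorized(image_dimensions_bytes_array):
    """Hypothetical replacement for decode_image_metadata.

    Assumes a flat sequence of signed bytes (-128..127), eight per value,
    big-endian, encoding signed 64-bit integers.
    """
    signed = np.ascontiguousarray(image_dimensions_bytes_array, dtype=np.int8)
    # Reinterpret each group of 8 bytes as one big-endian int64; tolist() yields Python ints
    values = signed.view('>i8').tolist()
    # Same pairing/padding behaviour as the original zip_longest call
    return dict(itertools.zip_longest(FIELDS, values, fillvalue=None))

# Build sample input the way the CBOR data is assumed to look: big-endian int64s as signed bytes
raw = np.array([640, 480, 1], dtype='>i8').view(np.int8).tolist()
print(decode_image_metadata_vectorized(raw))
```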

Answer 1

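Unless you're doing this for your own education, you shouldn't hand-write conversions between binary number representations: a per-element Python loop will be orders of magnitude slower than letting NumPy reinterpret the raw bytes.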

Here is an example of reading bytes into NumPy arrays of various formats:

>>> b = bytes([0,1,127,128,255,254])  # equivalent to reading bytes from a file opened in binary mode
>>> np.frombuffer(b, dtype=np.uint8)  # notice *u*int vs int
array([  0,   1, 127, 128, 255, 254], dtype=uint8)
>>> np.frombuffer(b, dtype=np.int8)
array([   0,    1,  127, -128,   -1,   -2], dtype=int8)
>>> # you can also use multi-byte formats, as long as the byte count is a multiple of the item size
>>> # (these use native byte order; the outputs below are from a little-endian machine)
>>> np.frombuffer(b, dtype=np.int16)
array([   256, -32641,   -257], dtype=int16)
>>> np.frombuffer(b, dtype=np.uint16)
array([  256, 32895, 65279], dtype=uint16)
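`np.frombuffer` reads multi-byte types in the machine's native byte order, which is why the pair `(0, 1)` came out as 256 above on a little-endian machine. The question's code uses `byteorder='big'`, so assuming the CBOR payload really is big-endian, a `'>'` prefix on the dtype reproduces `int.from_bytes` exactly. A small sketch cross-checking the two on the same bytes:

```python
import numpy as np

b = bytes([0, 1, 127, 128, 255, 254])

# Explicit big-endian dtypes: '>u2' = unsigned 16-bit, '>i2' = signed 16-bit
big_u16 = np.frombuffer(b, dtype='>u2')
big_i16 = np.frombuffer(b, dtype='>i2')

# The slow per-pair conversion used in the question, for comparison
expected_u16 = [int.from_bytes(b[i:i + 2], byteorder='big', signed=False) for i in range(0, len(b), 2)]
expected_i16 = [int.from_bytes(b[i:i + 2], byteorder='big', signed=True) for i in range(0, len(b), 2)]

print(big_u16.tolist())  # [1, 32640, 65534]
print(big_i16.tolist())  # [1, 32640, -2]
```

If the image bytes can be obtained from the CBOR decoder as a `bytes` object rather than a list of ints, passing that object straight to `np.frombuffer` skips the list-to-array conversion entirely.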
