JSON encoder and decoder for complex numpy arrays

Question

I'm trying to JSON encode a complex numpy array, and I've found a utility from astropy (http://astropy.readthedocs.org/en/latest/_modules/astropy/utils/misc.html#JsonCustomEncoder) for this purpose:

import json
import numpy as np

class JsonCustomEncoder(json.JSONEncoder):
    """ <cropped for brevity> """
    def default(self, obj):
        if isinstance(obj, (np.ndarray, np.number)):
            return obj.tolist()
        elif isinstance(obj, (complex, np.complexfloating)):  # np.complex was removed in NumPy 1.24
            return [obj.real, obj.imag]
        elif isinstance(obj, set):
            return list(obj)
        elif isinstance(obj, bytes):  # pragma: py3
            return obj.decode()
        return json.JSONEncoder.default(self, obj)

This works well for a complex numpy array:

test = {'some_key':np.array([1+1j,2+5j, 3-4j])}

Dumping yields:

encoded = json.dumps(test, cls=JsonCustomEncoder)
print(encoded)
# {"some_key": [[1.0, 1.0], [2.0, 5.0], [3.0, -4.0]]}

The problem is that I have no way to read this back into a complex array automatically. For example:

json.loads(encoded)
# {'some_key': [[1.0, 1.0], [2.0, 5.0], [3.0, -4.0]]}

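Can you help me figure out a way to override loads/decoding so that it infers that this must be a complex array? I.e., instead of a list of 2-element items, it should just put them back into a complex array. JsonCustomDecoder doesn't have a default() method to override, and the docs on encoding have too much jargon for me.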

Answer 1

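It's not clear how much help you need with JSON encoding/decoding or with numpy. For example, how did you create the complex array in the first place?

What your encoding has done is render the array as a list of lists. The decoder then has to convert that back into an array of the appropriate dtype. For example: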

d = json.loads(encoded)
a = np.dot(d['some_key'],np.array([1,1j]))
# array([ 1.+1.j,  2.+5.j,  3.-4.j])

This isn't the only way to create such an array from this list, and it may fail in some cases, but it's a start.
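As an illustration of other ways to rebuild the array, here is a minimal sketch that assumes the decoded value is a regular list of [real, imag] pairs, as in the question. The variable names are only for illustration.

```python
import json
import numpy as np

encoded = '{"some_key": [[1.0, 1.0], [2.0, 5.0], [3.0, -4.0]]}'
pairs = np.array(json.loads(encoded)['some_key'], dtype=float)  # shape (3, 2)

# Option 1: combine the real and imaginary columns arithmetically.
a1 = pairs[..., 0] + 1j * pairs[..., 1]

# Option 2: reinterpret each contiguous (real, imag) float64 pair
# as a single complex128 value, then drop the trailing length-1 axis.
a2 = np.ascontiguousarray(pairs).view(np.complex128)[..., 0]
```

Both options index the last axis, so they also work when the pairs are nested more deeply than in this example.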

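The next task is figuring out when to use such a routine. If you know you are going to receive such an array, then just do this decoding.

Another option is to add one or more keys to the dictionary that flag this variable as a complex ndarray. One key might also encode its shape (though that is also deducible from the nesting of the list of lists).

Does this point in the right direction? Or do you need further help with each step?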
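A rough sketch of the tagging idea follows. The tag name `__complex_ndarray__`, the class `TaggedComplexEncoder` and the function `tagged_complex_hook` are hypothetical names chosen for this example; they do not come from astropy or numpy.

```python
import json
import numpy as np

class TaggedComplexEncoder(json.JSONEncoder):
    """Hypothetical encoder that marks complex arrays with an explicit tag."""
    def default(self, obj):
        if isinstance(obj, np.ndarray) and np.iscomplexobj(obj):
            # Store real and imaginary parts separately; the list nesting keeps the shape.
            return {'__complex_ndarray__': True,
                    'real': obj.real.tolist(),
                    'imag': obj.imag.tolist()}
        if isinstance(obj, np.ndarray):
            return obj.tolist()
        return super().default(obj)

def tagged_complex_hook(dct):
    """Rebuild a complex array only when the explicit tag is present."""
    if dct.get('__complex_ndarray__'):
        return np.array(dct['real']) + 1j * np.array(dct['imag'])
    return dct

test = {'some_key': np.array([1+1j, 2+5j, 3-4j]), 'pairs': [[1, 2], [3, 4]]}
dumped = json.dumps(test, cls=TaggedComplexEncoder)
decoded = json.loads(dumped, object_hook=tagged_complex_hook)
```

With an explicit tag the decoder no longer has to guess: the ordinary list under 'pairs' stays a list even though it has two columns.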

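One of the answers to the 'SimpleJSON and NumPy array' question handles both the encoding and the decoding of numpy arrays. It encodes a dictionary containing the dtype and shape together with the array's data buffer, so the JSON string doesn't mean much to a human. But it does handle general arrays, including ones with a complex dtype.

The expected array and the dumped string print as: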

[ 1.+1.j  2.+5.j  3.-4.j]

{"dtype": "complex128", "shape": [3], 
    "__ndarray__": "AAAAAAAA8D8AAAAAAADwPwAAAAAAAABAAAAAAAAAFEAAAAAAAAAIQAAAAAAAABDA"}

The custom decoding is done with an object_hook function, which takes a dict and returns an array (if possible).

json.loads(dumped, object_hook=json_numpy_obj_hook)
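To see why an object_hook can reach arrays nested at any depth, it may help to watch when json calls it. The small sketch below uses a made-up `recording_hook` that only records the keys of each dict it receives.

```python
import json

seen = []

def recording_hook(dct):
    # Called once for every JSON object, innermost objects first.
    seen.append(sorted(dct))
    return dct

text = '{"outer": {"inner": {"x": 1}}, "list": [{"y": 2}]}'
json.loads(text, object_hook=recording_hook)
# seen is now [['x'], ['inner'], ['y'], ['list', 'outer']]
```

The hook is called for JSON objects only, never for JSON arrays, so a plain list of pairs can only be converted by inspecting the values of the dict that contains it.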

Following that model, here's a crude hook that transforms every JSON array into a np.array, and every array with 2 columns into a 1d complex array:

def numpy_hook(dct):
    jj = np.array([1,1j])
    for k,v in dct.items():
        if isinstance(v, list):
            v = np.array(v)
            if v.ndim==2 and v.shape[1]==2:
                v = np.dot(v,jj)
            dct[k] = v
    return dct

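It would be better, I think, to encode one dictionary key to flag a numpy array, and another to flag the complex dtype.

I can improve the hook to handle regular lists and other array dimensions: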

def numpy_hook(dct):
    jj = np.array([1,1j])
    for k,v in dct.items():
        if isinstance(v, list):
            # try to turn list into numpy array
            v = np.array(v)
            if v.dtype==object:
                # not a normal array, don't change it
                continue
            if v.ndim>1 and v.shape[-1]==2:
                # guess it is a complex array
                # this information should be more explicit
                v = np.dot(v,jj)
            dct[k] = v
    return dct

It handles this structure:

A = np.array([1+1j,2+5j, 3-4j])
B = np.arange(12).reshape(3,4)
C = A+B.T
test = {'id': 'stream id',
        'arrays': [{'A': A}, {'B': B}, {'C': C}]}

returning:

{u'arrays': [{u'A': array([ 1.+1.j,  2.+5.j,  3.-4.j])}, 
       {u'B': array([[ 0,  1,  2,  3],
                     [ 4,  5,  6,  7],
                     [ 8,  9, 10, 11]])}, 
       {u'C': array([[  1.+1.j,   6.+5.j,  11.-4.j],
                     [  2.+1.j,   7.+5.j,  12.-4.j],
                     [  3.+1.j,   8.+5.j,  13.-4.j],
                     [  4.+1.j,   9.+5.j,  14.-4.j]])}], 
 u'id': u'stream id'}

Anything more general would, I think, require modifying the encoding to make the array identity explicit.
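One caveat about the improved hook: it relies on np.array() returning an object-dtype array for irregular nested lists. Recent NumPy releases (1.24 and later) raise a ValueError for ragged input instead, so the dtype check alone may not be reached. A possible guard is sketched below; `to_array_or_none` is a hypothetical helper name.

```python
import warnings
import numpy as np

def to_array_or_none(value):
    """Return an ndarray for regular nested lists, or None if conversion is not clean."""
    try:
        with warnings.catch_warnings():
            # Older NumPy only warns about ragged input; treat that warning as a failure.
            warnings.simplefilter('error')
            arr = np.array(value)
    except (ValueError, Warning):
        return None
    # Mixed content can still end up as an object array; reject that too.
    return None if arr.dtype == object else arr
```

Inside the hook, a None result would mean "leave this list unchanged".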

Answer 2

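Here is my final solution, adapted from hpaulj's answer above and from his answer in the related thread.

It encodes and decodes arrays of any dtype that are nested to arbitrary depth inside nested dictionaries.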

import base64
import json
import numpy as np

class NumpyEncoder(json.JSONEncoder):
    def default(self, obj):
        """
        If the input object is an ndarray, convert it into a dict holding
        the dtype, the shape and the base64-encoded data.
        """
        if isinstance(obj, np.ndarray):
            data_b64 = base64.b64encode(obj.data)
            # b64encode returns bytes, which json cannot serialize on Python 3
            return dict(__ndarray__=data_b64.decode('ascii'),
                        dtype=str(obj.dtype),
                        shape=obj.shape)
        # Let the base class default method raise the TypeError
        return json.JSONEncoder.default(self, obj)


def json_numpy_obj_hook(dct):
    """
    Decodes a previously encoded numpy ndarray
    with proper shape and dtype
    :param dct: (dict) json encoded ndarray
    :return: (ndarray) if input was an encoded ndarray
    """
    if isinstance(dct, dict) and '__ndarray__' in dct:
        data = base64.b64decode(dct['__ndarray__'])
        return np.frombuffer(data, dct['dtype']).reshape(dct['shape'])
    return dct

# Wrap dump/load so that they use this behavior by default.
def dumps(*args, **kwargs):
    kwargs.setdefault('cls', NumpyEncoder)
    return json.dumps(*args, **kwargs)

def loads(*args, **kwargs):
    kwargs.setdefault('object_hook', json_numpy_obj_hook)    
    return json.loads(*args, **kwargs)

def dump(*args, **kwargs):
    kwargs.setdefault('cls', NumpyEncoder)
    return json.dump(*args, **kwargs)

def load(*args, **kwargs):
    kwargs.setdefault('object_hook', json_numpy_obj_hook)
    return json.load(*args, **kwargs)

if __name__ == '__main__':

    data = np.arange(3, dtype=complex)  # np.complex was removed in NumPy 1.24

    one_level = {'level1': data, 'foo':'bar'}
    two_level = {'level2': one_level}

    dumped = dumps(two_level)
    result = loads(dumped)

    print('\noriginal data', data)
    print('\nnested dict of dict complex array', two_level)
    print('\ndecoded nested data', result)

This produces the output:

original data [ 0.+0.j  1.+0.j  2.+0.j]

nested dict of dict complex array {'level2': {'level1': array([ 0.+0.j,  1.+0.j,  2.+0.j]), 'foo': 'bar'}}

decoded nested data {u'level2': {u'level1': array([ 0.+0.j,  1.+0.j,  2.+0.j]), u'foo': u'bar'}}
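One detail worth knowing about this approach: np.frombuffer() on a bytes object returns a read-only array, so the decoded arrays cannot be modified in place. The sketch below reproduces only the encode/decode core of the solution (not the full encoder class) and shows that taking a copy gives a writable array.

```python
import base64
import numpy as np

original = np.arange(3, dtype=complex)
payload = base64.b64encode(original.tobytes()).decode('ascii')

# Decode the same way json_numpy_obj_hook does.
decoded = np.frombuffer(base64.b64decode(payload), dtype='complex128').reshape(3)

# decoded.flags.writeable is False here; copy before modifying.
writable = decoded.copy()
writable[0] = 5j
```

If writable results are needed everywhere, the copy could be made inside the hook itself, at the cost of one extra copy per array.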

Answer 3

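The solution above is nice, but it has one flaw: it only works if your data is C_CONTIGUOUS. If you transpose your data, that is no longer true. For example, test the following: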

A = np.arange(10).reshape(2,5)
A.flags
# C_CONTIGUOUS : True
# F_CONTIGUOUS : False
# OWNDATA : False
# WRITEABLE : True
# ALIGNED : True
# UPDATEIFCOPY : False
A = A.transpose()
#array([[0, 5],
#       [1, 6],
#       [2, 7],
#       [3, 8],
#       [4, 9]])
loads(dumps(A))
#array([[0, 1],
#       [2, 3],
#       [4, 5],
#       [6, 7],
#       [8, 9]])
A.flags
# C_CONTIGUOUS : False
# F_CONTIGUOUS : True
# OWNDATA : False
# WRITEABLE : True
# ALIGNED : True
# UPDATEIFCOPY : False

To fix this, use 'np.ascontiguousarray()' when passing the object to b64encode. Specifically, change:

data_b64 = base64.b64encode(obj.data)

to:

data_b64 = base64.b64encode(np.ascontiguousarray(obj).data)

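If I understand the function correctly, it takes no action if your data is already C_CONTIGUOUS, so the only performance hit is when your data is not C_CONTIGUOUS, for example F_CONTIGUOUS data.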
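A quick way to check that understanding is sketched below. The `roundtrip` function is a stand-in written for this example; it mirrors only the base64 step of the encoder and the frombuffer step of the hook.

```python
import base64
import numpy as np

A = np.arange(10).reshape(2, 5)   # C-contiguous
At = A.transpose()                # F-contiguous view of the same memory

same = np.ascontiguousarray(A)    # already C-contiguous: no copy expected
copied = np.ascontiguousarray(At) # must copy to get C order

def roundtrip(arr):
    """Encode with the fix applied, then decode as json_numpy_obj_hook does."""
    data_b64 = base64.b64encode(np.ascontiguousarray(arr).data)
    raw = base64.b64decode(data_b64)
    return np.frombuffer(raw, dtype=arr.dtype).reshape(arr.shape)

restored = roundtrip(At)
```

Here np.shares_memory(same, A) is True while np.shares_memory(copied, At) is False, which matches the claim that only non-contiguous input pays for a copy.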

Answer 4

Try traitschema: https://traitschema.readthedocs.io/en/latest/

"Create serializable, type-checked schema using traits and Numpy. A typical use case involves saving several Numpy arrays of varying shape and type."

See to_json():

"This uses a custom JSON encoder to handle numpy arrays but could conceivably lose precision. If this is important, please consider serializing in HDF5 format instead"
