Create complex object in Python based on property names in dot notation

Question

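I am trying to create a complex object based on the metadata I have. The metadata is an array of property names that I iterate over to build a dictionary. For example, here is the array: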

[
    "itemUniqueId",
    "itemDescription",
    "manufacturerInfo[0].manufacturer.value",
    "manufacturerInfo[0].manufacturerPartNumber",
    "attributes.noun.value",
    "attributes.modifier.value",
    "attributes.entityAttributes[0].attributeName",
    "attributes.entityAttributes[0].attributeValue",
    "attributes.entityAttributes[0].attributeUOM",
    "attributes.entityAttributes[1].attributeName",
    "attributes.entityAttributes[1].attributeValue",
    "attributes.entityAttributes[1].attributeUOM",
]

This array should produce the following output:

{
    "itemUniqueId": "",
    "itemDescription": "",
    "manufacturerInfo": [
        {
            "manufacturer": {
                "value": ""
            },
            "manufacturerPartNumber": ""
        }
    ],
    "attributes": {
        "noun": {
            "value": ""
        },
        "modifier": {
            "value": ""
        },
        "entityAttributes": [
             {
                 "attributeName": "",
                 "attributeValue": "",
                 "attributeUOM": ""
             },
             {
                "attributeName": "",
                "attributeValue": "",
                "attributeUOM": ""
             }
        ]
    }
}

I have written the logic below, but I cannot get the desired output. Given the metadata, it should work for both objects and arrays.

source_json = [
    "itemUniqueId",
    "itemDescription",
    "manufacturerInfo[0].manufacturer.value",
    "manufacturerInfo[0].manufacturerPartNumber",
    "attributes.noun.value",
    "attributes.modifier.value",
    "attributes.entityAttributes[0].attributeName",
    "attributes.entityAttributes[0].attributeValue",
    "attributes.entityAttributes[0].attributeUOM",
    "attributes.entityAttributes[1].attributeName",
    "attributes.entityAttributes[1].attributeValue",
    "attributes.entityAttributes[1].attributeUOM",
]

for row in source_json:
    propertyNames = row.split('.')
    temp = ''
    parent = {}
    parentArr = []
    parentObj = {}
    # if len(propertyNames) > 1:
    arrLength = len(propertyNames)
    for i, (current) in enumerate(zip(propertyNames)):
        if i == 0:
            if '[' in current:
                parent[current]=parentArr
            else:
                parent[current] = parentObj
            temp = current
        if i > 0 and i < arrLength - 1:
            if '[' in current:
                parent[current] = parentArr
            else:
                parent[current] = parentObj
            temp = current
        if i == arrLength - 1:
            if '[' in current:
                parent[current] = parentArr
            else:
                parent[current] = parentObj
            temp = current
            # temp[prev][current] = ""
    # finalMapping[target] = target
print(parent)
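One likely reason the loop above misbehaves: zip() called with a single argument yields 1-tuples, so current is a tuple such as ('manufacturerInfo[0]',), and '[' in current tests whether the string '[' is an element of that tuple rather than doing a substring check. (parent is also reset for every row, so only the last row survives.) A minimal demonstration of the zip pitfall:

```python
names = "manufacturerInfo[0].manufacturer.value".split('.')
pairs = list(enumerate(zip(names)))

# Each item is (index, (name,)): the name is wrapped in a 1-tuple, not a plain string.
print(pairs[0])               # (0, ('manufacturerInfo[0]',))
print('[' in pairs[0][1])     # False: tuple membership test
print('[' in pairs[0][1][0])  # True: substring test on the string itself
```

Iterating with enumerate(names) instead gives the plain strings directly.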

Answer 1

First we iterate over the whole list and store every third-level attribute; after that we can transform this structure into the desired output:

from typing import Dict, List


source_json = [
    "attributes.entityAttributes[0].attributeName",
    "attributes.entityAttributes[0].attributeValue",
    "attributes.entityAttributes[0].attributeUOM",
    "attributes.entityAttributes[1].attributeName",
    "attributes.entityAttributes[1].attributeValue",
    "attributes.entityAttributes[1].attributeUOM",
    "attributes.entityAttributes[2].attributeName"
]


def accumulate(source: List) -> Dict:
    accumulator = {}
    for v in source:
        vs = v.split(".")
        root_attribute = vs[0]
        if not root_attribute in accumulator:
            accumulator[root_attribute] = {}

        i = vs[1].rfind('[')
        k = (vs[1][:i], vs[1][i+1:-1])

        if not k in accumulator[root_attribute]:
            accumulator[root_attribute][k] = {}
        accumulator[root_attribute][k][vs[2]] = ""
    return accumulator


def get_result(accumulated: Dict) -> Dict:
    result = {}
    for k, v in accumulated.items():
        result[k] = {}
        for (entity, idx), v1 in v.items():
            if not entity in result[k]:
                result[k][entity] = []
            if len(v1) == 3:
                result[k][entity].append(v1)
    return result


print(get_result(accumulate(source_json)))

The output will be:


{
    'attributes':
    {
        'entityAttributes':
        [
            {
                'attributeName': '',
                'attributeValue': '',
                'attributeUOM': ''
            },
            {'attributeName': '',
             'attributeValue': '',
             'attributeUOM': ''
            }
        ]
    }
}

In the accumulate function, we store the third-level attributes in a dict keyed by (entityAttributes, 0) ... (entityAttributes, 2). In the get_result function, we convert that dict with (entityAttributes, 0) ... (entityAttributes, 2) keys into a dict that maps strings to lists.
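The two key steps can be isolated as follows. This sketch only restates what accumulate and get_result do on the sample data; it also shows why entityAttributes[2], which has a single attribute, is missing from the output: get_result keeps only groups with exactly three attributes. Note that accumulate assumes every path has exactly three segments with an index on the second one, so an entry like "itemUniqueId" from the question would raise an IndexError.

```python
# The key-splitting step from accumulate(): 'entityAttributes[0]' -> ('entityAttributes', '0')
segment = "entityAttributes[0]"
i = segment.rfind('[')
key = (segment[:i], segment[i + 1:-1])
print(key)  # ('entityAttributes', '0')

# get_result() appends a group only when it holds exactly three attributes,
# so the incomplete group at index 2 is silently dropped.
groups = {
    ("entityAttributes", "0"): {"attributeName": "", "attributeValue": "", "attributeUOM": ""},
    ("entityAttributes", "2"): {"attributeName": ""},
}
kept = [attrs for attrs in groups.values() if len(attrs) == 3]
print(len(kept))  # 1
```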

Answer 2

How about something like this:

import re
import json

source_json = [
"attributes.entityAttributes[0].attributeName",
"attributes.entityAttributes[0].attributeValue",
"attributes.entityAttributes[0].attributeUOM",
"attributes.entityAttributes[1].attributeName",
"attributes.entityAttributes[1].attributeValue",
"attributes.entityAttributes[1].attributeUOM",
"attributes.entityAttributes[2].attributeName"
]


def to_object(source_json):

    def add_attribute(target, attribute_list):
        head, tail = attribute_list[0], attribute_list[1:]
        if tail:
            add_attribute(target.setdefault(head,{}), tail)
        else:
            target[head] = ''
    
    target = {}
    for row in source_json:
        add_attribute(target, re.split(r'[\.\[\]]+',row))
    return target
    
  
print(json.dumps(to_object(source_json), indent=4))

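Note that this does not do exactly what you asked for: it also interprets arrays as objects with keys '0' ... '2'. This makes it easier to implement and also more robust. What would you expect when the input list lacks entries for entityAttributes[0]? Should the list contain an empty element, or something different? In any case, by storing the array as an object you save space by not including such an element, which is only possible with an object, not a list.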
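If real lists are needed, one option (not part of the original answer) is a post-processing pass over the result of to_object. The hypothetical helper dicts_to_lists below turns any dict whose keys are all digit strings into a list ordered by index; gaps in the indices are simply skipped, which matches the "save space" behavior described above. The first line also shows how the re.split pattern tokenizes a path.

```python
import re

# Tokenizing as to_object() does: runs of '.', '[' or ']' act as separators.
tokens = re.split(r'[\.\[\]]+', "attributes.entityAttributes[0].attributeName")
print(tokens)  # ['attributes', 'entityAttributes', '0', 'attributeName']


def dicts_to_lists(node):
    """Hypothetical post-processing step: turn {'0': ..., '1': ...} into [..., ...]."""
    if not isinstance(node, dict):
        return node  # leaf value such as ''
    converted = {k: dicts_to_lists(v) for k, v in node.items()}
    if converted and all(k.isdigit() for k in converted):
        # Order by numeric index; missing indices are not padded.
        return [converted[k] for k in sorted(converted, key=int)]
    return converted


nested = {"attributes": {"entityAttributes": {"0": {"attributeName": ""}, "1": {"attributeName": ""}}}}
print(dicts_to_lists(nested))
# {'attributes': {'entityAttributes': [{'attributeName': ''}, {'attributeName': ''}]}}
```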

Answer 3

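There is a similar question whose accepted answer works for this problem, but it has unused code paths (e.g. isInArray) and caters to an unconventional conversion expected by that question:

❓ "arrOne[0]": "1,2,3" → "arrOne": ["1", "2", "3"] instead of
✅ "arrOne[0]": "1,2,3" → "arrOne": ["1,2,3"] or
✅ "arrOne[0]": "1", "arrOne[1]": "2", "arrOne[2]": "3" → "arrOne": ["1", "2", "3"]

Below is an improved implementation of the branch function: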

import re

def branch(tree, path, value):
    key = path[0]
    array_index_match = re.search(r'\[([0-9]+)\]', key)

    if array_index_match:
        # Get the array index from the captured digits, and remove the "[n]" suffix from the key
        array_index = int(array_index_match.group(1))
        key = key.replace(array_index_match.group(0), '')

        # Prepare the array at the key
        if key not in tree:
            tree[key] = []

        # Pad the array with empty objects so that array_index is a valid position
        # (this also avoids an IndexError when indices arrive out of order)
        while len(tree[key]) <= array_index:
            tree[key].append({})

        # Replace the object at the array index
        tree[key][array_index] = value if len(path) == 1 else branch(tree[key][array_index], path[1:], value)

    else:
        # Prepare the object at the key
        if key not in tree:
            tree[key] = {}

        # Replace the object at the key
        tree[key] = value if len(path) == 1 else branch(tree[key], path[1:], value)

    return tree
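To see what the regular expression in branch extracts from a single path segment: group(1) is the digits inside the brackets and group(0) is the whole "[n]" match, which is stripped to recover the plain key.

```python
import re

key = "entityAttributes[1]"
match = re.search(r'\[([0-9]+)\]', key)
index = int(match.group(1))              # digits captured inside the brackets
base = key.replace(match.group(0), '')   # group(0) is the whole "[1]" match
print(index, base)  # 1 entityAttributes
```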

Usage:

VALUE = ''

def create_dict(attributes):
    d = {}
    for path_str in attributes:
        branch(d, path_str.split('.'), VALUE)
    return d

source_json = [
    "itemUniqueId",
    "itemDescription",
    "manufacturerInfo[0].manufacturer.value",
    "manufacturerInfo[0].manufacturerPartNumber",
    "attributes.noun.value",
    "attributes.modifier.value",
    "attributes.entityAttributes[0].attributeName",
    "attributes.entityAttributes[0].attributeValue",
    "attributes.entityAttributes[0].attributeUOM",
    "attributes.entityAttributes[1].attributeName",
    "attributes.entityAttributes[1].attributeValue",
    "attributes.entityAttributes[1].attributeUOM",
]

assert create_dict(source_json) == {
    "itemUniqueId": "",
    "itemDescription": "",
    "manufacturerInfo": [
        {
            "manufacturer": {
                "value": ""
            },
            "manufacturerPartNumber": ""
        }
    ],
    "attributes": {
        "noun": {
            "value": ""
        },
        "modifier": {
            "value": ""
        },
        "entityAttributes": [
             {
                "attributeName": "",
                "attributeValue": "",
                "attributeUOM": ""
            },
           {
                "attributeName": "",
                "attributeValue": "",
                "attributeUOM": ""
            }
        ]
    }
}

Answer 4

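You can use a custom builder class that implements __getattr__ and __getitem__ to gradually build the underlying object. The building can then be triggered by calling eval on each attribute string (note: eval is not safe for input from untrusted sources).

Here is an example implementation: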

class Builder:
    def __init__(self):
        self.obj = None

    def __getattr__(self, key):
        if self.obj is None:
            self.obj = {}
        return self.obj.setdefault(key, Builder())

    def __getitem__(self, index):
        if self.obj is None:
            self.obj = []
        self.obj.extend(Builder() for _ in range(index+1-len(self.obj)))
        return self.obj[index]

    def convert(self):
        if self.obj is None:
            return ''
        elif isinstance(self.obj, list):
            return [v.convert() for v in self.obj]
        elif isinstance(self.obj, dict):
            return {k: v.convert() for k,v in self.obj.items()}
        else:
            assert False


attributes = [
    'itemUniqueId',
    'itemDescription',
    'manufacturerInfo[0].manufacturer.value',
    'manufacturerInfo[0].manufacturerPartNumber',
    'attributes.noun.value',
    'attributes.modifier.value',
    'attributes.entityAttributes[0].attributeName',
    'attributes.entityAttributes[0].attributeValue',
    'attributes.entityAttributes[0].attributeUOM',
    'attributes.entityAttributes[1].attributeName',
    'attributes.entityAttributes[1].attributeValue',
    'attributes.entityAttributes[1].attributeUOM',
]

builder = Builder()
for attr in attributes:
    eval(f'builder.{attr}')
result = builder.convert()

import json
print(json.dumps(result, indent=4))
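If eval is a concern, one eval-free alternative (not part of the original answer) is to tokenize each path with a regular expression and apply getattr and operator.getitem step by step. walk is a hypothetical helper; with the Builder above, walk(builder, attr) should trigger the same __getattr__ and __getitem__ calls as eval(f'builder.{attr}'). It is shown here against SimpleNamespace objects so that it runs on its own.

```python
import operator
import re
from types import SimpleNamespace


def walk(obj, path):
    """Apply the accesses in path without eval: name -> getattr, [n] -> getitem."""
    # Each match is (name, '') for an identifier or ('', digits) for an index.
    for name, index in re.findall(r'(\w+)|\[(\d+)\]', path):
        obj = getattr(obj, name) if name else operator.getitem(obj, int(index))
    return obj


data = SimpleNamespace(
    manufacturerInfo=[SimpleNamespace(manufacturer=SimpleNamespace(value="ACME"))]
)
print(walk(data, "manufacturerInfo[0].manufacturer.value"))  # ACME
```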

This gives the following output:

{
    "itemUniqueId": "",
    "itemDescription": "",
    "manufacturerInfo": [
        {
            "manufacturer": {
                "value": ""
            },
            "manufacturerPartNumber": ""
        }
    ],
    "attributes": {
        "noun": {
            "value": ""
        },
        "modifier": {
            "value": ""
        },
        "entityAttributes": [
            {
                "attributeName": "",
                "attributeValue": "",
                "attributeUOM": ""
            },
            {
                "attributeName": "",
                "attributeValue": "",
                "attributeUOM": ""
            }
        ]
    }
}

Answer 5

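None of the answers provided so far strike me as very intuitive. Here is an approach that solves the problem with three easy-to-understand functions.

Normalize the input. First, we need a function to normalize the input strings. Instead of a rules-bearing string like 'foo[0].bar', where one must understand that an integer in square brackets denotes a list, we want a simple tuple of keys like ('foo', 0, 'bar').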

def attribute_to_keys(a):
    return tuple(
        int(k) if k.isdigit() else k
        for k in a.replace('[', '.').replace(']', '').split('.')
    )

Build uniform data. Second, we need a function to assemble a data structure made of dicts of dicts of dicts ... all the way down.

def assemble_data(attributes):
    data = {}
    for a in attributes:
        d = data
        for k in attribute_to_keys(a):
            d = d.setdefault(k, {})
    return convert(data)

def convert(d):
    # Just a placeholder for now.
    return d

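Convert the uniform data. Third, we need to implement a real version of the placeholder. Specifically, it must recursively convert the uniform data structure into our final target, with (a) empty strings at the leaf nodes and (b) lists instead of dicts whenever the dict keys are all integers. Note that this even fills empty list positions with empty strings (a contingency not covered in your question description); adjust as needed if you want different behavior.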

def convert(d):
    if not d:
        return ''
    elif all(isinstance(k, int) for k in d):
        return [convert(d.get(i)) for i in range(max(d) + 1)]
    else:
        return {k : convert(v) for k, v in d.items()}
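Putting the three functions together (repeated here so the snippet runs on its own), a small usage run on part of the question's input, plus a sparse index to show the empty-string padding described above:

```python
def attribute_to_keys(a):
    # 'foo[0].bar' -> ('foo', 0, 'bar')
    return tuple(
        int(k) if k.isdigit() else k
        for k in a.replace('[', '.').replace(']', '').split('.')
    )


def convert(d):
    if not d:
        return ''  # leaf ({}) or a missing list slot (d.get() returned None)
    elif all(isinstance(k, int) for k in d):
        return [convert(d.get(i)) for i in range(max(d) + 1)]
    else:
        return {k: convert(v) for k, v in d.items()}


def assemble_data(attributes):
    data = {}
    for a in attributes:
        d = data
        for k in attribute_to_keys(a):
            d = d.setdefault(k, {})
    return convert(data)


print(assemble_data([
    "itemUniqueId",
    "manufacturerInfo[0].manufacturer.value",
    "manufacturerInfo[0].manufacturerPartNumber",
]))
# {'itemUniqueId': '', 'manufacturerInfo': [{'manufacturer': {'value': ''}, 'manufacturerPartNumber': ''}]}

print(assemble_data(["tags[2]"]))  # {'tags': ['', '', '']}
```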
