Create complex object in Python based on property names in dot notation

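I am trying to create a complex object based on metadata I have. The metadata is an array of attribute paths that I iterate over to build a dict. For example, here is the array: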

[
    "itemUniqueId",
    "itemDescription",
    "manufacturerInfo[0].manufacturer.value",
    "manufacturerInfo[0].manufacturerPartNumber",
    "attributes.noun.value",
    "attributes.modifier.value",
    "attributes.entityAttributes[0].attributeName",
    "attributes.entityAttributes[0].attributeValue",
    "attributes.entityAttributes[0].attributeUOM",
    "attributes.entityAttributes[1].attributeName",
    "attributes.entityAttributes[1].attributeValue",
    "attributes.entityAttributes[1].attributeUOM",
]

This array should give the following output:

{
    "itemUniqueId": "",
    "itemDescription": "",
    "manufacturerInfo": [
        {
            "manufacturer": {
                "value": ""
            },
            "manufacturerPartNumber": ""
        }
    ],
    "attributes": {
        "noun": {
            "value": ""
        },
        "modifier": {
            "value": ""
        },
        "entityAttributes": [
             {
                 "attributeName": "",
                 "attributeValue": "",
                 "attributeUOM": ""
             },
             {
                "attributeName": "",
                "attributeValue": "",
                "attributeUOM": ""
             }
        ]
    }
}

I have written the logic below but cannot get the desired output. Given the metadata, it should work for both objects and arrays.

source_json = [
    "itemUniqueId",
    "itemDescription",
    "manufacturerInfo[0].manufacturer.value",
    "manufacturerInfo[0].manufacturerPartNumber",
    "attributes.noun.value",
    "attributes.modifier.value",
    "attributes.entityAttributes[0].attributeName",
    "attributes.entityAttributes[0].attributeValue",
    "attributes.entityAttributes[0].attributeUOM",
    "attributes.entityAttributes[1].attributeName",
    "attributes.entityAttributes[1].attributeValue",
    "attributes.entityAttributes[1].attributeUOM",
]

for row in source_json:
    propertyNames = row.split('.')
    temp = ''
    parent = {}
    parentArr = []
    parentObj = {}
    # if len(propertyNames) > 1:
    arrLength = len(propertyNames)
    for i, (current) in enumerate(zip(propertyNames)):
        if i == 0:
            if '[' in current:
                parent[current]=parentArr
            else:
                parent[current] = parentObj
            temp = current
        if i > 0 and i < arrLength - 1:
            if '[' in current:
                parent[current] = parentArr
            else:
                parent[current] = parentObj
            temp = current
        if i == arrLength - 1:
            if '[' in current:
                parent[current] = parentArr
            else:
                parent[current] = parentObj
            temp = current
            # temp[prev][current] = ""
    # finalMapping[target] = target
print(parent)

First we should iterate over the whole list and store every 3rd-level attribute; after that we can convert this structure into the desired output:

from typing import Dict, List


source_json = [
    "attributes.entityAttributes[0].attributeName",
    "attributes.entityAttributes[0].attributeValue",
    "attributes.entityAttributes[0].attributeUOM",
    "attributes.entityAttributes[1].attributeName",
    "attributes.entityAttributes[1].attributeValue",
    "attributes.entityAttributes[1].attributeUOM",
    "attributes.entityAttributes[2].attributeName"
]


def accumulate(source: List) -> Dict:
    accumulator = {}
    for v in source:
        vs = v.split(".")
        root_attribute = vs[0]
        if root_attribute not in accumulator:
            accumulator[root_attribute] = {}

        i = vs[1].rfind('[')
        k = (vs[1][:i], vs[1][i+1:-1])

        if k not in accumulator[root_attribute]:
            accumulator[root_attribute][k] = {}
        accumulator[root_attribute][k][vs[2]] = ""
    return accumulator


def get_result(accumulated: Dict) -> Dict:
    result = {}
    for k, v in accumulated.items():
        result[k] = {}
        for (entity, idx), v1 in v.items():
            if entity not in result[k]:
                result[k][entity] = []
            if len(v1) == 3:
                result[k][entity].append(v1)
    return result


print(get_result(accumulate(source_json)))

The output will be:


{
    'attributes':
    {
        'entityAttributes':
        [
            {
                'attributeName': '',
                'attributeValue': '',
                'attributeUOM': ''
            },
            {'attributeName': '',
             'attributeValue': '',
             'attributeUOM': ''
            }
        ]
    }
}

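In the accumulate function we store the 3rd-level attributes in a Dict under the keys (entityAttributes, 0) ... (entityAttributes, 2). In the get_result function we convert the Dict with the (entityAttributes, 0) ... (entityAttributes, 2) keys into a Dict from string to List. Entries that do not carry all three attributes, such as entityAttributes[2] here, are left out of the result.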
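
To make the intermediate structure concrete, here is a minimal sketch of what an accumulate-style pass produces before it is reshaped into lists. The helper name split_indexed and the shortened input list are assumptions made for illustration; note that the index part of each key is kept as a string.

```python
# Sketch of the intermediate structure built by an accumulate-style pass.
def split_indexed(segment):
    """Split 'name[3]' into ('name', '3'); the index stays a string."""
    i = segment.rfind('[')
    return (segment[:i], segment[i + 1:-1])


paths = [
    "attributes.entityAttributes[0].attributeName",
    "attributes.entityAttributes[0].attributeValue",
    "attributes.entityAttributes[1].attributeName",
]

accumulator = {}
for path in paths:
    # Every path in this sketch has exactly three dot-separated segments.
    root, indexed, leaf = path.split(".")
    bucket = accumulator.setdefault(root, {}).setdefault(split_indexed(indexed), {})
    bucket[leaf] = ""

print(accumulator)
```

Because the keys are (name, index) tuples, a second pass can group them by name and decide what to do with incomplete entries.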

How about something like this:

import re
import json

source_json = [
"attributes.entityAttributes[0].attributeName",
"attributes.entityAttributes[0].attributeValue",
"attributes.entityAttributes[0].attributeUOM",
"attributes.entityAttributes[1].attributeName",
"attributes.entityAttributes[1].attributeValue",
"attributes.entityAttributes[1].attributeUOM",
"attributes.entityAttributes[2].attributeName"
]


def to_object(source_json):

    def add_attribute(target, attribute_list):
        head, tail = attribute_list[0], attribute_list[1:]
        if tail:
            add_attribute(target.setdefault(head,{}), tail)
        else:
            target[head] = ''
    
    target = {}
    for row in source_json:
        add_attribute(target, re.split(r'[\.\[\]]+',row))
    return target
    
  
print(json.dumps(to_object(source_json), indent=4))

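Note that this does not do exactly what you asked for. It interprets arrays as objects too, with the keys '0' ... '2'. That makes it easier to implement and also more robust. What would you expect when the input list lacks an entry for entityAttributes[0]? Should the list contain an empty element, or something different? Either way, leaving that element out saves space, and that only works if you store arrays as objects.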
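
If real lists are wanted after all, one option is a post-processing pass over the nested dicts. The sketch below is not part of the answer above: the function name listify is hypothetical, and filling missing indices with an empty string is an assumed policy that you may want to change.

```python
def listify(node):
    """Recursively turn dicts whose keys are all digit strings into lists."""
    if not isinstance(node, dict):
        return node
    converted = {k: listify(v) for k, v in node.items()}
    if converted and all(k.isdecimal() for k in converted):
        by_index = {int(k): v for k, v in converted.items()}
        # Assumption: indices missing from the input become empty strings.
        return [by_index.get(i, "") for i in range(max(by_index) + 1)]
    return converted


nested = {"attributes": {"entityAttributes": {"0": {"attributeName": ""},
                                              "2": {"attributeName": ""}}}}
print(listify(nested))
```

Here index 1 is absent from the input, so the resulting list gets a placeholder at that position.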

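There is a similar question, and its accepted answer works for this question too, but it has unused code paths (e.g. isInArray) and caters to the unconventional conversions expected by that question: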

  • "arrOne[0]": "1,2,3" → "arrOne": ["1", "2", "3"] instead of
  • "arrOne[0]": "1,2,3" → "arrOne": ["1,2,3"] or
  • "arrOne[0]": "1", "arrOne[1]": "2", "arrOne[2]": "3" → "arrOne": ["1", "2", "3"]

Here is an improved implementation of the branch function:

import re


def branch(tree, path, value):
    key = path[0]
    array_index_match = re.search(r'\[([0-9]+)\]', key)

    if array_index_match:
        # Get the array index, and remove the match from the key
        array_index = int(array_index_match.group(1))
        key = key.replace(array_index_match.group(0), '')

        # Prepare the array at the key
        if key not in tree:
            tree[key] = []

        # Prepare the object at the array index, padding any skipped indices
        while len(tree[key]) <= array_index:
            tree[key].append({})

        # Replace the object at the array index
        tree[key][array_index] = value if len(path) == 1 else branch(tree[key][array_index], path[1:], value)

    else:
        # Prepare the object at the key
        if key not in tree:
            tree[key] = {}

        # Replace the object at the key
        tree[key] = value if len(path) == 1 else branch(tree[key], path[1:], value)

    return tree
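
The key step in branch is recognising a path segment such as entityAttributes[0] and separating the name from the index. As a standalone illustration of that step only, here is a small sketch; the names INDEXED and parse_segment are hypothetical and do not appear in the function above.

```python
import re

# A name followed by a single [index] suffix, e.g. "entityAttributes[0]".
INDEXED = re.compile(r'^(\w+)\[(\d+)\]$')


def parse_segment(segment):
    """Return (name, index) for 'name[3]', or (name, None) for a plain name."""
    match = INDEXED.match(segment)
    if match:
        return match.group(1), int(match.group(2))
    return segment, None


for segment in "attributes.entityAttributes[1].attributeName".split("."):
    print(parse_segment(segment))
```

A segment with an index tells the caller to create a list at that key; a plain segment means a nested dict or a leaf value.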

Usage:

VALUE = ''

def create_dict(attributes):
    d = {}
    for path_str in attributes:
        branch(d, path_str.split('.'), VALUE)
    return d

source_json = [
    "itemUniqueId",
    "itemDescription",
    "manufacturerInfo[0].manufacturer.value",
    "manufacturerInfo[0].manufacturerPartNumber",
    "attributes.noun.value",
    "attributes.modifier.value",
    "attributes.entityAttributes[0].attributeName",
    "attributes.entityAttributes[0].attributeValue",
    "attributes.entityAttributes[0].attributeUOM",
    "attributes.entityAttributes[1].attributeName",
    "attributes.entityAttributes[1].attributeValue",
    "attributes.entityAttributes[1].attributeUOM",
]

assert create_dict(source_json) == {
    "itemUniqueId": "",
    "itemDescription": "",
    "manufacturerInfo": [
        {
            "manufacturer": {
                "value": ""
            },
            "manufacturerPartNumber": ""
        }
    ],
    "attributes": {
        "noun": {
            "value": ""
        },
        "modifier": {
            "value": ""
        },
        "entityAttributes": [
             {
                "attributeName": "",
                "attributeValue": "",
                "attributeUOM": ""
            },
           {
                "attributeName": "",
                "attributeValue": "",
                "attributeUOM": ""
            }
        ]
    }
}

You can use a custom builder class that implements __getattr__ and __getitem__ to gradually build the underlying object. This building can then be triggered by calling eval on each of the attribute strings. Note: eval is not safe for input from untrusted sources.

Below is an example implementation:

class Builder:
    def __init__(self):
        self.obj = None

    def __getattr__(self, key):
        if self.obj is None:
            self.obj = {}
        return self.obj.setdefault(key, Builder())

    def __getitem__(self, index):
        if self.obj is None:
            self.obj = []
        self.obj.extend(Builder() for _ in range(index+1-len(self.obj)))
        return self.obj[index]

    def convert(self):
        if self.obj is None:
            return ''
        elif isinstance(self.obj, list):
            return [v.convert() for v in self.obj]
        elif isinstance(self.obj, dict):
            return {k: v.convert() for k,v in self.obj.items()}
        else:
            assert False


attributes = [
    'itemUniqueId',
    'itemDescription',
    'manufacturerInfo[0].manufacturer.value',
    'manufacturerInfo[0].manufacturerPartNumber',
    'attributes.noun.value',
    'attributes.modifier.value',
    'attributes.entityAttributes[0].attributeName',
    'attributes.entityAttributes[0].attributeValue',
    'attributes.entityAttributes[0].attributeUOM',
    'attributes.entityAttributes[1].attributeName',
    'attributes.entityAttributes[1].attributeValue',
    'attributes.entityAttributes[1].attributeUOM',
]

builder = Builder()
for attr in attributes:
    eval(f'builder.{attr}')
result = builder.convert()

import json
print(json.dumps(result, indent=4))

Which gives the following output:

{
    "itemUniqueId": "",
    "itemDescription": "",
    "manufacturerInfo": [
        {
            "manufacturer": {
                "value": ""
            },
            "manufacturerPartNumber": ""
        }
    ],
    "attributes": {
        "noun": {
            "value": ""
        },
        "modifier": {
            "value": ""
        },
        "entityAttributes": [
            {
                "attributeName": "",
                "attributeValue": "",
                "attributeUOM": ""
            },
            {
                "attributeName": "",
                "attributeValue": "",
                "attributeUOM": ""
            }
        ]
    }
}

None of the answers provided so far strike me as very intuitive. Here is one way to tackle the problem with three easy-to-understand functions.

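Normalize inputs. First we need a function to normalize the input strings. Instead of rules-bearing strings like 'foo[0].bar' (where one must understand that an integer in square brackets implies a list) we want a simple tuple of keys like ('foo', 0, 'bar').
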
def attribute_to_keys(a):
    return tuple(
        int(k) if k.isdigit() else k
        for k in a.replace('[', '.').replace(']', '').split('.')
    )
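
As a quick check of the normalization step, the function can be run against one of the paths from the question. The function is repeated here only so that the snippet is self-contained.

```python
def attribute_to_keys(a):
    return tuple(
        int(k) if k.isdigit() else k
        for k in a.replace('[', '.').replace(']', '').split('.')
    )


# List indices come back as ints, everything else as plain strings.
print(attribute_to_keys('foo[0].bar'))
print(attribute_to_keys('manufacturerInfo[0].manufacturer.value'))
```

Keeping indices as ints is what later lets the conversion step tell list positions apart from ordinary dict keys.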

Build a uniform data structure. Second, we need a function to assemble a data structure consisting of dicts of dicts of dicts ... all the way down.

def assemble_data(attributes):
    data = {}
    for a in attributes:
        d = data
        for k in attribute_to_keys(a):
            d = d.setdefault(k, {})
    return convert(data)

def convert(d):
    # Just a placeholder for now.
    return d
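
It may help to look at the uniform structure before any conversion happens. The sketch below rebuilds it for a tiny, made-up input; the name assemble_raw is hypothetical and simply skips the convert call.

```python
def attribute_to_keys(a):
    return tuple(
        int(k) if k.isdigit() else k
        for k in a.replace('[', '.').replace(']', '').split('.')
    )


def assemble_raw(attributes):
    """Same loop as assemble_data, but returns the unconverted dicts."""
    data = {}
    for a in attributes:
        d = data
        for k in attribute_to_keys(a):
            d = d.setdefault(k, {})
    return data


raw = assemble_raw(['a[0].b', 'a[0].c', 'd'])
print(raw)
```

At this stage lists are still dicts keyed by integers and every leaf is an empty dict, which is exactly what the conversion step has to clean up.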

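Convert the uniform data. Third, we need to implement a real version of the convert placeholder. Specifically, we need it to recursively convert the uniform data structure into our ultimate goal, which has (a) empty strings at the leaf nodes, and (b) lists rather than dicts whenever the dict keys are all integers. Note that this will even fill empty list positions with empty strings (a contingency not covered in your problem description; adjust as needed if you want different behavior).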

def convert(d):
    if not d:
        return ''
    elif all(isinstance(k, int) for k in d):
        return [convert(d.get(i)) for i in range(max(d) + 1)]
    else:
        return {k : convert(v) for k, v in d.items()}
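
Putting the three functions together, a short end-to-end run shows both the normal case and the gap-filling behavior mentioned above. The input lists are made up for illustration; the functions are the ones from this answer, repeated so the snippet runs on its own.

```python
def attribute_to_keys(a):
    return tuple(
        int(k) if k.isdigit() else k
        for k in a.replace('[', '.').replace(']', '').split('.')
    )


def convert(d):
    if not d:
        return ''
    elif all(isinstance(k, int) for k in d):
        return [convert(d.get(i)) for i in range(max(d) + 1)]
    else:
        return {k: convert(v) for k, v in d.items()}


def assemble_data(attributes):
    data = {}
    for a in attributes:
        d = data
        for k in attribute_to_keys(a):
            d = d.setdefault(k, {})
    return convert(data)


dense = assemble_data(['itemUniqueId', 'attributes.entityAttributes[0].attributeName'])
sparse = assemble_data(['a[1].b'])  # index 0 never appears in the input
print(dense)
print(sparse)
```

In the sparse case the missing index 0 is filled with an empty string, so the element for index 1 still lands at position 1 of the list.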