将二维列表转换为嵌套字典的最快方法

Fastest way to convert two-dimensional list to a nested dictionary

我在 Whosebug 上浏览了很多答案,但其中 none 非常符合我的用例。我已经将使用元组的最接近的答案转换为使用字典的代码并让它工作:

def get_matching_parent_entry_recursive(entries_to_traverse, parent_id):
    for entry in entries_to_traverse:
        if entry["id"] == parent_id:
            return entry
        if len(entry["items"]) > 0:
            matching_entry = get_matching_parent_entry_recursive(entry["items"], parent_id)
            if matching_entry:
                return matching_entry
    return None


def get_content_tree(categories):
    simplified_entries = []
    unmatched_entries = categories

    while len(unmatched_entries) > 0:

        i = 0
        while i < len(unmatched_entries):
            if unmatched_entries[i]["parent"] == 0:  # we are dealing with a top-level entry
                simplified_entries.append(
                    {
                        "id": str(unmatched_entries[i]["id"]),
                        "label": unmatched_entries[i]["label"],
                        "items": []
                    }
                )
                del unmatched_entries[i]
            else:
                # try to find parent entry in the final simplified tree
                parent_entry = get_matching_parent_entry_recursive(
                    simplified_entries,
                    str(unmatched_entries[i]["parent"])
                )
                if parent_entry:
                    parent_entry["items"].append({
                        "id": str(unmatched_entries[i]["id"]),
                        "label": unmatched_entries[i]["label"],
                        "items": []
                    })
                    del unmatched_entries[i]
                else:
                    i += 1
    return simplified_entries


categories = [
    {"id": 1, "label": 'Lamps', "parent": 0},
    {"id": 2, "label": 'Lamp 1', "parent": 1},
    {"id": 3, "label": 'Lamp 2', "parent": 1},
    {"id": 4, "label": 'Shirts', "parent": 0},
    {"id": 5, "label": 'Formal shirts', "parent": 4},
    {"id": 7, "label": 'T-Shirts', "parent": 9},
    {"id": 9, "label": 'Informal shirts', "parent": 4},
]

print(get_content_tree(categories))

但是,我可以看到该算法必须遍历每个节点才能找到匹配的父节点,这对性能不利。有没有更快的方法来获得相同的结果?

编辑: Replit link

编辑 2: ID 是唯一的和可散列的,但不是为了确保父项在子项引用它之前创建。 IE。所有衬衫都可以移至新创建的类别,该类别的 ID 为 9,但 T 恤的 ID 仍为 7。

假设 ID 是唯一的(并且可哈希),您可以将对每个字典的引用存储在单独的键下:

categories = [
    {"id": 1, "label": "Lamps", "parent": 0},
    {"id": 2, "label": "Lamp 1", "parent": 1},
    {"id": 3, "label": "Lamp 2", "parent": 1},
    {"id": 4, "label": "Shirts", "parent": 0},
    {"id": 5, "label": "Formal shirts", "parent": 4},
    {"id": 6, "label": "Informal shirts", "parent": 4},
    {"id": 7, "label": "T-Shirts", "parent": 6},
]

root = {0: {"items": []}}
for c in categories:
    raw = {"id": c["id"], "items": [], "label": c["label"]}
    root[c["id"]] = raw
    root[c["parent"]]["items"].append(raw)

print(root[0]["items"])

编辑:

如果类别不是 topological sorted,它将不起作用。在这种情况下,我们可以执行那种排序(最好的解决方案......理论上)或做一些小的改进,在实际情况下应该更快更优雅:

from collections import deque

categories = deque(
    [
        {"id": 1, "label": "Lamps", "parent": 0},
        {"id": 2, "label": "Lamp 1", "parent": 1},
        {"id": 3, "label": "Lamp 2", "parent": 1},
        {"id": 4, "label": "Shirts", "parent": 0},
        {"id": 5, "label": "Formal shirts", "parent": 4},
        {"id": 7, "label": "T-Shirts", "parent": 9},
        {"id": 9, "label": "Informal shirts", "parent": 4},
    ]
)

root = {0: {"items": []}}
while categories:
    c = categories.popleft()
    raw = {"id": c["id"], "items": [], "label": c["label"]}
    try:
        # order is important here, as we don't want to create reference in root dictionary
        root[c["parent"]]["items"].append(raw)         
        root[c["id"]] = raw
    except KeyError:
        # We didn't process parent category yet, lets back to that later
        categories.append(c)

print(root[0]["items"])