Merging list of dictionaries to remove all duplicates

I'm trying to write some simple Python code that merges a list of dictionaries into a condensed list, because I have a lot of duplicates at the moment.

From this:

[
    {
        "module": "RECEIPT BISCUITS",
        "product_range": "ULKER BISCUITS",
        "receipt_category": "BISCUITS"
    },
    {
        "module": "RECEIPT BISCUITS",
        "product_range": "ULKER",
        "receipt_category": "BISCUITS"
    },
    {
        "module": "RECEIPT BISCUITS",
        "product_range": "ULKER BISCUITS GOLD",
        "receipt_category": "BISCUITS GOLD"
    },
    {
        "module": "RECEIPT COFFEE",
        "product_range": "BLACK GOLD",
        "receipt_category": "BLACK GOLD"
    }
]

To this:

[
    {
        "module": "RECEIPT BISCUITS",
        "product_range": ["ULKER BISCUITS", "ULKER"],
        "receipt_category": ["BISCUITS", "BISCUITS GOLD"]
    },
    {
        "module": "RECEIPT COFFEE",
        "product_range": ["BLACK GOLD"],
        "receipt_category": ["BLACK GOLD"]
    }
]

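Here, module is the key used to group the entries, and the other two fields are stored as lists even when there is only a single value. This is in JSON format, by the way.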

collections.defaultdict to the rescue for your data-restructuring needs!

import collections

data = [
    {"module": "RECEIPT BISCUITS", "product_range": "ULKER BISCUITS", "receipt_category": "BISCUITS"},
    {"module": "RECEIPT BISCUITS", "product_range": "ULKER", "receipt_category": "BISCUITS"},
    {"module": "RECEIPT BISCUITS", "product_range": "ULKER BISCUITS GOLD", "receipt_category": "BISCUITS GOLD"},
    {"module": "RECEIPT COFFEE", "product_range": "BLACK GOLD", "receipt_category": "BLACK GOLD"},
]

grouped = collections.defaultdict(lambda: collections.defaultdict(list))
group_key = "module"

for datum in data:
    datum = datum.copy()  # Copy so we can .pop without consequence
    group = datum.pop(group_key)  # Get the key (`module` value)
    for key, value in datum.items():  # Loop over the rest and put them in the group
        grouped[group][key].append(value)

collated = [
    {
        group_key: group,
        **values,
    }
    for (group, values) in grouped.items()
]

print(collated)

This prints out

[
  {'module': 'RECEIPT BISCUITS', 'product_range': ['ULKER BISCUITS', 'ULKER', 'ULKER BISCUITS GOLD'], 'receipt_category': ['BISCUITS', 'BISCUITS', 'BISCUITS GOLD']},
  {'module': 'RECEIPT COFFEE', 'product_range': ['BLACK GOLD'], 'receipt_category': ['BLACK GOLD']}
]

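Note that this doesn't remove duplicate values within the lists (for example, 'BISCUITS' appears twice under receipt_category), since I wasn't sure whether the order of the values matters to you, and therefore whether to use sets (which don't preserve order).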

Changing list to set and append to add will make the values unique.
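If the order of the values does matter, one possible alternative is to use plain dicts as ordered sets, since dict keys are unique and keep insertion order in Python 3.7+. The sketch below is only an illustration under that assumption: the helper name collate_unique is made up for this example, and it assumes every value is hashable (true for the strings in the sample data). It also converts the result back to lists, which keeps it serializable with json.dumps (a set would not be).

```python
import collections
import json


def collate_unique(data, group_key="module"):
    """Group dicts by group_key, keeping each value once in first-seen order."""
    # The innermost dicts act as ordered sets: keys are unique and
    # insertion-ordered, and the stored values (None) are ignored.
    grouped = collections.defaultdict(lambda: collections.defaultdict(dict))
    for datum in data:
        group = datum[group_key]
        for key, value in datum.items():
            if key != group_key:
                grouped[group][key][value] = None
    # Turn each ordered "set" back into a plain list for the final result.
    return [
        {group_key: group, **{key: list(seen) for key, seen in values.items()}}
        for group, values in grouped.items()
    ]


data = [
    {"module": "RECEIPT BISCUITS", "product_range": "ULKER BISCUITS", "receipt_category": "BISCUITS"},
    {"module": "RECEIPT BISCUITS", "product_range": "ULKER", "receipt_category": "BISCUITS"},
    {"module": "RECEIPT BISCUITS", "product_range": "ULKER BISCUITS GOLD", "receipt_category": "BISCUITS GOLD"},
    {"module": "RECEIPT COFFEE", "product_range": "BLACK GOLD", "receipt_category": "BLACK GOLD"},
]

collated = collate_unique(data)
print(json.dumps(collated, indent=2))
```

With the sample data, receipt_category for RECEIPT BISCUITS comes out as ["BISCUITS", "BISCUITS GOLD"], while product_range keeps all three distinct values in their original order.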