Merging list of dictionaries to remove all duplicates
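I'm trying to write a simple piece of Python code that merges a list of dictionaries into a condensed list, because at the moment I have a lot of duplicates.
From this: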
[
{
"module": "RECEIPT BISCUITS",
"product_range": "ULKER BISCUITS",
"receipt_category": "BISCUITS"
},
{
"module": "RECEIPT BISCUITS",
"product_range": "ULKER",
"receipt_category": "BISCUITS"
},
{
"module": "RECEIPT BISCUITS",
"product_range": "ULKER BISCUITS GOLD",
"receipt_category": "BISCUITS GOLD"
},
{
"module": "RECEIPT COFFEE",
"product_range": "BLACK GOLD",
"receipt_category": "BLACK GOLD"
}
]
To this:
[
{
"module": "RECEIPT BISCUITS",
"product_range": ["ULKER BISCUITS", "ULKER"],
"receipt_category": ["BISCUITS", "BISCUITS GOLD"]
},
{
"module": "RECEIPT COFFEE",
"product_range": ["BLACK GOLD"],
"receipt_category": ["BLACK GOLD"]
}
]
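Here module is the key used to group the entries, and the other two keys are always stored as lists, even when there is only one value. This is JSON format, by the way.
collections.defaultdict to the rescue for your data regrouping needs!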
```python
import collections

data = [
    {"module": "RECEIPT BISCUITS", "product_range": "ULKER BISCUITS", "receipt_category": "BISCUITS"},
    {"module": "RECEIPT BISCUITS", "product_range": "ULKER", "receipt_category": "BISCUITS"},
    {"module": "RECEIPT BISCUITS", "product_range": "ULKER BISCUITS GOLD", "receipt_category": "BISCUITS GOLD"},
    {"module": "RECEIPT COFFEE", "product_range": "BLACK GOLD", "receipt_category": "BLACK GOLD"},
]

grouped = collections.defaultdict(lambda: collections.defaultdict(list))
group_key = "module"

for datum in data:
    datum = datum.copy()  # Copy so we can .pop without consequence
    group = datum.pop(group_key)  # Get the key (`module` value)
    for key, value in datum.items():  # Loop over the rest and put them in the group
        grouped[group][key].append(value)

collated = [
    {
        group_key: group,
        **values,
    }
    for (group, values) in grouped.items()
]

print(collated)
```
This prints out
[
{'module': 'RECEIPT BISCUITS', 'product_range': ['ULKER BISCUITS', 'ULKER', 'ULKER BISCUITS GOLD'], 'receipt_category': ['BISCUITS', 'BISCUITS', 'BISCUITS GOLD']},
{'module': 'RECEIPT COFFEE', 'product_range': ['BLACK GOLD'], 'receipt_category': ['BLACK GOLD']}
]
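Since the question says the data is JSON, here is a minimal sketch of the same grouping wrapped in a function, reading the input with json.loads and writing the result back out with json.dumps. The function name collate and the raw string are hypothetical stand-ins for however the JSON actually arrives; the grouping logic is the same defaultdict approach as above.

```python
import collections
import json

# Hypothetical raw JSON text, using the records from the question.
raw = """[
  {"module": "RECEIPT BISCUITS", "product_range": "ULKER BISCUITS", "receipt_category": "BISCUITS"},
  {"module": "RECEIPT BISCUITS", "product_range": "ULKER", "receipt_category": "BISCUITS"},
  {"module": "RECEIPT BISCUITS", "product_range": "ULKER BISCUITS GOLD", "receipt_category": "BISCUITS GOLD"},
  {"module": "RECEIPT COFFEE", "product_range": "BLACK GOLD", "receipt_category": "BLACK GOLD"}
]"""


def collate(records, group_key="module"):
    """Group records by group_key; every other key becomes a list of its values."""
    grouped = collections.defaultdict(lambda: collections.defaultdict(list))
    for record in records:
        record = dict(record)  # shallow copy so pop() leaves the input untouched
        group = record.pop(group_key)
        for key, value in record.items():
            grouped[group][key].append(value)
    # ** unpacks each inner defaultdict into a plain dict, which json.dumps accepts
    return [{group_key: group, **values} for group, values in grouped.items()]


data = json.loads(raw)
collated = collate(data)
print(json.dumps(collated, indent=2))
```

Because the lists here are ordinary Python lists, json.dumps serializes the result directly.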
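Note that this does not remove duplicate values from the lists (product_range, receipt_category), because I wasn't sure whether the order of the values matters to you, and therefore whether to use sets (which don't preserve order). Changing list to set and append to add will make the values unique.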
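A minimal sketch of two deduplicating variants, assuming the same data as above. The first applies the list-to-set change just described; because json.dumps cannot serialize a set, it converts each set to a sorted list (sorting is an arbitrary choice here, just to make the output stable). The second is a swapped-in technique, not part of the answer above: it uses a plain dict as an ordered set, which keeps the first-seen order because dicts preserve insertion order in Python 3.7+.

```python
import collections
import json

data = [
    {"module": "RECEIPT BISCUITS", "product_range": "ULKER BISCUITS", "receipt_category": "BISCUITS"},
    {"module": "RECEIPT BISCUITS", "product_range": "ULKER", "receipt_category": "BISCUITS"},
    {"module": "RECEIPT BISCUITS", "product_range": "ULKER BISCUITS GOLD", "receipt_category": "BISCUITS GOLD"},
    {"module": "RECEIPT COFFEE", "product_range": "BLACK GOLD", "receipt_category": "BLACK GOLD"},
]
group_key = "module"

# Variant 1: sets. Duplicates disappear but the original order is lost.
grouped_sets = collections.defaultdict(lambda: collections.defaultdict(set))
for datum in data:
    datum = datum.copy()
    group = datum.pop(group_key)
    for key, value in datum.items():
        grouped_sets[group][key].add(value)

# Sets are not JSON-serializable, so turn each one into a sorted list.
collated_sets = [
    {group_key: group, **{key: sorted(vals) for key, vals in values.items()}}
    for group, values in grouped_sets.items()
]

# Variant 2: a dict used as an ordered set keeps the first-seen order.
grouped_ordered = collections.defaultdict(lambda: collections.defaultdict(dict))
for datum in data:
    datum = datum.copy()
    group = datum.pop(group_key)
    for key, value in datum.items():
        grouped_ordered[group][key][value] = None  # only the dict keys matter

collated_ordered = [
    {group_key: group, **{key: list(vals) for key, vals in values.items()}}
    for group, values in grouped_ordered.items()
]

print(json.dumps(collated_sets))
print(json.dumps(collated_ordered))
```

If the order of the values matters, the second variant is the closer fit; otherwise the set version with the list conversion is enough.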