Identify & store the values that appear multiple times in a dictionary (Python)
I have a list of dictionaries in which some 'term' values are repeated:
terms_dict = [{'term': 'potato', 'cui': '123AB'}, {'term': 'carrot', 'cui': '222AB'}, {'term': 'potato', 'cui': '456AB'}]
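As you can see, the term 'potato' appears more than once. I would like to store this 'term' in a variable for future reference, and then remove every dictionary with a repeated term from terms_dict, leaving only the 'carrot' dictionary in the list.
Desired output: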
repeated_terms = ['potato'] ## identified and stored terms that are repeated in terms_dict.
new_terms_dict = [{'term': 'carrot', 'cui': '222AB'}] ## new dict with the unique term.
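Thoughts:
I can certainly create a new list containing only the unique terms, but I am stuck on actually identifying the repeated 'term' values and storing them in a list.
Is there a pythonic way of finding/printing/storing the repeated values?
You can use collections.Counter for the task: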
from collections import Counter
terms_dict = [
    {"term": "potato", "cui": "123AB"},
    {"term": "carrot", "cui": "222AB"},
    {"term": "potato", "cui": "456AB"},
]
c = Counter(d["term"] for d in terms_dict)
repeated_terms = [k for k, v in c.items() if v > 1]
new_terms_dict = [d for d in terms_dict if c[d["term"]] == 1]
print(repeated_terms)
print(new_terms_dict)
Prints:
['potato']
[{'term': 'carrot', 'cui': '222AB'}]
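If you need this in more than one place, the same Counter approach can be wrapped in a small helper. This is only a sketch: the function name split_repeated and its key parameter are hypothetical and not part of the answer above.

```python
from collections import Counter


def split_repeated(records, key="term"):
    """Hypothetical helper: return (repeated values, records whose value occurs once)."""
    # Count how often each value of `key` occurs across all records.
    counts = Counter(record[key] for record in records)
    # Values seen more than once, in order of first appearance.
    repeated = [value for value, n in counts.items() if n > 1]
    # Keep only the records whose value occurs exactly once.
    unique_records = [record for record in records if counts[record[key]] == 1]
    return repeated, unique_records


terms_dict = [
    {"term": "potato", "cui": "123AB"},
    {"term": "carrot", "cui": "222AB"},
    {"term": "potato", "cui": "456AB"},
]

repeated_terms, new_terms_dict = split_repeated(terms_dict)
print(repeated_terms)
print(new_terms_dict)
```

Counting first and filtering afterwards means the list is walked twice, but each repeated term is reported once no matter how many times it occurs.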
You can use drop_duplicates and duplicated from pandas:
>>> import pandas as pd
>>> df = pd.DataFrame(terms_dict)
>>> df.term[df.term.duplicated()].drop_duplicates().tolist() # repeats, each listed once
['potato']
>>> df.drop_duplicates('term', keep=False).to_dict('records') # without repeats
[{'term': 'carrot', 'cui': '222AB'}]
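One thing to keep in mind: by default, duplicated() flags every occurrence after the first, so a term that appears three times is flagged twice. The plain-Python sketch below mimics that behaviour to show the effect; duplicated_mask is a hypothetical helper written for illustration, not a pandas function, and the extra 'potato' entry is made-up sample data.

```python
def duplicated_mask(values):
    """Hypothetical helper: True for every occurrence after the first."""
    seen = set()
    mask = []
    for value in values:
        mask.append(value in seen)  # already seen means it is a repeat
        seen.add(value)
    return mask


# Made-up sample with 'potato' appearing three times.
terms = ["potato", "carrot", "potato", "potato"]

mask = duplicated_mask(terms)
flagged = [term for term, is_dup in zip(terms, mask) if is_dup]
# dict.fromkeys drops repeated entries while keeping first-seen order.
repeated_terms = list(dict.fromkeys(flagged))

print(mask)
print(flagged)
print(repeated_terms)
```

So if a term can occur more than twice, the flagged values need deduplicating before they are stored as the list of repeated terms.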