How to filter a nested dict by the key of the nested element?

Question

I have a nested dictionary of source words, target words and their frequency counts. It looks like this:

src_tgt_dict = {"each":{"chaque":3}, "in-front-of":{"devant":4}, "next-to":{"à-côté-de":5}, "for":{"pour":7}, "cauliflower":{"chou-fleur":4}, "on":{"sur":2, "panda-et":2}}

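I am trying to filter the dictionary so that only the key-value pairs whose target is a preposition (including multi-word prepositions) are kept. To do this, I wrote the following: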

tgt_preps = set(["devant", "pour", "sur", "à"]) #set of initial target prepositions

src_tgt_dict = {"each":{"chaque":3}, "in-front-of":{"devant":4}, "next-to":{"à-côté-de":5}, "for":{"pour":7}, "cauliflower":{"chou-fleur":4}, "on":{"sur":2, "panda-et":2}}

new_tgt_preps = [] #list of new target prepositions

for src, d in src_tgt_dict.items(): #loop into the dictionary
    for tgt, count in d.items(): #loop into the nested dictionary
        check_prep = []
        if "-" in tgt: #check to see if hyphen occurs in the target word (this is to capture multi-word prepositions that are not in the original preposition set)
            check_prep.append(tgt[0:(tgt.index("-"))]) #if there's a hyphen, append the preceding word to the check_prep list
            for t in check_prep: 
                if t in tgt_preps: # check to see if the token preceding the hyphen is a preposition
                    new_tgt_preps.append(tgt) #if yes, append the multi-word preposition to the list of new target prepositions

tgt_preps.update(new_tgt_preps) # update the set of prepositions to include the multi-word prepositions

temp_2_src_tgt_dict = {} # create new dict for filtering
for src, d in src_tgt_dict.items(): # loop into the dictionary
    for tgt, count in d.items(): # loop into the nested dictionary
        if tgt in tgt_preps: # if the target is in the set of target prepositions
            temp_2_src_tgt_dict[tgt] = count # add to the new dict with the tgt as the key and the count as the value

When I print the new dictionary, I get the following:

{'devant': 4, 'pour': 7, 'sur': 2, 'à-côté-de': 5}

This result makes complete sense, because it is exactly what I told the machine to do. But it is not what I intended!

What I want is:

{"in-front-of": {"devant": 4}, "for": {"pour": 7}, "on": {"sur": 2}, "next-to": {"à-côté-de": 5}}

I tried to build the nested dictionary by writing:

temp_2_src_tgt_dict[tgt][src] = count

but this raises a KeyError.
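The KeyError happens because temp_2_src_tgt_dict[tgt] does not exist yet when the code tries to assign into it; Python never creates intermediate dictionaries automatically. A minimal sketch, assuming the desired layout is source word → {target: count} and that tgt_preps already contains "à-côté-de" (as it would after tgt_preps.update(new_tgt_preps)), uses dict.setdefault to create each inner dict on first use:

```python
# Inputs copied from the question; "à-côté-de" is listed explicitly to
# stand in for the result of tgt_preps.update(new_tgt_preps).
tgt_preps = {"devant", "pour", "sur", "à", "à-côté-de"}
src_tgt_dict = {"each": {"chaque": 3}, "in-front-of": {"devant": 4},
                "next-to": {"à-côté-de": 5}, "for": {"pour": 7},
                "cauliflower": {"chou-fleur": 4},
                "on": {"sur": 2, "panda-et": 2}}

temp_2_src_tgt_dict = {}
for src, d in src_tgt_dict.items():
    for tgt, count in d.items():
        if tgt in tgt_preps:
            # setdefault returns the inner dict stored under src, inserting an
            # empty one first if src is missing, so this assignment cannot raise KeyError
            temp_2_src_tgt_dict.setdefault(src, {})[tgt] = count

print(temp_2_src_tgt_dict)
# {'in-front-of': {'devant': 4}, 'next-to': {'à-côté-de': 5}, 'for': {'pour': 7}, 'on': {'sur': 2}}
```

Note that the outer key is src and the inner key is tgt, the reverse of the order used in the failing line.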

I also tried:

new_tgt_dict = {}
for i in src_tgt_dict.items():  
    for j in tgt_preps:
        if j in list(i[1].keys())[0][:len(j)]:
            new_tgt_dict.update({i[0]: i[1]})

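But the output is {'in-front-of': {'devant': 4}, 'next-to': {'à-côté-de': 5}, 'for': {'pour': 7}, 'on': {'sur': 2, 'panda-et': 2}}, which has the right format but should not contain 'panda-et', because that word does not appear in tgt_preps even after updating it with new_tgt_preps.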
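'panda-et' survives because the condition only inspects the first key of each inner dict (list(i[1].keys())[0]) and then copies the whole inner dict, so "sur" lets "panda-et" through as well. A minimal sketch of a fix tests every inner key on its own and keeps only the matching pairs. The helper is_prep is hypothetical, and it assumes a multi-word preposition is recognised by the word before its first hyphen, as in the question:

```python
tgt_preps = {"devant", "pour", "sur", "à"}  # original set from the question
src_tgt_dict = {"each": {"chaque": 3}, "in-front-of": {"devant": 4},
                "next-to": {"à-côté-de": 5}, "for": {"pour": 7},
                "cauliflower": {"chou-fleur": 4},
                "on": {"sur": 2, "panda-et": 2}}

def is_prep(word, preps):
    """Hypothetical helper: True if word, or the part before its first hyphen, is in preps."""
    return word in preps or word.split("-", 1)[0] in preps

new_tgt_dict = {}
for src, d in src_tgt_dict.items():
    # Filter each (target, count) pair individually instead of copying the whole inner dict
    kept = {tgt: count for tgt, count in d.items() if is_prep(tgt, tgt_preps)}
    if kept:  # skip source words whose targets were all filtered out
        new_tgt_dict[src] = kept

print(new_tgt_dict)
# {'in-front-of': {'devant': 4}, 'next-to': {'à-côté-de': 5}, 'for': {'pour': 7}, 'on': {'sur': 2}}
```

Splitting on the hyphen also avoids the prefix test [:len(j)], which would accept any target that merely starts with "à" or "sur".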

Can anyone offer any advice or suggestions? Thanks in advance for your help.

Answer 1

Maybe something like this:

from collections import defaultdict

new_tgt_dict = defaultdict(dict)
for k, v in src_tgt_dict.items():
  for k1, v1 in v.items():
    k_temp = k1
    if "-" in k1:
      k_temp = k1[0:(k1.index("-"))]
    if k_temp in tgt_preps:
      new_tgt_dict[k].update({k1: v1})
print(dict(new_tgt_dict))

{'in-front-of': {'devant': 4}, 'next-to': {'à-côté-de': 5}, 'for': {'pour': 7}, 'on': {'sur': 2}}
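This answer relies on collections.defaultdict(dict), which creates an empty inner dict the first time a missing key is indexed. Because the loop only indexes new_tgt_dict[k] after a match, source words such as "each" never get an entry. The short demo below uses toy keys (not taken from the answer) to show both sides of that behaviour:

```python
from collections import defaultdict

d = defaultdict(dict)
d["on"]["sur"] = 2      # "on" is missing, so a new {} is created and then filled
print("each" in d)      # False: membership tests do not create entries
_ = d["each"]           # plain indexing does create an (empty) entry
print(dict(d))          # {'on': {'sur': 2}, 'each': {}}
```

This is why the update call sits inside the if block: indexing new_tgt_dict[k] before checking the condition would leave empty dicts for non-prepositions.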

Answer 2

You can use NestedDict. First install ndicts:

pip install ndicts

Then:

from ndicts.ndicts import NestedDict

tgt_preps = set(["devant", "pour", "sur", "à", "à-côté-de"])  # I added "à-côté-de"

src_tgt_dict = {
    "each": {"chaque": 3}, 
    "in-front-of": {"devant":4}, 
    "next-to": {"à-côté-de": 5}, 
    "for": {"pour": 7}, 
    "cauliflower": {"chou-fleur": 4}, 
    "on": {"sur":2, "panda-et":2}
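}

# NestedDict flattens the nested dict: its keys are (source, target) tuples
nd = NestedDict(src_tgt_dict)

# Iterate over a copy, since entries are removed while looping
for key, value in nd.copy().items():
    # Drop the pair unless the source or target word is a known preposition
    if not set(key) & tgt_preps:
        nd.pop(key)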

If you need a plain dict as the result:

result = nd.to_dict()
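For readers who prefer not to add a dependency, the same idea can be sketched in plain Python: flatten to (source, target) tuple keys, keep a pair if either word is in tgt_preps, then rebuild the nesting. This only mirrors the filtering logic of the answer above, not how ndicts works internally, and it assumes the two-level structure from the question:

```python
tgt_preps = {"devant", "pour", "sur", "à", "à-côté-de"}  # same set as in this answer
src_tgt_dict = {"each": {"chaque": 3}, "in-front-of": {"devant": 4},
                "next-to": {"à-côté-de": 5}, "for": {"pour": 7},
                "cauliflower": {"chou-fleur": 4},
                "on": {"sur": 2, "panda-et": 2}}

# Flatten the two-level dict into {(source, target): count}
flat = {(src, tgt): count
        for src, d in src_tgt_dict.items()
        for tgt, count in d.items()}

# Keep a pair if either element of its tuple key is a known preposition,
# the same test as set(key) & tgt_preps above
kept = {key: count for key, count in flat.items() if set(key) & tgt_preps}

# Rebuild the nested structure from the surviving tuple keys
result = {}
for (src, tgt), count in kept.items():
    result.setdefault(src, {})[tgt] = count

print(result)
# {'in-front-of': {'devant': 4}, 'next-to': {'à-côté-de': 5}, 'for': {'pour': 7}, 'on': {'sur': 2}}
```

Because the test looks at both words in the key, a source word that happened to be in tgt_preps would also keep its pair; checking only key[1] restricts the filter to target words.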
