How to filter a nested dict by the key of the nested element?
I have a nested dictionary containing source words, target words, and their frequency counts. It looks like this:
src_tgt_dict = {"each":{"chaque":3}, "in-front-of":{"devant":4}, "next-to":{"à-côté-de":5}, "for":{"pour":7}, "cauliflower":{"chou-fleur":4}, "on":{"sur":2, "panda-et":2}}
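I am trying to filter the dictionary so that it keeps only the key-value pairs whose target word is a preposition (including multi-word prepositions). To do that, I wrote the following: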
tgt_preps = set(["devant", "pour", "sur", "à"])  # set of initial target prepositions
src_tgt_dict = {"each":{"chaque":3}, "in-front-of":{"devant":4}, "next-to":{"à-côté-de":5}, "for":{"pour":7}, "cauliflower":{"chou-fleur":4}, "on":{"sur":2, "panda-et":2}}
new_tgt_preps = []  # list of new target prepositions
for src, d in src_tgt_dict.items():  # loop into the dictionary
    for tgt, count in d.items():  # loop into the nested dictionary
        check_prep = []
        if "-" in tgt:  # check to see if hyphen occurs in the target word (this is to capture multi-word prepositions that are not in the original preposition set)
            check_prep.append(tgt[0:(tgt.index("-"))])  # if there's a hyphen, append the preceding word to the check_prep list
        for t in check_prep:
            if t in tgt_preps:  # check to see if the token preceding the hyphen is a preposition
                new_tgt_preps.append(tgt)  # if yes, append the multi-word preposition to the list of new target prepositions
tgt_preps.update(new_tgt_preps)  # update the set of prepositions to include the multi-word prepositions
temp_2_src_tgt_dict = {}  # create new dict for filtering
for src, d in src_tgt_dict.items():  # loop into the dictionary
    for tgt, count in d.items():  # loop into the nested dictionary
        if tgt in tgt_preps:  # if the target is in the set of target prepositions
            temp_2_src_tgt_dict[tgt] = count  # add to the new dict with the tgt as the key and the count as the value
When I print the new dictionary, I get the following:
{'devant': 4, 'pour': 7, 'sur': 2, 'à-côté-de': 5}
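It makes complete sense that I get this, because it is exactly what I told the machine to do. But it is not what I intended!
What I want is:
{"in-front-of": {"devant": 4}, "for": {"pour": 7}, "on": {"sur": 2}, "next-to": {"à-côté-de": 5}}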
I tried to create the nested dictionary by writing:
temp_2_src_tgt_dict[tgt][src] = count
But this raises a KeyError.
I also tried:
new_tgt_dict = {}
for i in src_tgt_dict.items():
    for j in tgt_preps:
        if j in list(i[1].keys())[0][:len(j)]:
            new_tgt_dict.update({i[0]: i[1]})
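But this outputs
{'in-front-of': {'devant': 4}, 'next-to': {'à-côté-de': 5}, 'for': {'pour': 7}, 'on': {'sur': 2, 'panda-et': 2}}
which has the right format but should not include 'panda-et', because that key is not in tgt_preps even after the set has been updated with the new multi-word prepositions.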
Can anyone offer any advice or suggestions? Thanks in advance for your help.
Maybe something like this:
from collections import defaultdict

new_tgt_dict = defaultdict(dict)
for k, v in src_tgt_dict.items():
    for k1, v1 in v.items():
        k_temp = k1
        if "-" in k1:
            k_temp = k1[0:(k1.index("-"))]  # first token of a multi-word target
        if k_temp in tgt_preps:
            new_tgt_dict[k].update({k1: v1})
print(dict(new_tgt_dict))
{'in-front-of': {'devant': 4}, 'next-to': {'à-côté-de': 5}, 'for': {'pour': 7}, 'on': {'sur': 2}}
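A note on the KeyError from the question: `temp_2_src_tgt_dict[tgt][src] = count` indexes an inner dict that has not been created yet, and it also puts the keys in the opposite order from the desired output. The same filter can be sketched with plain dicts only. This assumes the rule is "keep a target if it, or the part before its first hyphen, is in tgt_preps"; the helper name `filter_by_prep` is made up for illustration.

```python
def filter_by_prep(src_tgt_dict, tgt_preps):
    """Keep only the target entries whose first hyphen-separated token is a known preposition."""
    filtered = {}
    for src, targets in src_tgt_dict.items():
        # split("-", 1)[0] returns the whole word when there is no hyphen
        kept = {tgt: count for tgt, count in targets.items()
                if tgt.split("-", 1)[0] in tgt_preps}
        if kept:  # skip source words with no surviving targets
            filtered[src] = kept
    return filtered


tgt_preps = {"devant", "pour", "sur", "à"}
src_tgt_dict = {
    "each": {"chaque": 3},
    "in-front-of": {"devant": 4},
    "next-to": {"à-côté-de": 5},
    "for": {"pour": 7},
    "cauliflower": {"chou-fleur": 4},
    "on": {"sur": 2, "panda-et": 2},
}

result = filter_by_prep(src_tgt_dict, tgt_preps)
print(result)
```

Building each inner dict first and attaching it only when it is non-empty avoids both the KeyError and empty entries such as "each": {}.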
You can use NestedDict. First install ndicts:
pip install ndicts
Then:
from ndicts.ndicts import NestedDict

tgt_preps = set(["devant", "pour", "sur", "à", "à-côté-de"])  # I added "à-côté-de"
src_tgt_dict = {
    "each": {"chaque": 3},
    "in-front-of": {"devant": 4},
    "next-to": {"à-côté-de": 5},
    "for": {"pour": 7},
    "cauliflower": {"chou-fleur": 4},
    "on": {"sur": 2, "panda-et": 2}
}
nd = NestedDict(src_tgt_dict)  # wrap the dict so that each leaf gets a tuple key

# iterate over a copy so that items can be removed from nd during the loop
for key, value in nd.copy().items():
    if not set(key) & tgt_preps:
        nd.pop(key)
If you need a dictionary as the result:
result = nd.to_dict()
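For readers who would rather not add a dependency, the idea behind that loop can be sketched in plain Python. This is only an approximation of what the NestedDict version does: it assumes, as the `set(key)` test suggests, that each leaf is addressed by a tuple path such as ("on", "sur"). The names `flat` and `kept` are illustrative.

```python
tgt_preps = {"devant", "pour", "sur", "à", "à-côté-de"}
src_tgt_dict = {
    "each": {"chaque": 3},
    "in-front-of": {"devant": 4},
    "next-to": {"à-côté-de": 5},
    "for": {"pour": 7},
    "cauliflower": {"chou-fleur": 4},
    "on": {"sur": 2, "panda-et": 2},
}

# Flatten to {(source, target): count}, one entry per leaf
flat = {(src, tgt): count
        for src, targets in src_tgt_dict.items()
        for tgt, count in targets.items()}

# Keep a leaf if any element of its path is a known preposition
kept = {path: count for path, count in flat.items() if set(path) & tgt_preps}

# Rebuild the nested shape
result = {}
for (src, tgt), count in kept.items():
    result.setdefault(src, {})[tgt] = count

print(result)
```

One caveat of testing the whole path: a source word that happens to equal an entry in tgt_preps would also match. Checking only the last element of the path avoids that.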