Finding the closest key in a defaultdict

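I am using defaultdicts to store lists of values, where the keys are the periods in which the values can be observed. When looking up each period from a list of all periods of interest, I want to find the closest period that exists in my defaultdict (note: not every period is stored in the defaultdict).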

However, because defaultdicts are not sorted, the approach below does not return the correct values.

Is there another way to return the closest available key in a defaultdict?

from collections import defaultdict
import numpy as np

def_dict = defaultdict(list)
# entries that will be stored in the defaultdict
reg_dict = {0: ["a", "b"], 2: ["c", "d"], 5: ["k", "h"], -3: ["i", "l"]}

# store items from regular dict in defaultdict 
for k, v in reg_dict.items():
    def_dict[k] = v

# Lookup periods
periods = [-1, 0, 1, 2, 3, 4, 5, 6, 7, 8]

for period in periods:

    # this approach does not return the right keys as defaultdicts are not sorted
    closest_key = np.abs(np.array(list(def_dict.keys())) - period).argmin()

    print("period: ", period, " - looked up key: ", closest_key)

This returns the following:

period:  -1  - looked up key:  0
period:  0  - looked up key:  0
period:  1  - looked up key:  0
period:  2  - looked up key:  1
period:  3  - looked up key:  1
period:  4  - looked up key:  2
period:  5  - looked up key:  2
period:  6  - looked up key:  2
period:  7  - looked up key:  2
period:  8  - looked up key:  2

If I understand correctly, you want output like this:

[0, 0, 0, 2, 2, 5, 5, 5, 5, 5]

The logic for that is:

closest_key = [min(def_dict.keys(), key = lambda x: abs(x - p)) for p in periods]

Passing the optional key argument to built-in Python functions such as min() is useful in cases like this.
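
If you also need the stored values, the same min()-with-key idea can be wrapped in a small helper. This is a minimal sketch; the name closest_entry is hypothetical and not part of any library. It only reads keys that already exist, which matters for a defaultdict: indexing it with a missing key (for example def_dict[period]) would silently insert an empty list.

```python
from collections import defaultdict


def closest_entry(d, period):
    """Hypothetical helper: return (key, value) for the key in d nearest to period.

    min() returns the first minimal key it encounters, so ties go to whichever
    key comes first in the dict's iteration order. Raises ValueError if d is empty.
    """
    # Linear scan over all keys: O(n) per lookup
    key = min(d, key=lambda k: abs(k - period))
    # key is guaranteed to exist, so this does not trigger the defaultdict factory
    return key, d[key]


def_dict = defaultdict(list)
def_dict.update({0: ["a", "b"], 2: ["c", "d"], 5: ["k", "h"], -3: ["i", "l"]})

periods = [-1, 0, 1, 2, 3, 4, 5, 6, 7, 8]
closest = [closest_entry(def_dict, p)[0] for p in periods]
print(closest)  # [0, 0, 0, 2, 2, 5, 5, 5, 5, 5]
```

For example, closest_entry(def_dict, 6) gives (5, ["k", "h"]) and leaves def_dict unchanged.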

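I agree with @septra that you need the Euclidean distance (for plain numbers this is just the absolute difference), but this can also be done with numpy. The important change is to look up a[closest_key], because argmin returns a position in the key list, not the key itself: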

from collections import defaultdict
import numpy as np

def_dict = defaultdict(list)
# entries that will be stored in the defaultdict
reg_dict = {0: ["a", "b"], 2: ["c", "d"], 5: ["k", "h"], -3: ["i", "l"]}

# store items from regular dict in defaultdict 
for k, v in reg_dict.items():
    def_dict[k] = v

periods = [-1, 0, 1, 2, 3, 4, 5, 6, 7, 8]
a = list(def_dict.keys())
for period in periods:
    closest_key  = np.sqrt(np.power(np.add(a, -period),2)).argmin()
    # OR closest_key  = np.abs(np.add(a, -period)).argmin()

    print("period: ", period, " - looked up key: ", a[closest_key])
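
The Python loop over periods can also be removed with numpy broadcasting. This is a sketch that builds the full key-by-period distance matrix, so it assumes both lists are small enough for len(keys) * len(periods) values to fit comfortably in memory.

```python
import numpy as np

reg_dict = {0: ["a", "b"], 2: ["c", "d"], 5: ["k", "h"], -3: ["i", "l"]}
keys = np.array(list(reg_dict))   # keys in dict order: [0, 2, 5, -3]
periods = np.arange(-1, 9)        # -1, 0, ..., 8

# Shape (len(keys), len(periods)): distance from every key to every period
dist = np.abs(keys[:, None] - periods[None, :])
# argmin along axis 0 gives, per period, the position of the nearest key
# (first one on ties); indexing keys maps positions back to actual keys
closest = keys[dist.argmin(axis=0)]
print(closest.tolist())  # [0, 0, 0, 2, 2, 5, 5, 5, 5, 5]
```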

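With an OrderedDict and sorted keys, you can use binary search. Each lookup then takes O(log n) instead of O(n), so for a large number of keys it will be much faster than your current approach.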

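Because you want the closest key, you need to find the rightmost key below x and the leftmost key at or above x. Once you have the index i of the rightmost key below x, the other candidate (the leftmost key at or above x) is at index i+1.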

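Make sure both indices are still within the bounds of your array, since x may lie below the smallest key or above the largest one.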

Finally, compute the distance from each of these two candidates to x and keep the closer one.
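
The steps above can be sketched with the standard bisect module. This sketch assumes only the key order matters, so it uses a plain sorted list of keys rather than an OrderedDict; the function name closest_key_bisect is hypothetical. bisect_left returns the insertion point i, so the rightmost key below x sits at i-1 and the other candidate at i.

```python
import bisect


def closest_key_bisect(sorted_keys, x):
    """Hypothetical helper: nearest key to x in a non-empty ascending list.

    Ties go to the smaller key.
    """
    # All keys before index i are < x; all keys from i onward are >= x
    i = bisect.bisect_left(sorted_keys, x)
    if i == 0:
        return sorted_keys[0]        # x is at or below the smallest key
    if i == len(sorted_keys):
        return sorted_keys[-1]       # x is above the largest key
    below, above = sorted_keys[i - 1], sorted_keys[i]
    return below if x - below <= above - x else above


reg_dict = {0: ["a", "b"], 2: ["c", "d"], 5: ["k", "h"], -3: ["i", "l"]}
keys = sorted(reg_dict)  # sort once: [-3, 0, 2, 5]
result = [closest_key_bisect(keys, p) for p in range(-1, 9)]
print(result)  # [0, 0, 0, 2, 2, 5, 5, 5, 5, 5]
```

Sorting costs O(n log n) once; after that each lookup is O(log n), and reg_dict[closest_key_bisect(keys, p)] gives the stored values.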

See the documentation for bisect and np.searchsorted.
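
The same two-candidate search can be vectorized over all periods at once with np.searchsorted. This is a sketch assuming integer keys; np.clip keeps both candidate indices inside the array, which handles periods below the smallest key or above the largest.

```python
import numpy as np

reg_dict = {0: ["a", "b"], 2: ["c", "d"], 5: ["k", "h"], -3: ["i", "l"]}
keys = np.sort(np.fromiter(reg_dict, dtype=int))  # [-3, 0, 2, 5]
periods = np.arange(-1, 9)

# With the default side='left', idx[j] is the index of the first key >= periods[j]
idx = np.searchsorted(keys, periods)
hi = np.clip(idx, 0, len(keys) - 1)      # first key >= period (or the last key)
lo = np.clip(idx - 1, 0, len(keys) - 1)  # last key < period (or the first key)

# Keep whichever candidate is nearer; ties go to the smaller key
use_lo = np.abs(periods - keys[lo]) <= np.abs(keys[hi] - periods)
closest = np.where(use_lo, keys[lo], keys[hi])
print(closest.tolist())  # [0, 0, 0, 2, 2, 5, 5, 5, 5, 5]
```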

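As Eric said, binary search is the efficient way to do this. If the number of keys is small, though, a simple linear search may be enough, and you need neither a defaultdict nor an OrderedDict. What makes the version below return keys rather than positions is indexing the keys array with the result of argmin. Sorting the keys is not required for a linear search, but it makes the output easier to read and, because argmin returns the first minimum, it resolves ties to the smaller key.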

import numpy as np

# entries
reg_dict = {0: ["a", "b"], 2: ["c", "d"], 5: ["k", "h"], -3: ["i", "l"]}

keys = np.array(sorted(reg_dict.keys()))
print('keys', keys)

# Lookup periods
periods = np.arange(-1, 9)

for period in periods:
    closest_key = keys[np.abs(keys - period).argmin()]
    print("period: ", period, " - looked up key: ", closest_key)

Output:

keys [-3  0  2  5]
period:  -1  - looked up key:  0
period:  0  - looked up key:  0
period:  1  - looked up key:  0
period:  2  - looked up key:  2
period:  3  - looked up key:  2
period:  4  - looked up key:  5
period:  5  - looked up key:  5
period:  6  - looked up key:  5
period:  7  - looked up key:  5
period:  8  - looked up key:  5