Finding closest key in defaultdict
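I am using defaultdicts to store lists of values, where the keys are the periods in which the values can be observed.
When looking up from a list of all periods of interest, I would like to find the closest period in my defaultdict (note: not all periods are stored in the defaultdict).
However, as defaultdicts are not sorted, the approach below does not return the right values.
Is there a different way of returning the closest available key for defaultdicts?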
from collections import defaultdict
import numpy as np
def_dict = defaultdict(list)
# entries that will be stored in the defaultdict
reg_dict = {0: ["a", "b"], 2: ["c", "d"], 5: ["k", "h"], -3: ["i", "l"]}
# store items from regular dict in defaultdict
for k, v in reg_dict.items():
    def_dict[k] = v
# Lookup periods
periods = [-1, 0, 1, 2, 3, 4, 5, 6, 7, 8]
for period in periods:
    # this approach does not return the right keys as defaultdicts are not sorted
    closest_key = np.abs(np.array(list(def_dict.keys())) - period).argmin()
    print("period: ", period, " - looked up key: ", closest_key)
This returns the following:
period: -1 - looked up key: 0
period: 0 - looked up key: 0
period: 1 - looked up key: 0
period: 2 - looked up key: 1
period: 3 - looked up key: 1
period: 4 - looked up key: 2
period: 5 - looked up key: 2
period: 6 - looked up key: 2
period: 7 - looked up key: 2
period: 8 - looked up key: 2
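If I understand correctly, you want output like this?
[0, 0, 0, 2, 2, 5, 5, 5, 5, 5]
The logic for the above is
closest_key = [min(def_dict.keys(), key=lambda x: abs(x - p)) for p in periods]
Specifying the optional key parameter of Python's built-in functions is useful in cases like this.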
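I agree with @septra that you need the Euclidean distance, but this can also be done with numpy. Note that argmin returns an index into the list of keys, so that index has to be mapped back to the key itself: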
from collections import defaultdict
import numpy as np
def_dict = defaultdict(list)
# entries that will be stored in the defaultdict
reg_dict = {0: ["a", "b"], 2: ["c", "d"], 5: ["k", "h"], -3: ["i", "l"]}
# store items from regular dict in defaultdict
for k, v in reg_dict.items():
    def_dict[k] = v
periods = [-1, 0, 1, 2, 3, 4, 5, 6, 7, 8]
a = list(def_dict.keys())
for period in periods:
    closest_key = np.sqrt(np.power(np.add(a, -period), 2)).argmin()
    # OR closest_key = np.abs(np.add(a, -period)).argmin()
    print("period: ", period, " - looked up key: ", a[closest_key])
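With an OrderedDict and sorted keys, you can use binary search.
With a large number of keys, lookups will be much faster than with your current approach.
Since you want the closest key, you need to find the rightmost key lower than x and the leftmost key higher than x. Once you have found the index i of the rightmost key lower than x, the other candidate (the leftmost key higher than x) will be at index i+1.
You need to make sure these indices are still within your array.
Finally, you just compute the distance from each of these 2 values to x.
See the documentation for bisect and np.searchsorted.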
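The steps above could be sketched roughly as follows with the standard-library bisect module. The function name closest_key and the tie-breaking rule (an equidistant period goes to the lower key, which happens to match the expected output in the question) are my own assumptions, not something the question specifies.

```python
from bisect import bisect_left

def closest_key(sorted_keys, x):
    """Return the key in sorted_keys nearest to x (ties go to the lower key)."""
    if not sorted_keys:
        raise ValueError("no keys to search")
    # i is the index of the leftmost key >= x
    i = bisect_left(sorted_keys, x)
    # Guard the array bounds: x lies before the first or after the last key
    if i == 0:
        return sorted_keys[0]
    if i == len(sorted_keys):
        return sorted_keys[-1]
    # Only the two neighbours of the insertion point can be the closest key
    lower, upper = sorted_keys[i - 1], sorted_keys[i]
    return lower if x - lower <= upper - x else upper

reg_dict = {0: ["a", "b"], 2: ["c", "d"], 5: ["k", "h"], -3: ["i", "l"]}
sorted_keys = sorted(reg_dict)  # sort once, reuse for every lookup
periods = range(-1, 9)
result = [closest_key(sorted_keys, p) for p in periods]
print(result)
```

Sorting happens once up front, so each lookup afterwards costs O(log n) instead of a scan over all keys. If keys are added later, the sorted list has to be rebuilt or kept up to date (for example with bisect.insort).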
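As Eric said, to do this efficiently you should use binary search. However, if the number of keys is small, a simple linear search may be sufficient. There is no need to use a defaultdict or OrderedDict; just sort the keys.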
import numpy as np
# entries
reg_dict = {0: ["a", "b"], 2: ["c", "d"], 5: ["k", "h"], -3: ["i", "l"]}
keys = np.array(sorted(reg_dict.keys()))
print('keys', keys)
# Lookup periods
periods = np.arange(-1, 9)
for period in periods:
    closest_key = keys[np.abs(keys - period).argmin()]
    print("period: ", period, " - looked up key: ", closest_key)
Output
keys [-3 0 2 5]
period: -1 - looked up key: 0
period: 0 - looked up key: 0
period: 1 - looked up key: 0
period: 2 - looked up key: 2
period: 3 - looked up key: 2
period: 4 - looked up key: 5
period: 5 - looked up key: 5
period: 6 - looked up key: 5
period: 7 - looked up key: 5
period: 8 - looked up key: 5
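If the number of keys or periods does grow large, the same lookup can probably be vectorized with np.searchsorted instead of looping. This is only a sketch: the variable names are mine, it assumes at least two keys, and it resolves ties towards the lower key.

```python
import numpy as np

reg_dict = {0: ["a", "b"], 2: ["c", "d"], 5: ["k", "h"], -3: ["i", "l"]}
keys = np.array(sorted(reg_dict))
periods = np.arange(-1, 9)

# Index of the leftmost key >= period, for every period at once
idx = np.searchsorted(keys, periods)
# Clamp so that both idx - 1 and idx are valid positions in keys
idx = np.clip(idx, 1, len(keys) - 1)

lower = keys[idx - 1]  # candidate neighbour below (or at the left edge)
upper = keys[idx]      # candidate neighbour above (or at the right edge)
# Pick whichever neighbour is nearer; ties go to the lower key
closest = np.where(periods - lower <= upper - periods, lower, upper)
print(closest.tolist())
```

The clamping step plays the role of the bounds check from the binary-search answer: periods outside the key range simply fall back to the first or last key.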