How to filter the lists in a dictionary of lists by a specific number?
I have a large dictionary of lists that I need to filter.
Here is a sample of it:
d = {
'hate': [(2310, "Experiencer: 'like hours'", 212, 222),
(2310, "Experiencer: 'two'", 1035, 1038),
(2310, "Experiencer: 'Anakin'", 1560, 1566),
(2310, "Experiencer: ' '", 1619, 1620),
(2310, "Experiencer: 'Tatooine'", 1726, 1734),
(2310, "Experiencer: 'Anakin'", 1775, 1781),
(2310, "Experiencer: 'Master Qui-Gon'", 1863, 1877),
(2310, "Experiencer: 'half'", 1883, 1887),
(2310, "Experiencer: 'One'", 2114, 2117),
(2310, "Experiencer: 'Anakin'", 2180, 2186),
(2310, "Stimulus: 'One'", 2484, 2487),
(2310, "Stimulus: 'Anakin'", 2564, 2570),
(2310, "Stimulus: 'Padme'", 2739, 2744)],
'confirmation': [(4132, "Experiencer: 'like hours'", 212, 222),
(4132, "Experiencer: 'two'", 1035, 1038),
(4132, "Experiencer: 'Anakin'", 1560, 1566),
(4132, "Experiencer: ' '", 1619, 1620),
(4132, "Experiencer: 'Tatooine'", 1726, 1734),
(4132, "Experiencer: 'Anakin'", 1775, 1781),
(4132, "Experiencer: 'Master Qui-Gon'", 1863, 1877),
(4132, "Experiencer: 'half'", 1883, 1887),
(4132, "Experiencer: 'One'", 2114, 2117),
(4132, "Experiencer: 'Anakin'", 2180, 2186),
(4132, "Experiencer: 'One'", 2484, 2487),
(4132, "Experiencer: 'Anakin'", 2564, 2570),
(4132, "Experiencer: 'Padme'", 2739, 2744),
(4132, "Experiencer: 'Anakin'", 2782, 2788),
(4132, "Experiencer: ' '", 2818, 2819),
(4132, "Experiencer: 'centuries'", 3562, 3571),
(4132, "Experiencer: 'one'", 3585, 3588),
(4132, "Experiencer: 'Anakin'", 3679, 3685),
(4132, "Experiencer: 'Anakin Skywalker'", 3789, 3805),
(4132, "Experiencer: 'Obi-Wan'", 4014, 4021),
(4132, "Experiencer: 'Qui-Gon'", 4025, 4032),
(4132, "Experiencer: 'Qui-Gon's'", 4100, 4109),
(4132, "Stimulus: 'Anakin'", 4281, 4287),
(4132, "Stimulus: ' '", 4355, 4356),
(4132, "Stimulus: 'Anakin'", 4436, 4442)]}
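Each key (one of them is hate, as shown above) has a number at the start of every element in its list. Here it is 2310.
I would like to print the two elements of the list whose numbers are closest to that number: the next one above it and the next one below it.
Example output: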
'hate': [(2310, "Experiencer: 'Anakin'", 2180, 2186),
(2310, "Stimulus: 'One'", 2484, 2487)]
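because
(2310, "Experiencer: 'Anakin'", 2180, 2186)
has the number 2180, which is the next smallest compared with 2310,
and, in turn:
(2310, "Stimulus: 'One'", 2484, 2487)
has the number 2484, which is the next largest compared with 2310.
I guess this needs a for loop? How can I iterate over the dictionary of lists, compare the first, repeated number against the other numbers in each row, and then return the closest ones, as described above?
I hope my question is easy enough to understand...
Thanks in advance!
Edit:
The goal is to process the dictionary automatically and update it by filtering.
The desired output for the dictionary would look like this: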
d = {
'hate': [(2310, "Experiencer: 'Anakin'", 2180, 2186),
(2310, "Stimulus: 'One'", 2484, 2487)],
'confirmation': [(4132, "Experiencer: 'Qui-Gon's'", 4100, 4109),
(4132, "Stimulus: 'Anakin'", 4281, 4287)],
...}
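I have also edited the output sample above. It is a dictionary of lists.
Perhaps the simplest way is to:
- Sort the list by the element of interest, position 2.
- Use binary search to find where 2310 would appear in that list; it should fall between two elements. Those are the two elements you want.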
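The two steps above can be sketched roughly as follows. This is only an illustration, not the answerer's code: the helper name `neighbours_by_start` is made up here, and it assumes the leading number should be compared against the third field (index 2) of each tuple.

```python
from bisect import bisect_left

def neighbours_by_start(rows):
    """Return the row just below and the row just above rows[0][0].

    Hypothetical helper: compares the leading number against the
    third field (index 2) of each tuple.
    """
    if not rows:
        return []
    target = rows[0][0]
    # Step 1: sort by the element of interest (position 2).
    ordered = sorted(rows, key=lambda row: row[2])
    starts = [row[2] for row in ordered]
    # Step 2: binary search for the first value that is >= target.
    idx = bisect_left(starts, target)
    result = []
    if idx > 0:
        result.append(ordered[idx - 1])  # closest value below the target
    if idx < len(ordered):
        result.append(ordered[idx])      # closest value at or above the target
    return result

sample = [
    (2310, "Experiencer: 'One'", 2114, 2117),
    (2310, "Stimulus: 'Anakin'", 2564, 2570),
    (2310, "Experiencer: 'Anakin'", 2180, 2186),
    (2310, "Stimulus: 'One'", 2484, 2487),
]
print(neighbours_by_start(sample))
```

If the leading number lies below or above every entry, the sketch returns a single tuple instead of two.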
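Use itertools.groupby to group the elements of each list by their leading number, then sort each group by absolute difference and take the first 2 elements. Because the ranking uses the absolute difference, both results can fall on the same side of the number, as 'confirmation' does in the output below.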
>>> from itertools import groupby
>>>
>>> f = lambda t: t[0]
>>> {key:sorted(v, key=lambda t: abs(k-t[3]))[:2] for key,lst in d.items() for k,v in groupby(sorted(lst, key=f), f)}
{'confirmation': [(4132, "Experiencer: 'Qui-Gon's'", 4100, 4109), (4132, "Experiencer: 'Qui-Gon'", 4025, 4032)], 'hate': [(2310, "Experiencer: 'Anakin'", 2180, 2186), (2310, "Stimulus: 'One'", 2484, 2487)]}
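In the sample data every tuple in a list shares the same leading number, so the grouping step may not be needed. A minimal sketch under that assumption, using `heapq.nsmallest` with the same distance key (leading number against index 3); `two_nearest` is a name invented for this illustration:

```python
from heapq import nsmallest

def two_nearest(rows):
    # Rank rows by the absolute distance between the leading number
    # and the last field (index 3), and keep the two nearest.
    return nsmallest(2, rows, key=lambda t: abs(t[0] - t[3]))

sample = {
    'hate': [(2310, "Experiencer: 'One'", 2114, 2117),
             (2310, "Experiencer: 'Anakin'", 2180, 2186),
             (2310, "Stimulus: 'One'", 2484, 2487)],
    'confirmation': [(4132, "Experiencer: 'Qui-Gon'", 4025, 4032),
                     (4132, "Experiencer: 'Qui-Gon's'", 4100, 4109),
                     (4132, "Stimulus: 'Anakin'", 4281, 4287)],
}
filtered = {key: two_nearest(rows) for key, rows in sample.items()}
print(filtered)
```

As with the groupby version, this picks the two nearest rows regardless of side, so for 'confirmation' both results are "Experiencer" rows below 4132.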
This is the plainest, simplest way I can think of. Another answer uses itertools; it is more elegant but harder for a beginner to follow.
If l is the list that hate points to in your dictionary:
target_num = l[0][0]
closest_smaller, closest_bigger = 0, 0
closest_smaller_diff, closest_bigger_diff = float("inf"), float("-inf")
for element in l:
    # check both position numbers (the last two fields) of this tuple
    for num in (element[-2], element[-1]):
        diff = target_num - num
        if diff > 0 and diff < closest_smaller_diff:
            closest_smaller = num
            closest_smaller_diff = diff
        if diff < 0 and diff > closest_bigger_diff:
            closest_bigger = num
            closest_bigger_diff = diff
print(closest_smaller, closest_bigger)
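The loop above prints the two closest numbers, while the question asks for the tuples that contain them. A possible variant that keeps track of the whole tuple is sketched below. The function name `closest_pair` and the rule it uses (a row is "below" when its last field is smaller than the leading number, "above" when its third field is larger) are assumptions made for this example, and the list is assumed to be non-empty.

```python
def closest_pair(rows):
    """Return (row_below, row_above) relative to the leading number.

    Assumption for this sketch: a row counts as "below" when its last
    field (index 3) is smaller than the target, and as "above" when its
    third field (index 2) is larger. Either side may be None.
    """
    target = rows[0][0]
    below = above = None
    for row in rows:
        start, end = row[2], row[3]
        # keep the row whose end is the largest one still below the target
        if end < target and (below is None or end > below[3]):
            below = row
        # keep the row whose start is the smallest one still above the target
        if start > target and (above is None or start < above[2]):
            above = row
    return below, above

hate = [
    (2310, "Experiencer: 'One'", 2114, 2117),
    (2310, "Experiencer: 'Anakin'", 2180, 2186),
    (2310, "Stimulus: 'One'", 2484, 2487),
    (2310, "Stimulus: 'Anakin'", 2564, 2570),
]
print(closest_pair(hate))
```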
# let big_dict be the big dictionary you start with
output_dict = {}
for key, value in big_dict.items():
    # split the list in two: tuples whose third value is greater than the
    # leading number, and tuples whose third value is less than or equal to it
    higher_tuples = [i for i in value if i[2] > i[0]]
    lower_tuples = [i for i in value if i[2] <= i[0]]
    # from each list, take the tuple whose third value is closest to the leading number
    high_closest = min(higher_tuples, key=lambda x: x[2] - x[0])
    low_closest = min(lower_tuples, key=lambda x: x[0] - x[2])
    # bind them into the output, lower neighbour first
    output_dict[key] = [low_closest, high_closest]
If you prefer, you can bind it all into one very large one-liner:
output_dict = {key: [min([j for j in value if j[2] <= j[0]], key=lambda b: b[0] - b[2]), min([i for i in value if i[2] > i[0]], key=lambda a: a[2] - a[0])] for key, value in big_dict.items()}
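One thing to watch for: `min` raises `ValueError` on an empty sequence, which would happen for any key whose tuples all lie on one side of the leading number. A small sketch that tolerates this, using the `default` argument of the built-in `min` and `max`; the helper name `neighbours` is hypothetical:

```python
def neighbours(rows):
    # Closest tuple at or below the leading number (largest third value).
    lower = max((r for r in rows if r[2] <= r[0]),
                key=lambda r: r[2], default=None)
    # Closest tuple above the leading number (smallest third value).
    higher = min((r for r in rows if r[2] > r[0]),
                 key=lambda r: r[2], default=None)
    # Drop whichever side turned out to be empty.
    return [r for r in (lower, higher) if r is not None]

big_dict = {
    'hate': [(2310, "Experiencer: 'Anakin'", 2180, 2186),
             (2310, "Stimulus: 'One'", 2484, 2487),
             (2310, "Stimulus: 'Anakin'", 2564, 2570)],
    'one_sided': [(100, "Experiencer: 'x'", 10, 12),
                  (100, "Experiencer: 'y'", 40, 45)],
}
output_dict = {key: neighbours(value) for key, value in big_dict.items()}
print(output_dict)
```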
If your lists are already sorted, we can use bisect to find the position between the "Experiencer" and "Stimulus" entries:
from bisect import bisect
l=[(2310, "Experiencer: 'like hours'", 212, 222), (2310, "Experiencer: 'two'", 1035,1038), (2310, "Experiencer: 'Anakin'", 1560, 1566), (2310, "Experiencer: ' '", 1619, 1620), (2310, "Experiencer: 'Tatooine'", 1726, 1734), (2310, "Experiencer: 'Anakin'", 1775, 1781), (2310, "Experiencer: 'Master Qui-Gon'", 1863, 1877), (2310, "Experiencer: 'half'", 1883, 1887), (2310, "Experiencer: 'One'", 2114, 2117), (2310, "Experiencer: 'Anakin'", 2180, 2186), (2310, "Stimulus: 'One'", 2484, 2487), (2310, "Stimulus: 'Anakin'", 2564, 2570), (2310, "Stimulus: 'Padme'", 2739, 2744)]
right_index = bisect(l, (2310, "F"))  # "F" sorts between "Experiencer" and "Stimulus"
lower, higher = l[right_index-1], l[right_index]
print(lower, higher, sep="\n")
# (2310, "Experiencer: 'Anakin'", 2180, 2186)
# (2310, "Stimulus: 'One'", 2484, 2487)
Then you can process your dictionary quite easily:
from bisect import bisect
def get_boundary(l):
    # Lists with fewer than 2 items are returned unchanged
    if len(l) < 2:
        return l
    right_index = bisect(l, (l[0][0], "F"))
    return [l[right_index-1], l[right_index]]
print({key: get_boundary(value) for key, value in d.items()})
which produces:
{'hate': [(2310, "Experiencer: 'Anakin'", 2180, 2186),
(2310, "Stimulus: 'One'", 2484, 2487)],
'confirmation': [(4132, "Experiencer: 'Qui-Gon's'", 4100, 4109),
(4132, "Stimulus: 'Anakin'", 4281, 4287)]
}
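The question's edit asks for the dictionary itself to be updated. Below is a hedged sketch of that, with an extra guard for lists that contain only "Experiencer" or only "Stimulus" entries, where `l[right_index]` or `l[right_index-1]` would otherwise pick the wrong element or raise `IndexError`. The name `get_boundary_safe` and the choice to return an empty list in that case are assumptions for illustration.

```python
from bisect import bisect

def get_boundary_safe(rows):
    # Lists with fewer than 2 items are returned unchanged.
    if len(rows) < 2:
        return rows
    # "F" sorts after "Experiencer..." and before "Stimulus...".
    idx = bisect(rows, (rows[0][0], "F"))
    # No boundary if every entry is on the same side (assumed behaviour).
    if idx == 0 or idx == len(rows):
        return []
    return [rows[idx - 1], rows[idx]]

sample = {
    'hate': [(2310, "Experiencer: 'One'", 2114, 2117),
             (2310, "Experiencer: 'Anakin'", 2180, 2186),
             (2310, "Stimulus: 'One'", 2484, 2487),
             (2310, "Stimulus: 'Anakin'", 2564, 2570)],
    'only_experiencers': [(10, "Experiencer: 'a'", 1, 2),
                          (10, "Experiencer: 'b'", 3, 4)],
}
# Reassigning existing keys while iterating is fine: the key set does not change.
for key in sample:
    sample[key] = get_boundary_safe(sample[key])
print(sample)
```

This still relies on all "Experiencer" entries coming before all "Stimulus" entries in each list, as in the sample data.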