Filtering tuples within a range from a reference list

I have a reference list containing tuples that represent different value ranges:

[(1042, 1056), (895, 922), (966, 995), (692, 716), (667, 690), 
 (667, 690), (667, 690), (479, 508), (1112, 1578)]

I also have the following list of lists, whose tuples of values must be compared against the reference list:

[  [(450,470)],
   [(100, 200), (500, 700)],
   [(0, 29), (3827, 3856)],
   [(820, 835), (1539, 1554)],
   [(622, 635), (1286, 1299), (1585, 1598), (1607, 1620)],
   [(637, 642), (780, 785), (1341, 1346), (1944, 1949), (2044, 2049),
    (2158, 2163), (2594, 2599), (2643, 2648)]  ]

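I am trying to select one tuple from each inner list that falls within the range covered by the tuples in the reference list.

The conditions I am considering are:

  1. If no tuple in the input list has a value within the range of the reference list, any tuple may be taken. For example, [(0, 29), (3827, 3856)] is outside the range of the reference list, so I can take either tuple. By default, I append the first tuple of the list to the reference list.

  2. If a tuple within the range of the reference list is found, append that tuple to the reference list and stop searching in that list. For example, [(622, 635), (1286, 1299), (1585, 1598), (1607, 1620)]

  3. If several tuples fall within the range of the reference list, append the first one found to the reference list. For example, [(637, 642), (780, 785), (1341, 1346), (1944, 1949), (2044, 2049), (2158, 2163), (2594, 2599), (2643, 2648)]

  4. The two values in a tuple are never equal; the second value is always greater than the first.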

To find the range, I take the minimum and maximum of the first values of the tuples in the reference list, and then do a simple iteration.
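As a small illustration of how those thresholds come out, the sketch below applies min() and max() to the reference list. The names `lowest` and `highest` are only illustrative; the point is that tuples compare element by element, so min()/max() effectively look at the first value of each tuple.

```python
# Reference list copied from the question.
tag_pos_refin = [(1042, 1056), (895, 922), (966, 995), (692, 716), (667, 690),
                 (667, 690), (667, 690), (479, 508), (1112, 1578)]

# Tuples compare element by element, so min()/max() pick the tuples with the
# smallest/largest first value (ties would be broken by the second value).
lowest = min(tag_pos_refin)     # (479, 508)
highest = max(tag_pos_refin)    # (1112, 1578)

min_threshold = lowest[0]       # 479
max_threshold = highest[0]      # 1112

# Equivalent, more explicit form that looks only at the first values.
assert min_threshold == min(start for start, _ in tag_pos_refin)
assert max_threshold == max(start for start, _ in tag_pos_refin)
```

Note that the second values (for example 1578) play no part in either threshold.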

The code I used is:

tag_pos_refin = [(1042, 1056), (895, 922), (966, 995), (692, 716), (667, 690), 
                 (667, 690), (667, 690), (479, 508), (1112, 1578)]

tag_pos_db = [  [(450,470)],
                [(100, 200), (500, 700)],
                [(0, 29), (3827, 3856)],
                [(820, 835), (1539, 1554)],
                [(622, 635), (1286, 1299), (1585, 1598), (1607, 1620)],
                [(637, 642), (780, 785), (1341, 1346), (1944, 1949), (2044, 2049), (2158, 2163), 
                  (2594, 2599), (2643, 2648)]
            ]


min_threshold = min(tag_pos_refin)[0]
max_threshold = max(tag_pos_refin)[0]

for tag_pos in tag_pos_db:
    if len(tag_pos) == 1:
        tag_pos_refin.extend(tag_pos)

for tag_pos in tag_pos_db:
    if len(tag_pos) > 1:
        for j in tag_pos:
            if j[0] in range(min_threshold, max_threshold):
                tag_pos_refin.append(j)
                break
            elif min(tag_pos)[0] not in range(min_threshold, max_threshold):
                tag_pos_refin.append(j)
                break             

print(tag_pos_refin)

Output obtained:

[(1042, 1056), (895, 922), (966, 995), (692, 716), (667, 690), (667, 690), (667, 690), (479, 508), (1112, 1578), (450, 470), (100, 200), (0, 29), (820, 835), (622, 635), (637, 642)]

Expected output:

[(1042, 1056), (895, 922), (966, 995), (692, 716), (667, 690), (667, 690), (667, 690), (479, 508), (1112, 1578), (450, 470), (500, 700), (0, 29), (820, 835), (622, 635), (637, 642)]

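My question:

Is there a better way, or better logic, to find the range, so that the selected tuple is (500, 700) rather than (100, 200)?

(This use case is a little complicated to explain, but the tuple values can be thought of as index positions of words or sentences in a text.)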

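Your code has several problems. First, you need to check both values of each tuple, because either one may be within the range without the other being so. Second, repeatedly re-creating a range object for a simple bounds check is inefficient, and your implementation also has an off-by-one error (assuming you want an inclusive range). Third, you do not check all the tuples when looking for a match, which means the wrong tuple may be appended.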
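The first two points can be seen in isolation with a short sketch. The thresholds are hard-coded to the values that the reference list yields, and `touches_range` is a hypothetical helper name used only for illustration.

```python
min_threshold, max_threshold = 479, 1112  # thresholds from the reference list

# range() excludes its stop value, so the upper bound itself is rejected.
in_range_exclusive = 1112 in range(min_threshold, max_threshold)    # False
# A chained comparison is inclusive on both ends and builds no range object.
in_range_inclusive = min_threshold <= 1112 <= max_threshold         # True


def touches_range(pair, low, high):
    """Return True if either value of the tuple lies in [low, high]."""
    start, end = pair
    return low <= start <= high or low <= end <= high


# (275, 479) starts below the range, but its second value sits exactly on the
# lower bound, so only a check of both values accepts it.
first_only = min_threshold <= 275 <= max_threshold                      # False
either_end = touches_range((275, 479), min_threshold, max_threshold)    # True
```

A tuple such as (200, 1500), which spans the whole range without either value falling inside it, is still rejected by this check.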

In the solution below, I have added some extra tests to check the boundary cases:

tag_pos_refin = [(1042, 1056), (895, 922), (966, 995), (692, 716), (667, 690),
                 (667, 690), (667, 690), (479, 508), (1112, 1578)]

tag_pos_db = [
    [(200, 1500), (1112, 1200)], # test upper bound
    [(100, 1600), (275, 479)], # test lower bound
    [(450, 470)],
    [(100, 200), (500, 700)],
    [(0, 29), (3827, 3856)],
    [(820, 835), (1539, 1554)],
    [(622, 635), (1286, 1299), (1585, 1598), (1607, 1620)],
    [(637, 642), (780, 785), (1341, 1346), (1944, 1949), (2044, 2049), (2158, 2163),
     (2594, 2599), (2643, 2648)],
    ]

min_threshold = min(tag_pos_refin)[0]
max_threshold = max(tag_pos_refin)[0]

print(f'min/max: {min_threshold}-{max_threshold}\n')

for tag_pos in tag_pos_db:
    if tag_pos:
        print(f'checking {tag_pos}', end='')
        for j in tag_pos:
            if (min_threshold <= j[0] <= max_threshold or
                min_threshold <= j[1] <= max_threshold):
                print(' -> found match')
                tag_pos_refin.append(j)
                break
        else:
            print(' -> no matches')
            tag_pos_refin.append(tag_pos[0])
        print(f'APPENDED: {tag_pos_refin[-1]}\n')

print(f'RESULT: {tag_pos_refin}\n')
    

Output:

min/max: 479-1112

checking [(200, 1500), (1112, 1200)] -> found match
APPENDED: (1112, 1200)

checking [(100, 1600), (275, 479)] -> found match
APPENDED: (275, 479)

checking [(450, 470)] -> no matches
APPENDED: (450, 470)

checking [(100, 200), (500, 700)] -> found match
APPENDED: (500, 700)

checking [(0, 29), (3827, 3856)] -> no matches
APPENDED: (0, 29)

checking [(820, 835), (1539, 1554)] -> found match
APPENDED: (820, 835)

checking [(622, 635), (1286, 1299), (1585, 1598), (1607, 1620)] -> found match
APPENDED: (622, 635)

checking [(637, 642), (780, 785), (1341, 1346), (1944, 1949), (2044, 2049), (2158, 2163), (2594, 2599), (2643, 2648)] -> found match
APPENDED: (637, 642)

RESULT: [(1042, 1056), (895, 922), (966, 995), (692, 716), (667, 690), (667, 690), (667, 690), (479, 508), (1112, 1578), (1112, 1200), (275, 479), (450, 470), (500, 700), (0, 29), (820, 835), (622, 635), (637, 642)]
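
If the debugging prints are not needed, the same selection rule can be wrapped in a small function. This is only a sketch under the same assumptions as the code above (inclusive bounds, thresholds taken once from the first values of the reference list). The names `pick_tuple` and `extend_reference` are hypothetical, and `next()` with a default stands in for the for/else loop.

```python
def pick_tuple(candidates, low, high):
    """Return the first tuple with either value in [low, high], else the first tuple."""
    return next(
        (t for t in candidates if low <= t[0] <= high or low <= t[1] <= high),
        candidates[0],  # fallback when nothing matches
    )


def extend_reference(reference, groups):
    """Return a new list: the reference plus one picked tuple per non-empty group."""
    low = min(reference)[0]
    high = max(reference)[0]
    picked = [pick_tuple(group, low, high) for group in groups if group]
    return reference + picked


tag_pos_refin = [(1042, 1056), (895, 922), (966, 995), (692, 716), (667, 690),
                 (667, 690), (667, 690), (479, 508), (1112, 1578)]

tag_pos_db = [
    [(450, 470)],
    [(100, 200), (500, 700)],
    [(0, 29), (3827, 3856)],
    [(820, 835), (1539, 1554)],
    [(622, 635), (1286, 1299), (1585, 1598), (1607, 1620)],
    [(637, 642), (780, 785), (1341, 1346), (1944, 1949), (2044, 2049),
     (2158, 2163), (2594, 2599), (2643, 2648)],
]

result = extend_reference(tag_pos_refin, tag_pos_db)
print(result[len(tag_pos_refin):])
```

With the original data from the question, the appended tuples are (450, 470), (500, 700), (0, 29), (820, 835), (622, 635) and (637, 642), which matches the expected output. Unlike the loop above, this version returns a new list instead of modifying tag_pos_refin in place.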