How to create an inverted index given a list of tuples?

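For practice, I implemented the following function inverted_idx(data), which builds an inverted index from a list of tuples: the dictionary's keys are the distinct elements found in the list, and the value associated with each key is the list of indices of all tuples that contain that key.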

The function code is:

def inverted_idx(data):
    rows = []
    dictionary = {}
    for idx, x in enumerate(data):
        rows.append((idx, x))
    for idx, x in rows:
        for key in x:
            if key in dictionary:
                dictionary[key].append(idx)
            else:
                dictionary[key] = [idx]
    return dictionary

Using it on a list of tuples:

A = [(10, 4, 53), (0, 3, 10), (12, 6, 2), (8, 4, 0), (12, 3, 9)]
inverted_idx(data=A)

The result is:

{10: [0, 1],
 4: [0, 3],
 53: [0],
 0: [1, 3],
 3: [1, 4],
 12: [2, 4],
 6: [2],
 2: [2],
 8: [3],
 9: [4]}

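The function works fine. What I want to do now is modify it so that it creates the inverted index only for the tuple elements that occupy a specific position. Suppose I want to create an inverted index only for the elements in position 1.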

The desired output would be:

{4: [0, 3],
 3: [1, 4],
 6: [2]}

How can I change the code so that it creates an inverted index for the elements at a given position?

I tried doing this:

def inverted_idx(data):
    rows = []
    dictionary = {}
    for idx, x in enumerate(data):
        rows.append((idx, x))
    for idx, x[1] in rows: # trying to access the element in position 1
        for key in x:
            if key in dictionary:
                dictionary[key].append(idx)
            else:
                dictionary[key] = [idx]
    return dictionary

Of course, I got the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-79-10c9adaea533> in <module>
      1 A = [(10, 4, 53), (0, 3, 10), (12, 6, 2), (8, 4, 0), (12, 3, 9)]
      2
----> 3 inverted_idx(data = A)

<ipython-input-78-d3f320303057> in inverted_idx(data)
      4     for idx, x in enumerate(data):
      5         rows.append((idx, x))
----> 6     for idx, x[1] in rows:
      7         for key in x:
      8             if key in dictionary:

TypeError: 'tuple' object does not support item assignment
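For context on the traceback above: the target list of a `for` statement is assigned to on every iteration, so `for idx, x[1] in rows` attempts `x[1] = ...` on whatever tuple `x` still refers to from the previous loop. A minimal reproduction, with made-up values for `x` and `rows`, might look like this:

```python
# `x` stands in for the tuple left over from the earlier loop (values are illustrative).
x = (12, 3, 9)
rows = [(0, (10, 4, 53))]

message = None
try:
    # Each iteration unpacks into `idx` and then tries to assign to x[1].
    for idx, x[1] in rows:
        pass
except TypeError as exc:
    message = str(exc)

print(message)
```

The assignment to `idx` succeeds; it is the item assignment on the tuple that raises.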

Try:

A = [(10, 4, 53), (0, 3, 10), (12, 6, 2), (8, 4, 0), (12, 3, 9)]

out = {}
for i, (_, v, _) in enumerate(A):
    out.setdefault(v, []).append(i)

print(out)

Prints:

{4: [0, 3], 3: [1, 4], 6: [2]}
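If the position should not be hard-coded by the unpacking pattern, the same idea can be wrapped in a function. This is only a sketch: the name `inverted_idx_at` and the `pos` parameter are made up for illustration, and it assumes every tuple has an element at `pos`.

```python
def inverted_idx_at(data, pos):
    """Map each value found at index `pos` to the indices of the tuples holding it there."""
    out = {}
    for i, row in enumerate(data):
        # setdefault creates the empty list the first time a value is seen.
        out.setdefault(row[pos], []).append(i)
    return out


A = [(10, 4, 53), (0, 3, 10), (12, 6, 2), (8, 4, 0), (12, 3, 9)]
print(inverted_idx_at(A, 1))  # {4: [0, 3], 3: [1, 4], 6: [2]}
```

Indexing with `row[pos]` also works for tuples of different lengths, as long as each one is long enough.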

I'd say Andrej Kesely's solution is the shorter version, but I still wanted to submit my own version, which follows your style:

def inverted_idx(data):
    dictionary = {}
    for idx, x in enumerate(data):
        for index, key in enumerate(x):
            # Only index the element at position 1 of each tuple.
            if index != 1:
                continue
            if key in dictionary:
                dictionary[key].append(idx)
            else:
                dictionary[key] = [idx]
    return dictionary

It returns the following:

{3: [1, 4], 4: [0, 3], 6: [2]}

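I hope this solves the problem. The key change is to enumerate the elements of each tuple as well, so that you know each element's position.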
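Since only one position is of interest, the inner loop can also be replaced by indexing the tuple directly. The following is a sketch rather than a definitive version: the name `inverted_idx_pos` and the `position` parameter are assumptions introduced here, and `collections.defaultdict` is swapped in for the explicit `if key in dictionary` check.

```python
from collections import defaultdict


def inverted_idx_pos(data, position=1):
    # defaultdict(list) supplies an empty list for keys seen for the first time.
    dictionary = defaultdict(list)
    for idx, x in enumerate(data):
        dictionary[x[position]].append(idx)
    # Convert back to a plain dict so missing keys raise KeyError for callers.
    return dict(dictionary)


A = [(10, 4, 53), (0, 3, 10), (12, 6, 2), (8, 4, 0), (12, 3, 9)]
result = inverted_idx_pos(A)
print(result)
```

Passing a different `position`, for example `inverted_idx_pos(A, position=0)`, builds the index for another column without touching the function body.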