The Inverted Multi-Index
I am trying to understand the inverted multi-index from this paper, which also has a smaller version here. To that end I have built a toy example, and I would like someone to verify it and/or share their thoughts.
Example:
Assume we have N = 6 points in M = 4 dimensions. We use two blocks to
create sub-vectors. Let the points be these:
p0 = (0, 0, 1, 1)
p1 = (2, 2, 3, 3)
p2 = (4, 4, 5, 5)
p3 = (6, 6, 7, 7)
p4 = (8, 8, 9, 9)
p5 = (10, 10, 11, 11) // p5^1 = (10, 10), which goes into D^1, etc.
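For concreteness, here is a minimal Python sketch (NumPy assumed; the names points, D1 and D2 are mine) of splitting the points into the two sub-vector blocks D^1 and D^2:

import numpy as np

points = np.array([
    [0, 0, 1, 1],
    [2, 2, 3, 3],
    [4, 4, 5, 5],
    [6, 6, 7, 7],
    [8, 8, 9, 9],
    [10, 10, 11, 11],
], dtype=float)

M = points.shape[1]      # M = 4 dimensions
half = M // 2            # two blocks of M/2 = 2 dimensions each
D1 = points[:, :half]    # first halves, e.g. p5^1 = (10, 10)
D2 = points[:, half:]    # second halves, e.g. p5^2 = (11, 11)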
_________________________________________________________________________
We run k-means twice, once for D^1 and once for D^2, requesting 3
centroids each time, and we get:
U = { u1 = (1, 1), u2 = (5, 5), u3 = (9, 9) }
V = { v1 = (2, 2), v2 = (6, 6), v3 = (10, 10) }
Now we have to assign the points to the cells Wi,j. All will be empty, except:
W1,1 = (u1, v1): p0, p1
W2,2 = (u2, v2): p2, p3
W3,3 = (u3, v3): p4, p5
Note: We store the PQ-compressed version with every point (for example,
instead of storing just p0, we store [p0, (u1, v1)]), which will be used
during reranking.
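To make the assignment step explicit, here is a minimal sketch that takes the codebooks U and V of the toy example as given (in practice they come from the two k-means runs above), fills the cells Wi,j and stores the PQ code of every point; it reuses D1/D2 from the snippet above, and cells/codes are names I made up:

from collections import defaultdict
import numpy as np

U = np.array([[1, 1], [5, 5], [9, 9]], dtype=float)     # u1, u2, u3
V = np.array([[2, 2], [6, 6], [10, 10]], dtype=float)   # v1, v2, v3

cells = defaultdict(list)   # cells[(i, j)] holds the ids of the points in W(i+1),(j+1)
codes = {}                  # PQ code (i, j) stored with every point, used for reranking
for n, (x1, x2) in enumerate(zip(D1, D2)):
    i = int(np.argmin(np.sum((U - x1) ** 2, axis=1)))   # nearest centroid in U
    j = int(np.argmin(np.sum((V - x2) ** 2, axis=1)))   # nearest centroid in V
    cells[(i, j)].append(n)
    codes[n] = (i, j)

# cells == {(0, 0): [0, 1], (1, 1): [2, 3], (2, 2): [4, 5]},
# i.e. W1,1 = {p0, p1}, W2,2 = {p2, p3}, W3,3 = {p4, p5} with 0-based indices.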
_________________________________________________________________________
Assume T = 4 (which sets L = 2 for the sake of the example) and the query
is (5, 5, 0, 0). Processing the query...
q^1 vs U:
  i   u_α(i)   r
  1   u2       0
  2   u1       32   <-- |d1( (5,5), (1,1) )|^2 = (5 - 1)^2 + (5 - 1)^2 = 32

q^2 vs V:
  j   v_β(j)   s
  1   v1       8
  2   v2       72
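These two tables correspond to the following sketch, which computes the squared distances from q^1 to every centroid of U and from q^2 to every centroid of V and sorts them (reusing U and V from the snippet above; alpha/beta are my names for the two orderings):

import numpy as np

q = np.array([5, 5, 0, 0], dtype=float)
q1, q2 = q[:2], q[2:]

r = np.sum((U - q1) ** 2, axis=1)   # [32, 0, 32]  -> u2 is closest (r = 0)
s = np.sum((V - q2) ** 2, axis=1)   # [8, 72, 200] -> v1 is closest (s = 8)

alpha = np.argsort(r)               # order of U's centroids by distance to q^1
beta = np.argsort(s)                # order of V's centroids by distance to q^2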
Invoke the multi-sequence algorithm, which will output:
W2,1 --> [u2 v1] --> empty
W1,1 --> [u1 v1] --> p0, p1
W2,2 --> [u2 v2] --> p2, p3
// stop when the number of points in the cells returned so far is >= T
So the candidate list is {p0, p1, p2, p3}.
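A minimal sketch of that traversal, using a priority queue over pairs of the two sorted lists (reusing r, s, alpha, beta and cells from the snippets above; for simplicity it walks the full sorted lists rather than only the top-L entries, which gives the same candidates here):

import heapq

def multi_sequence(r, s, alpha, beta, cells, T):
    rs, ss = r[alpha], s[beta]               # the two sorted distance lists
    heap = [(rs[0] + ss[0], 0, 0)]           # start from the best pair
    seen = {(0, 0)}
    out = []
    while heap and len(out) < T:
        _, a, b = heapq.heappop(heap)
        i, j = alpha[a], beta[b]             # back to the original centroid indices
        out.extend(cells.get((i, j), []))    # the cell may be empty
        for na, nb in ((a + 1, b), (a, b + 1)):
            if na < len(rs) and nb < len(ss) and (na, nb) not in seen:
                seen.add((na, nb))
                heapq.heappush(heap, (rs[na] + ss[nb], na, nb))
    return out

candidates = multi_sequence(r, s, alpha, beta, cells, T=4)   # -> [0, 1, 2, 3]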
_________________________________________________________________________
Rerank the candidate list by computing the distance between the query and
the PQ-compressed representation of each candidate (i.e. sum the per-subvector
distances to the assigned centroids).
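For example, with the stored codes the reranking step can be sketched like this: the approximate distance of a candidate is r of its first-block centroid plus s of its second-block centroid (reusing codes, r, s and candidates from the snippets above).

def rerank(candidates, codes, r, s):
    # score each candidate by the sum of the per-block squared distances
    # between the query sub-vectors and the candidate's assigned centroids
    scored = [(r[codes[n][0]] + s[codes[n][1]], n) for n in candidates]
    scored.sort()
    return [n for _, n in scored]

ranked = rerank(candidates, codes, r, s)   # -> [0, 1, 2, 3] for this toy query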
Is this correct?
Yes, this appears to be correct, as Babenko himself confirmed in an email conversation.