哪个字符附加到后缀数组中的字符串？

Question

我正在解决

https://www.spoj.com/problems/BEADS/

上述问题在 SPOJ。我已经在下面陈述了相关信息：

Problem Statement: The description of the necklace is a string A = a1a2 ... am specifying sizes of the particular beads, where the last character am is considered to precede character a1 in circular fashion. The disjoint point i is said to be worse than the disjoint point j if and only if the string aiai+1 ... ana1 ... ai-1 is lexicografically smaller than the string ajaj+1 ... ana1 ... aj-1. String a1a2 ... an is lexicografically smaller than the string b1b2 ... bn if and only if there exists an integer i, i <= n, so that aj=bj, for each j, 1 <= j < i and ai < bi.

Output: For each test case, print exactly one line containing only one integer -- number of the bead which is the first at the worst possible disjoining, i.e. such i, that the string A[i] is lexicographically smallest among all the n possible disjoinings of a necklace. If there are more than one solution, print the one with the lowest i.

现在解决方案是使用 SUFFIX ARRAY。输入字符串 s，并与其自身连接，s'=s+s，因为我必须对数组的循环后缀进行排序。然后在s'上创建一个后缀数组，输出最小的指向原始s后缀的索引，即index < len(s).

但是我遇到了一个问题。我在附加 '$' 字符以获得 SA，但我得到了错误的答案。在线查看后，我发现了 1 个将 '}' 附加到字符串的解决方案。我发现 ascii('$') < ascii('a') < ascii('z') < ascii('}')

但我不明白这会产生什么影响，为什么这是公认的答案并且还没有发现这会产生影响的情况。解决方案 (AC) 可以在这里找到：

Link to Code

    #include <bits/stdc++.h>
using namespace std;

string s;int n;
bool cmp_init(int a, int b)
{
    return s[a]<s[b] || (s[a]==s[b] && a<b);
}

int jmp;
vector<int> pos;
bool cmp(int a, int b)
{
    return pos[a]<pos[b] || (pos[a]==pos[b] && pos[(a+jmp)%n]<pos[(b+jmp)%n]);
}

int main() {
    int tc;cin>>tc;
    while(tc--){
    cin>>s;
    int m=s.size();
    s=s+s+"{";
    n=s.size();
    
    vector<int> SA(n,0);
    for(int i=0;i<n;i++)SA[i]=i;
    sort(SA.begin(), SA.end(), cmp_init);
    pos.assign(n,0);
    
    for(int i=1 , c=0;i<n;i++)pos[SA[i]]=(s[SA[i]]==s[SA[i-1]])?c:++c;
        
    for(jmp=1;jmp<=n;jmp*=2)
    {
        
        sort(SA.begin(), SA.end(), cmp);
        
        vector<int> tmp(n,0);
    
        for(int i=1 , c=0;i<n;i++)tmp[SA[i]]=(pos[SA[i]]==pos[SA[i-1]] && pos[(SA[i]+jmp)%n]==pos[(SA[i-1]+jmp)%n])?c:++c;

        for(int i=0;i<n;i++)pos[i]=tmp[i];

    }
    
        for(int i=0;i<n;i++)if(SA[i]<m){cout<<SA[i]+1<<"\n";break;}
    }
    
}

PS.: 我发现 SA 构造代码是正确的，唯一的问题是最后一个字符附加。通常我们在 SA 构造中附加 '$'。

Answer 1

不同的是最后一个条件：

If there are more than one solution, print the one with the lowest i.

考虑输入 "abab"。

正确答案是 0，当您附加 '}' 时会得到它，因为 "abababab}" 小于它的所有后缀。

如果您附加“$”，您会得到错误的答案，因为 "ab$" < "abab$" < "ababab$" < "abababab$".

哪个字符附加到后缀数组中的字符串？

Which character to append to string in suffix array?

string

algorithm

string-algorithm

suffix-array