哪个字符附加到后缀数组中的字符串?
Which character to append to string in suffix array?
我正在解决
https://www.spoj.com/problems/BEADS/
上述问题在 SPOJ。我已经在下面陈述了相关信息:
Problem Statement:
The description of the necklace is a string A = a1a2 ... am specifying sizes of the particular beads, where the last character am is considered to precede character a1 in circular fashion.
The disjoint point i is said to be worse than the disjoint point j if and only if the string aiai+1 ... ana1 ... ai-1 is lexicografically smaller than the string ajaj+1 ... ana1 ... aj-1. String a1a2 ... an is lexicografically smaller than the string b1b2 ... bn if and only if there exists an integer i, i <= n, so that aj=bj, for each j, 1 <= j < i and ai < bi.
Output:
For each test case, print exactly one line containing only one integer -- number of the bead which is the first at the worst possible disjoining, i.e. such i, that the string A[i] is lexicographically smallest among all the n possible disjoinings of a necklace. If there are more than one solution, print the one with the lowest i.
现在解决方案是使用 SUFFIX ARRAY。输入字符串 s,并与其自身连接,s'=s+s,因为我必须对数组的循环后缀进行排序。然后在s'上创建一个后缀数组,输出最小的指向原始s后缀的索引,即index < len(s).
但是我遇到了一个问题。我在附加 '$' 字符以获得 SA,但我得到了错误的答案。在线查看后,我发现了 1 个将 '}' 附加到字符串的解决方案。
我发现 ascii('$') < ascii('a') < ascii('z') < ascii('}')
但我不明白这会产生什么影响,为什么这是公认的答案并且还没有发现这会产生影响的情况。解决方案 (AC) 可以在这里找到:
#include <bits/stdc++.h>
using namespace std;
string s;int n;
bool cmp_init(int a, int b)
{
return s[a]<s[b] || (s[a]==s[b] && a<b);
}
int jmp;
vector<int> pos;
bool cmp(int a, int b)
{
return pos[a]<pos[b] || (pos[a]==pos[b] && pos[(a+jmp)%n]<pos[(b+jmp)%n]);
}
int main() {
int tc;cin>>tc;
while(tc--){
cin>>s;
int m=s.size();
s=s+s+"{";
n=s.size();
vector<int> SA(n,0);
for(int i=0;i<n;i++)SA[i]=i;
sort(SA.begin(), SA.end(), cmp_init);
pos.assign(n,0);
for(int i=1 , c=0;i<n;i++)pos[SA[i]]=(s[SA[i]]==s[SA[i-1]])?c:++c;
for(jmp=1;jmp<=n;jmp*=2)
{
sort(SA.begin(), SA.end(), cmp);
vector<int> tmp(n,0);
for(int i=1 , c=0;i<n;i++)tmp[SA[i]]=(pos[SA[i]]==pos[SA[i-1]] && pos[(SA[i]+jmp)%n]==pos[(SA[i-1]+jmp)%n])?c:++c;
for(int i=0;i<n;i++)pos[i]=tmp[i];
}
for(int i=0;i<n;i++)if(SA[i]<m){cout<<SA[i]+1<<"\n";break;}
}
}
PS.: 我发现 SA 构造代码是正确的,唯一的问题是最后一个字符附加。通常我们在 SA 构造中附加 '$'。
不同的是最后一个条件:
If there are more than one solution, print the one with the lowest i.
考虑输入 "abab"
。
正确答案是 0,当您附加 '}' 时会得到它,因为 "abababab}"
小于它的所有后缀。
如果您附加“$”,您会得到错误的答案,因为 "ab$"
< "abab$"
< "ababab$"
< "abababab$"
.
我正在解决
https://www.spoj.com/problems/BEADS/
上述问题在 SPOJ。我已经在下面陈述了相关信息:
Problem Statement: The description of the necklace is a string A = a1a2 ... am specifying sizes of the particular beads, where the last character am is considered to precede character a1 in circular fashion. The disjoint point i is said to be worse than the disjoint point j if and only if the string aiai+1 ... ana1 ... ai-1 is lexicografically smaller than the string ajaj+1 ... ana1 ... aj-1. String a1a2 ... an is lexicografically smaller than the string b1b2 ... bn if and only if there exists an integer i, i <= n, so that aj=bj, for each j, 1 <= j < i and ai < bi.
Output: For each test case, print exactly one line containing only one integer -- number of the bead which is the first at the worst possible disjoining, i.e. such i, that the string A[i] is lexicographically smallest among all the n possible disjoinings of a necklace. If there are more than one solution, print the one with the lowest i.
现在解决方案是使用 SUFFIX ARRAY。输入字符串 s,并与其自身连接,s'=s+s,因为我必须对数组的循环后缀进行排序。然后在s'上创建一个后缀数组,输出最小的指向原始s后缀的索引,即index < len(s).
但是我遇到了一个问题。我在附加 '$' 字符以获得 SA,但我得到了错误的答案。在线查看后,我发现了 1 个将 '}' 附加到字符串的解决方案。 我发现 ascii('$') < ascii('a') < ascii('z') < ascii('}')
但我不明白这会产生什么影响,为什么这是公认的答案并且还没有发现这会产生影响的情况。解决方案 (AC) 可以在这里找到:
#include <bits/stdc++.h>
using namespace std;
string s;int n;
bool cmp_init(int a, int b)
{
return s[a]<s[b] || (s[a]==s[b] && a<b);
}
int jmp;
vector<int> pos;
bool cmp(int a, int b)
{
return pos[a]<pos[b] || (pos[a]==pos[b] && pos[(a+jmp)%n]<pos[(b+jmp)%n]);
}
int main() {
int tc;cin>>tc;
while(tc--){
cin>>s;
int m=s.size();
s=s+s+"{";
n=s.size();
vector<int> SA(n,0);
for(int i=0;i<n;i++)SA[i]=i;
sort(SA.begin(), SA.end(), cmp_init);
pos.assign(n,0);
for(int i=1 , c=0;i<n;i++)pos[SA[i]]=(s[SA[i]]==s[SA[i-1]])?c:++c;
for(jmp=1;jmp<=n;jmp*=2)
{
sort(SA.begin(), SA.end(), cmp);
vector<int> tmp(n,0);
for(int i=1 , c=0;i<n;i++)tmp[SA[i]]=(pos[SA[i]]==pos[SA[i-1]] && pos[(SA[i]+jmp)%n]==pos[(SA[i-1]+jmp)%n])?c:++c;
for(int i=0;i<n;i++)pos[i]=tmp[i];
}
for(int i=0;i<n;i++)if(SA[i]<m){cout<<SA[i]+1<<"\n";break;}
}
}
PS.: 我发现 SA 构造代码是正确的,唯一的问题是最后一个字符附加。通常我们在 SA 构造中附加 '$'。
不同的是最后一个条件:
If there are more than one solution, print the one with the lowest i.
考虑输入 "abab"
。
正确答案是 0,当您附加 '}' 时会得到它,因为 "abababab}"
小于它的所有后缀。
如果您附加“$”,您会得到错误的答案,因为 "ab$"
< "abab$"
< "ababab$"
< "abababab$"
.