如何以最快的方式对大字符串进行许多小的更改。视觉C++
How to do many small changes in big string in fastest way. Visual C++
我试图在一个月内解决这个问题。我需要在一个大字符串 (String ^) 中进行多次替换(超过 1000 万次)。我也需要快速完成。我的方法是正确的,但程序 运行 超过 30 分钟。
问题:
我有 table 项更改要做:[strWas1, strWillBe1, strWas2, strWillBe2, ..., strWas10^7, strWillBe10^7]
。我还有一个大字符串,它可以包含一些 strWasN
,但它也可以包含 something-elsestrWas1
,我不想更改它,因为“something-elsestrWas1
”不是“[=16=” ]".
例如字符串是:
"I have two dogs, three notdogs, also dogsikong, 5dogs, -dogs. DOGS,
Dogs, DoGs, 33DoGs00"
现在我需要将所有孤立的 "dogs" 从字母("dogs" 是 strWas1)更改为 "cats"("cats" 是 strWillBe1)。结果应该是:
"I have two cats, three notdogs, also dogsikong, 5cats, -cats. cats,
cats, cats, 33cats00"
我最后一次尝试是:
array<String^>^ strArray = gcnew array<String^>(9999999);
strArray[0] = gcnew String("dogs");
strArray[1] = gcnew String("cats");
//...
strArray[9999998] = gcnew String("whatReplace");
strArray[9999999] = gcnew String("newText");
bool found = false;
int index;
bool doThis = true;
String ^ notAllowed = u8"aąbcćdeęfghijklłmnńoópqrsśtuvwxyzźżAĄBCĆDEĘFGHIJKLŁMNŃOÓPQRSŚTUVWXYZŹŻёйцукенгшщзхъфывапролджэячсмитьбюЁЙЦУКЕНГШЩЗХЪФЫВАПРОЛДЖЭЯЧСМИТЬБЮ";
String ^ text = u8"I have two dogs, three notdogs, also dogsikong, 5dogs, -dogs. DOGS, Dogs, DoGs, 33DoGs00";
for (int i = 0; i < 9999999; i+=2) {
while (found = text->Contains(strArray[i])) {
index = text->IndexOf(strArray[i]);
MessageBox::Show(index.ToString());
doThis = true;
if (index == 0) {
for (int j = 0; j < notAllowed->Length; j++) {
if (text->Substring(strArray[i]->Length, 1) == notAllowed->Substring(j, 1)) doThis = false;
}
}
else if (text->Length - index - strArray[i]->Length) {
for (int j = 0; j < notAllowed->Length; j++) {
if (text->Substring(index-1, 1) == notAllowed->Substring(j, 1)) doThis = false;
}
}
else {
for (int j = 0; j < notAllowed->Length; j++) {
if ((text->Substring(index - 1, 1) == notAllowed->Substring(j, 1)) || (text->Substring(index+strArray[i]->Length,1)== notAllowed->Substring(j, 1))) doThis = false;
}
}
if (doThis) {
text = text->Substring(0, index) + strArray[i + 1] + text->Substring(index + strArray[i]->Length, text->Length - index - strArray[i]->Length);
}
}
}
但这是无休止的工作
新版本(感谢 Vlad Feinstein):
array<String^>^ strArray = gcnew array<String^>(10);
strArray[0] = gcnew String("dogs");
strArray[1] = gcnew String("cats");
strArray[2] = gcnew String("dogs");
strArray[3] = gcnew String("cats");
strArray[4] = gcnew String("dogs");
strArray[5] = gcnew String("cats");
strArray[6] = gcnew String("dogs");
strArray[7] = gcnew String("cats");
strArray[8] = gcnew String("dogs");
strArray[9] = gcnew String("cats");
bool found = false;
int index;
bool doThis = true;
String ^ text = u8"I have two dogs, three notdogs, also dogsikong, 5dogs, -dogs. DOGS, Dogs, DoGs, 33DoGs00";
for (int i = 0; i < 10; i += 2)
{
int index = 0;
while ((index = text->ToLower()->IndexOf(strArray[i]->ToLower(), index)) != -1)
{
doThis = true;
// is there one more char?
if (index + strArray[i]->Length < text->Length)
{
if (Char::IsLetter(text[index+strArray[i]->Length]))
doThis = false;
}
// is there previous char?
if (index > 0)
{
if (Char::IsLetter(text[index - 1]))
doThis = false;
}
if (doThis)
text = text->Substring(0, index) + strArray[i + 1] +
text->Substring(index + strArray[i]->Length);
Debug::WriteLine(text);
index++;
}
}
当然还没有这么快的版本。快速版本写了 David Yaw。
你的代码中有很多问题可能会导致问题,但主要的逻辑错误是:
while (found = text->Contains(strArray[i]))
应该是
while (found == text->Contains(strArray[i]))
因为 ==
是比较运算符,而 =
是赋值运算符。所以你总是在分配,因此你的 while 循环在无限循环中。
嗯...不是吗?
while (found == text->Contains(strArray[i]))
用于比较。但是我之前没有计算 found
。所以我计算 found in while 并检查它是否为真。这是允许的。
while (found = text->Contains(strArray[i]))
正是:
found = text->Contains(strArray[i])
while (found==true)
至少在普通的 C++ 中它是有效的。在这里我也没有问题。
有一种更好的方法可以做到这一点,而不是盲目地检查一百万个替换字符串中的每一个。让 .Net 散列字符串,并让它以这种方式进行检查。
如果我们收到作为字典的查找和替换字符串,我们可以使用 .Net 的散列查找来查找我们需要替换的字符串。
如果我们遍历字符串中的每个字符,它可能是 5 个字符 'search for' 字符串的开头,或 4 个字符 'search for' 字符串的开头,等等,或者它可能根本不是 'search for' 字符串的一部分,在这种情况下,它会被直接复制到输出中。如果我们确实找到了 'search for' 字符串,我们会将替换写入输出,并将适当数量的输入字符标记为已消耗。
根据您的描述,您似乎希望在搜索字符串时进行不区分大小写的比较。您可以区分大小写或不区分大小写,只需在构造 Dictionary
.
时指定您喜欢的任何内容
String^ BigFindReplace(
String^ originalString,
Dictionary<String^, String^>^ replacementPairs)
{
// First, get the lengths of all the 'search for' strings in the replacement pairs.
SortedSet<int> searchForLengths;
for each (String^ searchFor in replacementPairs->Keys)
{
searchForLengths.Add(searchFor->Length);
}
// Searching for an empty string isn't valid: remove length zero, if it's there.
searchForLengths.Remove(0);
StringBuilder result;
// Step through the input string. For each character:
// A) See if the character is the beginning of one of the 'search for' strings.
// If so, then insert the 'replace with' string into the output buffer.
// Skip over this character and the rest of the 'search for' string that we found.
// B) If it's not the beginning of a 'search for' string, copy it to the output buffer.
for(int i = 0; i < originalString->Length; i++)
{
bool foundSomething = false;
int foundSomethingLength = 0;
for each (int len in searchForLengths.Reverse())
{
if (i > (originalString->Length - len))
{
// If we're on the last 4 characters of the string, we can ignore
// all the 'search for' strings that are 5 characters or longer.
continue;
}
String^ substr = originalString->Substring(i, len);
String^ replaceWith;
if (replacementPairs->TryGetValue(substr, replaceWith))
{
// We found the section of the input string that we're looking at in our
// 'search for' list! Inser the 'replace with' into the output buffer.
result.Append(replaceWith);
foundSomething = true;
foundSomethingLength = len;
break; // don't try to find more 'search for' strings.
}
}
if(foundSomething)
{
// We found & already inserted the replacement text. Just increment
// the loop counter to skip over the rest of the characters of the
// found 'search for' text.
i += (foundSomethingLength - 1); // "-1" because the for loop has its own "+1".
}
else
{
// We didn't find any of the 'search for' strings,
// so this is a character that just gets copied.
result.Append(originalString[i]);
}
}
return result.ToString();
}
我的测试应用:
int main(array<System::String ^> ^args)
{
String^ text = "I have two dogs, three notdogs, also dogsikong, 5dogs, -dogs. DOGS, Dogs, DoGs, 33DoGs00";
Dictionary<String^, String^>^ replacementPairs =
gcnew Dictionary<String^, String^>(StringComparer::CurrentCultureIgnoreCase);
replacementPairs->Add("dogs", "cats");
replacementPairs->Add("pigs", "cats");
replacementPairs->Add("mice", "cats");
replacementPairs->Add("rats", "cats");
replacementPairs->Add("horses", "cats");
String^ outText = BigFindReplace(text, replacementPairs);
Debug::WriteLine(outText);
String^ text2 = "I have two dogs, three notpigs, also miceikong, 5rats, -dogs. RATS, Horses, DoGs, 33DoGs00";
String^ outText2 = BigFindReplace(text, replacementPairs);
Debug::WriteLine(outText2);
return 0;
}
输出:
I have two cats, three notcats, also catsikong, 5cats, -cats. cats, cats, cats, 33cats00
I have two cats, three notcats, also catsikong, 5cats, -cats. cats, cats, cats, 33cats00
编辑:仅完整单词
好的,所以我们只需要替换整个单词。为此,我编写了一个辅助方法来将字符串拆分为单词和非单词。 (这与内置的 String::Split 方法不同:String::Split 没有 return 分隔符,我们在这里需要它们。)
一旦我们有了一个字符串数组,其中每个字符串要么是一个单词,要么是一堆非单词字符(例如,分隔符、空格等),然后我们可以 运行 每个字符串通过字典。因为我们一次处理一个完整的单词,而不是一次只处理一个字母,这样效率更高。
array<String^>^ SplitIntoWords(String^ input)
{
List<String^> result;
StringBuilder currentWord;
bool currentIsWord = false;
for each (System::Char c in input)
{
// Words are made up of letters. Word separators are made up of
// everything else (numbers, whitespace, punctuation, etc.)
bool nextCharIsWord = Char::IsLetter(c);
if(nextCharIsWord != currentIsWord)
{
if(currentWord.Length > 0)
{
result.Add(currentWord.ToString());
currentWord.Clear();
}
currentIsWord = nextCharIsWord;
}
currentWord.Append(c);
}
if(currentWord.Length > 0)
{
result.Add(currentWord.ToString());
currentWord.Clear();
}
return result.ToArray();
}
String^ BigFindReplaceWords(
String^ originalString,
Dictionary<String^, String^>^ replacementPairs)
{
StringBuilder result;
// First, separate the input string into an array of words & non-words.
array<String^>^ asWords = SplitIntoWords(originalString);
// Go through each word & non-word that came out of the split. If a word or
// non-word is in the replacement list, add the replacement to the output.
// Otherwise, add the word/nonword to the output.
for each (String^ word in asWords)
{
String^ replaceWith;
if (replacementPairs->TryGetValue(word, replaceWith))
{
result.Append(replaceWith);
}
else
{
result.Append(word);
}
}
return result.ToString();
}
我的测试应用:
int main(array<System::String ^> ^args)
{
String^ text = "I have two dogs, three notdogs, also dogsikong, 5dogs, -dogs. DOGS, Dogs, DoGs, 33DoGs00";
array<String^>^ words = SplitIntoWords(text);
for (int i = 0; i < words->Length; i++)
{
Debug::WriteLine("words[{0}] = '{1}'", i, words[i]);
}
Dictionary<String^, String^>^ replacementPairs =
gcnew Dictionary<String^, String^>(StringComparer::CurrentCultureIgnoreCase);
replacementPairs->Add("dogs", "cats");
replacementPairs->Add("pigs", "cats");
replacementPairs->Add("mice", "cats");
replacementPairs->Add("rats", "cats");
replacementPairs->Add("horses", "cats");
String^ outText = BigFindReplaceWords(text, replacementPairs);
Debug::WriteLine(outText);
String^ text2 = "I have two dogs, three notpigs, also miceikong, 5rats, -dogs. RATS, Horses, DoGs, 33DoGs00";
String^ outText2 = BigFindReplaceWords(text2, replacementPairs);
Debug::WriteLine(outText2);
return 0;
}
结果:
words[0] = 'I'
words[1] = ' '
words[2] = 'have'
words[3] = ' '
words[4] = 'two'
words[5] = ' '
words[6] = 'dogs'
words[7] = ', '
words[8] = 'three'
words[9] = ' '
words[10] = 'notdogs'
words[11] = ', '
words[12] = 'also'
words[13] = ' '
words[14] = 'dogsikong'
words[15] = ', 5'
words[16] = 'dogs'
words[17] = ', -'
words[18] = 'dogs'
words[19] = '. '
words[20] = 'DOGS'
words[21] = ', '
words[22] = 'Dogs'
words[23] = ', '
words[24] = 'DoGs'
words[25] = ', 33'
words[26] = 'DoGs'
words[27] = '00'
I have two cats, three notdogs, also dogsikong, 5cats, -cats. cats, cats, cats, 33cats00
I have two cats, three notpigs, also miceikong, 5cats, -cats. cats, cats, cats, 33cats00
Пётр Васильевич,一些建议:
- 将 Substring(x, 1) 替换为 Char[x]。
- 扔掉你的
notAllowed
字符串并使用 .NET 的 Char.IsLetter Method,或者至少在你设置 doThis = false;
时打破你的 for()
循环
- 如果需要从
index
到字符串末尾的子串,不需要计算长度;只需使用一个带有一个参数的表单:public string Substring(int startIndex)
- 不要使用
text->Contains()
;无论如何你都需要调用 text->IndexOf()
,只需将该索引与 -1
. 进行比较
- 1000万字???英语和俄语加起来没有那么多!
使用String.IndexOf Method (Char, Int32)的双参数形式指定从哪里开始查找(从上次查找到的词的位置开始),避免字符串开头重复查找超过。类似的东西:
for (int i = 0; i < 9999999; i += 2)
{
int index = 0;
while ((index = text->IndexOf(strArray[i], index)) != -1)
{
doThis = true;
// is there one more char?
if (index + strArray[i]->Length < text->Length)
{
if(Char.IsLetter(text->Char[strArray[i]->Length]))
doThis = false;
}
// is there previous char?
if (index > 0)
{
if (Char.IsLetter(text->Char[index - 1]))
doThis = false;
}
if (doThis)
text = text->Substring(0, index) + strArray[i + 1] +
text->Substring(index + strArray[i]->Length);
}
}
在你的 while()
循环中,将找到的字符串的索引收集到一个数组中,然后一次完成所有相同单词的替换。如果同一个词在 text
.
中多次出现,这将特别有用
我试图在一个月内解决这个问题。我需要在一个大字符串 (String ^) 中进行多次替换(超过 1000 万次)。我也需要快速完成。我的方法是正确的,但程序 运行 超过 30 分钟。
问题:
我有 table 项更改要做:[strWas1, strWillBe1, strWas2, strWillBe2, ..., strWas10^7, strWillBe10^7]
。我还有一个大字符串,它可以包含一些 strWasN
,但它也可以包含 something-elsestrWas1
,我不想更改它,因为“something-elsestrWas1
”不是“[=16=” ]".
例如字符串是:
"I have two dogs, three notdogs, also dogsikong, 5dogs, -dogs. DOGS, Dogs, DoGs, 33DoGs00"
现在我需要将所有孤立的 "dogs" 从字母("dogs" 是 strWas1)更改为 "cats"("cats" 是 strWillBe1)。结果应该是:
"I have two cats, three notdogs, also dogsikong, 5cats, -cats. cats, cats, cats, 33cats00"
我最后一次尝试是:
array<String^>^ strArray = gcnew array<String^>(9999999);
strArray[0] = gcnew String("dogs");
strArray[1] = gcnew String("cats");
//...
strArray[9999998] = gcnew String("whatReplace");
strArray[9999999] = gcnew String("newText");
bool found = false;
int index;
bool doThis = true;
String ^ notAllowed = u8"aąbcćdeęfghijklłmnńoópqrsśtuvwxyzźżAĄBCĆDEĘFGHIJKLŁMNŃOÓPQRSŚTUVWXYZŹŻёйцукенгшщзхъфывапролджэячсмитьбюЁЙЦУКЕНГШЩЗХЪФЫВАПРОЛДЖЭЯЧСМИТЬБЮ";
String ^ text = u8"I have two dogs, three notdogs, also dogsikong, 5dogs, -dogs. DOGS, Dogs, DoGs, 33DoGs00";
for (int i = 0; i < 9999999; i+=2) {
while (found = text->Contains(strArray[i])) {
index = text->IndexOf(strArray[i]);
MessageBox::Show(index.ToString());
doThis = true;
if (index == 0) {
for (int j = 0; j < notAllowed->Length; j++) {
if (text->Substring(strArray[i]->Length, 1) == notAllowed->Substring(j, 1)) doThis = false;
}
}
else if (text->Length - index - strArray[i]->Length) {
for (int j = 0; j < notAllowed->Length; j++) {
if (text->Substring(index-1, 1) == notAllowed->Substring(j, 1)) doThis = false;
}
}
else {
for (int j = 0; j < notAllowed->Length; j++) {
if ((text->Substring(index - 1, 1) == notAllowed->Substring(j, 1)) || (text->Substring(index+strArray[i]->Length,1)== notAllowed->Substring(j, 1))) doThis = false;
}
}
if (doThis) {
text = text->Substring(0, index) + strArray[i + 1] + text->Substring(index + strArray[i]->Length, text->Length - index - strArray[i]->Length);
}
}
}
但这是无休止的工作
新版本(感谢 Vlad Feinstein):
array<String^>^ strArray = gcnew array<String^>(10);
strArray[0] = gcnew String("dogs");
strArray[1] = gcnew String("cats");
strArray[2] = gcnew String("dogs");
strArray[3] = gcnew String("cats");
strArray[4] = gcnew String("dogs");
strArray[5] = gcnew String("cats");
strArray[6] = gcnew String("dogs");
strArray[7] = gcnew String("cats");
strArray[8] = gcnew String("dogs");
strArray[9] = gcnew String("cats");
bool found = false;
int index;
bool doThis = true;
String ^ text = u8"I have two dogs, three notdogs, also dogsikong, 5dogs, -dogs. DOGS, Dogs, DoGs, 33DoGs00";
for (int i = 0; i < 10; i += 2)
{
int index = 0;
while ((index = text->ToLower()->IndexOf(strArray[i]->ToLower(), index)) != -1)
{
doThis = true;
// is there one more char?
if (index + strArray[i]->Length < text->Length)
{
if (Char::IsLetter(text[index+strArray[i]->Length]))
doThis = false;
}
// is there previous char?
if (index > 0)
{
if (Char::IsLetter(text[index - 1]))
doThis = false;
}
if (doThis)
text = text->Substring(0, index) + strArray[i + 1] +
text->Substring(index + strArray[i]->Length);
Debug::WriteLine(text);
index++;
}
}
当然还没有这么快的版本。快速版本写了 David Yaw。
你的代码中有很多问题可能会导致问题,但主要的逻辑错误是:
while (found = text->Contains(strArray[i]))
应该是
while (found == text->Contains(strArray[i]))
因为 ==
是比较运算符,而 =
是赋值运算符。所以你总是在分配,因此你的 while 循环在无限循环中。
嗯...不是吗?
while (found == text->Contains(strArray[i]))
用于比较。但是我之前没有计算 found
。所以我计算 found in while 并检查它是否为真。这是允许的。
while (found = text->Contains(strArray[i]))
正是:
found = text->Contains(strArray[i])
while (found==true)
至少在普通的 C++ 中它是有效的。在这里我也没有问题。
有一种更好的方法可以做到这一点,而不是盲目地检查一百万个替换字符串中的每一个。让 .Net 散列字符串,并让它以这种方式进行检查。
如果我们收到作为字典的查找和替换字符串,我们可以使用 .Net 的散列查找来查找我们需要替换的字符串。
如果我们遍历字符串中的每个字符,它可能是 5 个字符 'search for' 字符串的开头,或 4 个字符 'search for' 字符串的开头,等等,或者它可能根本不是 'search for' 字符串的一部分,在这种情况下,它会被直接复制到输出中。如果我们确实找到了 'search for' 字符串,我们会将替换写入输出,并将适当数量的输入字符标记为已消耗。
根据您的描述,您似乎希望在搜索字符串时进行不区分大小写的比较。您可以区分大小写或不区分大小写,只需在构造 Dictionary
.
String^ BigFindReplace(
String^ originalString,
Dictionary<String^, String^>^ replacementPairs)
{
// First, get the lengths of all the 'search for' strings in the replacement pairs.
SortedSet<int> searchForLengths;
for each (String^ searchFor in replacementPairs->Keys)
{
searchForLengths.Add(searchFor->Length);
}
// Searching for an empty string isn't valid: remove length zero, if it's there.
searchForLengths.Remove(0);
StringBuilder result;
// Step through the input string. For each character:
// A) See if the character is the beginning of one of the 'search for' strings.
// If so, then insert the 'replace with' string into the output buffer.
// Skip over this character and the rest of the 'search for' string that we found.
// B) If it's not the beginning of a 'search for' string, copy it to the output buffer.
for(int i = 0; i < originalString->Length; i++)
{
bool foundSomething = false;
int foundSomethingLength = 0;
for each (int len in searchForLengths.Reverse())
{
if (i > (originalString->Length - len))
{
// If we're on the last 4 characters of the string, we can ignore
// all the 'search for' strings that are 5 characters or longer.
continue;
}
String^ substr = originalString->Substring(i, len);
String^ replaceWith;
if (replacementPairs->TryGetValue(substr, replaceWith))
{
// We found the section of the input string that we're looking at in our
// 'search for' list! Inser the 'replace with' into the output buffer.
result.Append(replaceWith);
foundSomething = true;
foundSomethingLength = len;
break; // don't try to find more 'search for' strings.
}
}
if(foundSomething)
{
// We found & already inserted the replacement text. Just increment
// the loop counter to skip over the rest of the characters of the
// found 'search for' text.
i += (foundSomethingLength - 1); // "-1" because the for loop has its own "+1".
}
else
{
// We didn't find any of the 'search for' strings,
// so this is a character that just gets copied.
result.Append(originalString[i]);
}
}
return result.ToString();
}
我的测试应用:
int main(array<System::String ^> ^args)
{
String^ text = "I have two dogs, three notdogs, also dogsikong, 5dogs, -dogs. DOGS, Dogs, DoGs, 33DoGs00";
Dictionary<String^, String^>^ replacementPairs =
gcnew Dictionary<String^, String^>(StringComparer::CurrentCultureIgnoreCase);
replacementPairs->Add("dogs", "cats");
replacementPairs->Add("pigs", "cats");
replacementPairs->Add("mice", "cats");
replacementPairs->Add("rats", "cats");
replacementPairs->Add("horses", "cats");
String^ outText = BigFindReplace(text, replacementPairs);
Debug::WriteLine(outText);
String^ text2 = "I have two dogs, three notpigs, also miceikong, 5rats, -dogs. RATS, Horses, DoGs, 33DoGs00";
String^ outText2 = BigFindReplace(text, replacementPairs);
Debug::WriteLine(outText2);
return 0;
}
输出:
I have two cats, three notcats, also catsikong, 5cats, -cats. cats, cats, cats, 33cats00 I have two cats, three notcats, also catsikong, 5cats, -cats. cats, cats, cats, 33cats00
编辑:仅完整单词
好的,所以我们只需要替换整个单词。为此,我编写了一个辅助方法来将字符串拆分为单词和非单词。 (这与内置的 String::Split 方法不同:String::Split 没有 return 分隔符,我们在这里需要它们。)
一旦我们有了一个字符串数组,其中每个字符串要么是一个单词,要么是一堆非单词字符(例如,分隔符、空格等),然后我们可以 运行 每个字符串通过字典。因为我们一次处理一个完整的单词,而不是一次只处理一个字母,这样效率更高。
array<String^>^ SplitIntoWords(String^ input)
{
List<String^> result;
StringBuilder currentWord;
bool currentIsWord = false;
for each (System::Char c in input)
{
// Words are made up of letters. Word separators are made up of
// everything else (numbers, whitespace, punctuation, etc.)
bool nextCharIsWord = Char::IsLetter(c);
if(nextCharIsWord != currentIsWord)
{
if(currentWord.Length > 0)
{
result.Add(currentWord.ToString());
currentWord.Clear();
}
currentIsWord = nextCharIsWord;
}
currentWord.Append(c);
}
if(currentWord.Length > 0)
{
result.Add(currentWord.ToString());
currentWord.Clear();
}
return result.ToArray();
}
String^ BigFindReplaceWords(
String^ originalString,
Dictionary<String^, String^>^ replacementPairs)
{
StringBuilder result;
// First, separate the input string into an array of words & non-words.
array<String^>^ asWords = SplitIntoWords(originalString);
// Go through each word & non-word that came out of the split. If a word or
// non-word is in the replacement list, add the replacement to the output.
// Otherwise, add the word/nonword to the output.
for each (String^ word in asWords)
{
String^ replaceWith;
if (replacementPairs->TryGetValue(word, replaceWith))
{
result.Append(replaceWith);
}
else
{
result.Append(word);
}
}
return result.ToString();
}
我的测试应用:
int main(array<System::String ^> ^args)
{
String^ text = "I have two dogs, three notdogs, also dogsikong, 5dogs, -dogs. DOGS, Dogs, DoGs, 33DoGs00";
array<String^>^ words = SplitIntoWords(text);
for (int i = 0; i < words->Length; i++)
{
Debug::WriteLine("words[{0}] = '{1}'", i, words[i]);
}
Dictionary<String^, String^>^ replacementPairs =
gcnew Dictionary<String^, String^>(StringComparer::CurrentCultureIgnoreCase);
replacementPairs->Add("dogs", "cats");
replacementPairs->Add("pigs", "cats");
replacementPairs->Add("mice", "cats");
replacementPairs->Add("rats", "cats");
replacementPairs->Add("horses", "cats");
String^ outText = BigFindReplaceWords(text, replacementPairs);
Debug::WriteLine(outText);
String^ text2 = "I have two dogs, three notpigs, also miceikong, 5rats, -dogs. RATS, Horses, DoGs, 33DoGs00";
String^ outText2 = BigFindReplaceWords(text2, replacementPairs);
Debug::WriteLine(outText2);
return 0;
}
结果:
words[0] = 'I' words[1] = ' ' words[2] = 'have' words[3] = ' ' words[4] = 'two' words[5] = ' ' words[6] = 'dogs' words[7] = ', ' words[8] = 'three' words[9] = ' ' words[10] = 'notdogs' words[11] = ', ' words[12] = 'also' words[13] = ' ' words[14] = 'dogsikong' words[15] = ', 5' words[16] = 'dogs' words[17] = ', -' words[18] = 'dogs' words[19] = '. ' words[20] = 'DOGS' words[21] = ', ' words[22] = 'Dogs' words[23] = ', ' words[24] = 'DoGs' words[25] = ', 33' words[26] = 'DoGs' words[27] = '00' I have two cats, three notdogs, also dogsikong, 5cats, -cats. cats, cats, cats, 33cats00 I have two cats, three notpigs, also miceikong, 5cats, -cats. cats, cats, cats, 33cats00
Пётр Васильевич,一些建议:
- 将 Substring(x, 1) 替换为 Char[x]。
- 扔掉你的
notAllowed
字符串并使用 .NET 的 Char.IsLetter Method,或者至少在你设置doThis = false;
时打破你的 - 如果需要从
index
到字符串末尾的子串,不需要计算长度;只需使用一个带有一个参数的表单:public string Substring(int startIndex) - 不要使用
text->Contains()
;无论如何你都需要调用text->IndexOf()
,只需将该索引与-1
. 进行比较
- 1000万字???英语和俄语加起来没有那么多!
使用String.IndexOf Method (Char, Int32)的双参数形式指定从哪里开始查找(从上次查找到的词的位置开始),避免字符串开头重复查找超过。类似的东西:
for (int i = 0; i < 9999999; i += 2) { int index = 0; while ((index = text->IndexOf(strArray[i], index)) != -1) { doThis = true; // is there one more char? if (index + strArray[i]->Length < text->Length) { if(Char.IsLetter(text->Char[strArray[i]->Length])) doThis = false; } // is there previous char? if (index > 0) { if (Char.IsLetter(text->Char[index - 1])) doThis = false; } if (doThis) text = text->Substring(0, index) + strArray[i + 1] + text->Substring(index + strArray[i]->Length); } }
在你的
while()
循环中,将找到的字符串的索引收集到一个数组中,然后一次完成所有相同单词的替换。如果同一个词在text
. 中多次出现,这将特别有用
for()
循环