在 C++ 中寻找搜索和替换的圣杯

Searching for Holy Grail of search and replace in C++

最近我一直在寻找一种方法来替换字符串中的标记,它本质上是查找和替换(但至少有一个额外的方法来解决这个问题)并且看起来是一个非常平庸的任务。我已经提出了几个可能的实现,但是 none 从性能的角度来看,它们令人满意。最好的成绩是每次迭代 ~50us。这种情况很理想,字符串的大小从未增长,最初我忽略了不区分大小写的要求
这是 Coliru
处的代码
我机器的结果:
Boost.Spirit 符号结果:3421?=3421
100000 个周期耗时 6060 毫秒。
博耶-摩尔 results:3421?=3421
100000 个周期耗时 5959 毫秒。
Boyer Moore Hospool result:3421?=3421
100000 次循环耗时 5008 毫秒。
Knuth Morris Pratt 结果:3421?=3421
100000 个周期耗时 12451 毫秒。
朴素的 STL 搜索和替换结果:3421?=3421
100000 个周期耗时 5532 毫秒。
提升replace_allresult:3421?=3421
100000 次循环耗时 4860 毫秒。

那么问题来了,为什么这么简单的任务需要这么长时间?可以说,好的,简单的任务,继续并更好地实施它。但现实情况是,15 年前的 MFC 天真实现可以更快地完成任务数量级:

CString FillTokenParams(const CString& input, const std::unordered_map<std::string, std::string>& tokens)
{
    CString tmpInput = input;
    for(const auto& token : tokens)
    {
        int pos = 0;
        while(pos != -1)
        {
            pos = tmpInput.Find(token.first.c_str(), pos);
            if(pos != -1)
            {
                int tokenLength = token.first.size();
                tmpInput.Delete(pos, tokenLength);
                tmpInput.Insert(pos, token.second.c_str());
                pos += 1;
            }
        }
    }

    return tmpInput;
}

结果:
MFC 简单搜索和替换 result:3421?=3421
100000 个周期耗时 516 毫秒。
为什么这个笨拙的代码能胜过现代 C++???为什么其他实施如此缓慢?我错过了一些基本的东西吗?

EDIT001:我已经研究了这个问题,代码已经过 profiled 和三重检查。你可以对这个那个不满意,但是 std::string::replace 不是花时间的。在任何 STL 实现中,搜索是花费大部分时间的事情,boost spirit 将时间浪费在 tst 的分配上(我猜是评估树中的测试节点)。我不希望任何人在带有 "this is your problem" 的函数中指向一行并且噗,问题就消失了。问题是 MFC 是如何设法以快 10 倍的速度完成同样的工作的。

EDIT002:刚刚深入研究 Find 的 MFC 实现并编写了一个模仿 MFC 实现的函数

namespace mfc
{
std::string::size_type Find(const std::string& input, const std::string& subString, std::string::size_type start)
{
    if(subString.empty())
    {
        return std::string::npos;
    }

    if(start < 0 || start > input.size())
    {
        return std::string::npos;
    }

    auto found = strstr(input.c_str() + start, subString.c_str());
    return ((found == nullptr) ? std::string::npos : std::string::size_type(found - input.c_str()));
}
}

std::string MFCMimicking(const std::string& input, const std::unordered_map<std::string, std::string>& tokens)
{
    auto tmpInput = input;
    for(const auto& token : tokens)
    {
        auto pos = 0;
        while(pos != std::string::npos)
        {
            pos = mfc::Find(tmpInput, token.first, pos);
            if(pos != std::string::npos)
            {
                auto tokenLength = token.first.size();
                tmpInput.replace(pos, tokenLength, token.second.c_str());
                pos += 1;
            }
        }
    }

    return tmpInput;
}

结果:
MFC 模仿展开 result:3421?=3421
100000 个周期耗时 411 毫秒。
意思是4us。每次通话,去击败那个 C strstr

EDIT003:编译和 运行 -Ox


MFC mimicking expand result:3421?=3421
100000 cycles took 660ms.
MFC naive search and replace result:3421?=3421
100000 cycles took 856ms.
Manual expand result:3421?=3421
100000 cycles took 1995ms.
Boyer-Moore results:3421?=3421
100000 cycles took 6911ms.
Boyer Moore Hospool result:3421?=3421
100000 cycles took 5670ms.
Knuth Morris Pratt result: 3421?=3421
100000 cycles took 13825ms.
Naive STL search and replace result: 3421?=3421
100000 cycles took 9531ms.
Boost replace_all result:3421?=3421
100000 cycles took 8996ms.


运行 -O2(与原始测量值一样)但 10k 个周期


MFC mimicking expand result:3421?=3421
10000 cycles took 104ms.
MFC naive search and replace result:3421?=3421
10000 cycles took 105ms.
Manual expand result:3421?=3421
10000 cycles took 356ms.
Boyer-Moore results:3421?=3421
10000 cycles took 1355ms.
Boyer Moore Hospool result:3421?=3421
10000 cycles took 1101ms.
Knuth Morris Pratt result: 3421?=3421
10000 cycles took 1973ms.
Naive STL search and replace result: 3421?=3421
10000 cycles took 923ms.
Boost replace_all result:3421?=3421
10000 cycles took 880ms.

您对 std::string:replace 的可疑使用是如此缓慢以至于代码中的其他任何内容都无关紧要。

MFC 模拟的 EDIT2: 从您的代码中可以明显看出为什么 MFC 版本更快:它不会像您的其他一些版本那样在每次替换时搜索整个字符串(我仍然无法解释 boost::spirit)。一旦执行替换,它就会从替换点开始搜索,而不是从字符串的开头开始搜索,所以很明显这样会更快。

编辑:在做了更多研究并看到 (Algorithm to find multiple string matches) 之后,似乎使用良好的单字符串匹配算法来查找多个搜索词才是这里的实际问题。最好的选择可能是使用适当的算法(该问题中提到了一些算法)。

至于为什么MFC更快?我建议将其提炼成一个不同的问题 "Why are delete and insert in CString so much faster than std::string" 或类似的问题,并确保将其标记为 C++ 和 MFC,以便具有适当专业知识的人可以提供帮助(我有标准 C++ 的经验,但不能帮助VC++ 对 CString 进行了哪些优化)。

原回答: 好的,由于代码量巨大,我只查看了 expandTokens3,但我认为所有版本都有相同的问题。您的代码有两个潜在的重大性能问题:

  • 每次替换时都会搜索整个字符串。如果您在一个字符串中有十个变量要替换,它将比需要的时间长十倍。

  • 您在输入字符串中就地执行每个替换,而不是从每个片段构建结果字符串。这可能会导致内存分配和复制 each 替换,再次可能显着增加运行时间。

所以,我对Qi版有一些观察。

还创建了一个X3版本。

最后,写了一个击败所有其他候选人的手动扩展函数(我希望它比 MFC 更快,因为它不会重复 deletes/inserts)。

如果需要,请跳至基准图表。

关于齐版

  1. 是的,符号表存在基于节点的容器的局部性问题。它们可能不是您可以在此处使用的最佳匹配项。
  2. 无需在每个循环中重建符号:
  3. 不是按字符跳过非符号,而是扫描到下一个:

    +(bsq::char_ - symbols)
    
inline std::string spirit_qi(const std::string& input, bsq::symbols<char, std::string> const& symbols)
{
    std::string retVal;
    retVal.reserve(input.size() * 2);

    auto beg = input.cbegin();
    auto end = input.cend();

    if(!bsq::parse(beg, end, *(symbols | +(bsq::char_ - symbols)), retVal))
        retVal = input;

    return retVal;
}

这已经相当快了。但是:

手动循环

在这个简单的示例中,您为什么不手动进行解析?

inline std::string manual_expand(const std::string& input, TokenMap const& tokens)
{
    std::ostringstream builder;
    auto expand = [&](auto const& key) {
        auto match = tokens.find(key);
        if (match == tokens.end())
            builder << "$(" << key << ")";
        else
            builder << match->second;
    };

    builder.str().reserve(input.size()*2);

    builder.str("");
    std::ostreambuf_iterator<char> out(builder);

    for(auto f(input.begin()), l(input.end()); f != l;) {
        switch(*f) {
            case '$' : {
                    if (++f==l || *f!='(') {
                        *out++ = '$';
                        break;
                    }
                    else {
                        auto s = ++f;
                        size_t n = 0;

                        while (f!=l && *f != ')')
                            ++f, ++n;

                        // key is [s,f] now
                        expand(std::string(&*s, &*s+n));

                        if (f!=l)
                            ++f; // skip '}'
                    }
                }
            default:
                *out++ = *f++;
        }
    }
    return builder.str();
}

这在我的机器上的性能要好得多。

其他想法

您可以查看 Boost Spirit Lex,可能使用静态生成的标记表:http://www.boost.org/doc/libs/1_60_0/libs/spirit/doc/html/spirit/lex/abstracts/lexer_static_model.html。不过我不是特别喜欢 Lex。

比较:

Interactive Chart

这是使用 Nonius 作为基准统计数据。

完整基准代码:http://paste.ubuntu.com/14133072/

#include <boost/container/flat_map.hpp>

#define USE_X3
#ifdef USE_X3
#   include <boost/spirit/home/x3.hpp>
#else
#   include <boost/spirit/include/qi.hpp>
#endif

#include <boost/algorithm/string.hpp>
#include <boost/algorithm/searching/boyer_moore.hpp>
#include <boost/algorithm/searching/boyer_moore_horspool.hpp>
#include <boost/algorithm/searching/knuth_morris_pratt.hpp>
#include <string>
#include <unordered_map>
#include <iostream>
#include <fstream>
#include <nonius/benchmark.h++>
#include <nonius/main.h++>

using TokenMap = boost::container::flat_map<std::string, std::string>;

#ifdef USE_X3
    namespace x3  = boost::spirit::x3;

    struct append {
        std::string& out;
        void do_append(char const ch) const                       { out += ch;                      } 
        void do_append(std::string const& s)  const               { out += s;                       } 
        template<typename It>
        void do_append(boost::iterator_range<It> const& r)  const { out.append(r.begin(), r.end()); } 
        template<typename Ctx>
        void operator()(Ctx& ctx) const                           { do_append(_attr(ctx));          } 
    };

    inline std::string spirit_x3(const std::string& input, x3::symbols<char const*> const& symbols)
    {
        std::string retVal;
        retVal.reserve(input.size() * 2);
        append appender { retVal };

        auto beg = input.cbegin();
        auto end = input.cend();

        auto rule = *(symbols[appender] | x3::char_ [appender]);

        if(!x3::parse(beg, end, rule))
            retVal = input;

        return retVal;
    }
#else
    namespace bsq = boost::spirit::qi;

    inline std::string spirit_qi_old(const std::string& input, TokenMap const& tokens)
    {
        std::string retVal;
        retVal.reserve(input.size() * 2);
        bsq::symbols<char const, char const*> symbols;
        for(const auto& token : tokens) {
            symbols.add(token.first.c_str(), token.second.c_str());
        }

        auto beg = input.cbegin();
        auto end = input.cend();

        if(!bsq::parse(beg, end, *(symbols | bsq::char_), retVal))
            retVal = input;

        return retVal;
    }

    inline std::string spirit_qi(const std::string& input, bsq::symbols<char, std::string> const& symbols)
    {
        std::string retVal;
        retVal.reserve(input.size() * 2);

        auto beg = input.cbegin();
        auto end = input.cend();

        if(!bsq::parse(beg, end, *(symbols | +(bsq::char_ - symbols)), retVal))
            retVal = input;

        return retVal;
    }
#endif

inline std::string manual_expand(const std::string& input, TokenMap const& tokens) {
    std::ostringstream builder;
    auto expand = [&](auto const& key) {
        auto match = tokens.find(key);

        if (match == tokens.end())
            builder << "$(" << key << ")";
        else
            builder << match->second;
    };

    builder.str().reserve(input.size()*2);
    std::ostreambuf_iterator<char> out(builder);

    for(auto f(input.begin()), l(input.end()); f != l;) {
        switch(*f) {
            case '$' : {
                    if (++f==l || *f!='(') {
                        *out++ = '$';
                        break;
                    }
                    else {
                        auto s = ++f;
                        size_t n = 0;

                        while (f!=l && *f != ')')
                            ++f, ++n;

                        // key is [s,f] now
                        expand(std::string(&*s, &*s+n));

                        if (f!=l)
                            ++f; // skip '}'
                    }
                }
            default:
                *out++ = *f++;
        }
    }
    return builder.str();
}

inline std::string boost_replace_all(const std::string& input, TokenMap const& tokens)
{
    std::string retVal(input);
    retVal.reserve(input.size() * 2);

    for(const auto& token : tokens)
    {
        boost::replace_all(retVal, token.first, token.second);
    }
    return retVal;
}

inline void naive_stl(std::string& input, TokenMap const& tokens)
{
    input.reserve(input.size() * 2);
    for(const auto& token : tokens)
    {
        auto next = std::search(input.cbegin(), input.cend(), token.first.begin(), token.first.end());
        while(next != input.cend())
        {
            input.replace(next, next + token.first.size(), token.second);
            next = std::search(input.cbegin(), input.cend(), token.first.begin(), token.first.end());
        }
    }
}

inline void boyer_more(std::string& input, TokenMap const& tokens)
{
    input.reserve(input.size() * 2);
    for(const auto& token : tokens)
    {
        auto next =
            boost::algorithm::boyer_moore_search(input.cbegin(), input.cend(), token.first.begin(), token.first.end());
        while(next != input.cend())
        {
            input.replace(next, next + token.first.size(), token.second);
            next = boost::algorithm::boyer_moore_search(input.cbegin(), input.cend(), token.first.begin(),
                                                        token.first.end());
        }
    }
}

inline void bmh_search(std::string& input, TokenMap const& tokens)
{
    input.reserve(input.size() * 2);
    for(const auto& token : tokens)
    {
        auto next = boost::algorithm::boyer_moore_horspool_search(input.cbegin(), input.cend(), token.first.begin(),
                                                                  token.first.end());
        while(next != input.cend())
        {
            input.replace(next, next + token.first.size(), token.second);
            next = boost::algorithm::boyer_moore_search(input.cbegin(), input.cend(), token.first.begin(),
                                                        token.first.end());
        }
    }
}

inline void kmp_search(std::string& input, TokenMap const& tokens)
{
    input.reserve(input.size() * 2);
    for(const auto& token : tokens)
    {
        auto next = boost::algorithm::knuth_morris_pratt_search(input.cbegin(), input.cend(), token.first.begin(),
                                                                token.first.end());
        while(next != input.cend())
        {
            input.replace(next, next + token.first.size(), token.second);
            next = boost::algorithm::boyer_moore_search(input.cbegin(), input.cend(), token.first.begin(),
                                                        token.first.end());
        }
    }
}

namespace testdata {
    std::string const expected =
        "Five and Seven said nothing, but looked at Two. Two began in a low voice, 'Why the fact is, you see, Miss, "
        "this here ought to have been a red rose-tree, and we put a white one in by mistake; and if the Queen was to "
        "find it out, we should all have our heads cut off, you know. So you see, Miss, we're doing our best, afore "
        "she comes, to—' At this moment Five, who had been anxiously looking across the garden, called out 'The Queen! "
        "The Queen!' and the three gardeners instantly threw themselves flat upon their faces. There was a sound of "
        "many footsteps, and Alice looked round, eager to see the Queen.First came ten soldiers carrying clubs; these "
        "were all shaped like the three gardeners, oblong and flat, with their hands and feet at the corners: next the "
        "ten courtiers; these were ornamented all over with diamonds, and walked two and two, as the soldiers did. "
        "After these came the royal children; there were ten of them, and the little dears came jumping merrily along "
        "hand in hand, in couples: they were all ornamented with hearts. Next came the guests, mostly Kings and "
        "Queens, and among them Alice recognised the White Rabbit: it was talking in a hurried nervous manner, smiling "
        "at everything that was said, and went by without noticing her. Then followed the Knave of Hearts, carrying "
        "the King's crown on a crimson velvet cushion; and, last of all this grand procession, came THE KING AND QUEEN "
        "OF HEARTS.Alice was rather doubtful whether she ought not to lie down on her face like the three gardeners, "
        "but she could not remember ever having heard of such a rule at processions; 'and besides, what would be the "
        "use of a procession,' thought she, 'if people had all to lie down upon their faces, so that they couldn't see "
        "it?' So she stood still where she was, and waited.When the procession came opposite to Alice, they all "
        "stopped and looked at her, and the Queen said severely 'Who is this?' She said it to the Knave of Hearts, who "
        "only bowed and smiled in reply.'Idiot!' said the Queen, tossing her head impatiently; and, turning to Alice, "
        "she went on, 'What's your name, child?''My name is Alice, so please your Majesty,' said Alice very politely; "
        "but she added, to herself, 'Why, they're only a pack of cards, after all. I needn't be afraid of them!''And "
        "who are these?' said the Queen, pointing to the three gardeners who were lying round the rosetree; for, you "
        "see, as they were lying on their faces, and the pattern on their backs was the same as the rest of the pack, "
        "she could not tell whether they were gardeners, or soldiers, or courtiers, or three of her own children.'How "
        "should I know?' said Alice, surprised at her own courage. 'It's no business of mine.'The Queen turned crimson "
        "with fury, and, after glaring at her for a moment like a wild beast, screamed 'Off with her head! "
        "Off—''Nonsense!' said Alice, very loudly and decidedly, and the Queen was silent.The King laid his hand upon "
        "her arm, and timidly said 'Consider, my dear: she is only a child!'The Queen turned angrily away from him, "
        "and said to the Knave 'Turn them over!'The Knave did so, very carefully, with one foot.'Get up!' said the "
        "Queen, in a shrill, loud voice, and the three gardeners instantly jumped up, and began bowing to the King, "
        "the Queen, the royal children, and everybody else.'Leave off that!' screamed the Queen. 'You make me giddy.' "
        "And then, turning to the rose-tree, she went on, 'What have you been doing here?'";
    std::string const inputWithtokens =
        "Five and Seven said nothing, but looked at $(Two). $(Two) began in a low voice, 'Why the fact is, you see, "
        "Miss, "
        "this here ought to have been a red rose-tree, and we put a white one in by mistake; and if the Queen was to "
        "find it out, we should all have our $(heads) cut off, you know. So you see, Miss, we're doing our best, afore "
        "she comes, to—' At this moment Five, who had been anxiously looking across the garden, called out 'The Queen! "
        "The Queen!' and the three gardeners instantly threw themselves flat upon their faces. There was a sound of "
        "many footsteps, and Alice looked round, eager to see the $(Queen).First came ten soldiers carrying clubs; "
        "these "
        "were all shaped like the three gardeners, oblong and flat, with their hands and feet at the corners: next the "
        "ten courtiers; these were ornamented all over with $(diamonds), and walked two and two, as the soldiers did. "
        "After these came the royal children; there were ten of them, and the little dears came jumping merrily along "
        "hand in hand, in couples: they were all ornamented with hearts. Next came the guests, mostly Kings and "
        "Queens, and among them Alice recognised the White Rabbit: it was talking in a hurried nervous manner, smiling "
        "at everything that was said, and went by without noticing her. Then followed the Knave of Hearts, carrying "
        "the King's crown on a crimson velvet cushion; and, last of all this grand procession, came THE KING AND QUEEN "
        "OF HEARTS.Alice was rather doubtful whether she ought not to lie down on her face like the three gardeners, "
        "but she could not remember ever having heard of such a rule at processions; 'and besides, what would be the "
        "use of a procession,' thought she, 'if people had all to lie down upon their faces, so that they couldn't see "
        "it?' So she stood still where she was, and waited.When the procession came opposite to Alice, they all "
        "stopped and looked at her, and the $(Queen) said severely 'Who is this?' She said it to the Knave of Hearts, "
        "who "
        "only bowed and smiled in reply.'Idiot!' said the Queen, tossing her head impatiently; and, turning to Alice, "
        "she went on, 'What's your name, child?''My name is Alice, so please your Majesty,' said Alice very politely; "
        "but she added, to herself, 'Why, they're only a pack of cards, after all. I needn't be afraid of them!''And "
        "who are these?' said the $(Queen), pointing to the three gardeners who were lying round the rosetree; for, "
        "you "
        "see, as they were lying on their faces, and the $(pattern) on their backs was the same as the rest of the "
        "pack, "
        "she could not tell whether they were gardeners, or soldiers, or courtiers, or three of her own children.'How "
        "should I know?' said Alice, surprised at her own courage. 'It's no business of mine.'The Queen turned crimson "
        "with fury, and, after glaring at her for a moment like a wild beast, screamed 'Off with her head! "
        "Off—''Nonsense!' said $(Alice), very loudly and decidedly, and the Queen was silent.The $(King) laid his hand "
        "upon "
        "her arm, and timidly said 'Consider, my dear: she is only a child!'The $(Queen) turned angrily away from him, "
        "and said to the $(Knave) 'Turn them over!'The $(Knave) did so, very carefully, with one foot.'Get up!' said "
        "the "
        "Queen, in a shrill, loud voice, and the three gardeners instantly jumped up, and began bowing to the King, "
        "the Queen, the royal children, and everybody else.'Leave off that!' screamed the Queen. 'You make me giddy.' "
        "And then, turning to the rose-tree, she went on, 'What have you been doing here?'";

    static TokenMap const raw_tokens {
        {"Two", "Two"},           {"heads", "heads"},
        {"diamonds", "diamonds"}, {"Queen", "Queen"},
        {"pattern", "pattern"},   {"Alice", "Alice"},
        {"King", "King"},         {"Knave", "Knave"},
        {"Why", "Why"},           {"glaring", "glaring"},
        {"name", "name"},         {"know", "know"},
        {"Idiot", "Idiot"},       {"children", "children"},
        {"Nonsense", "Nonsense"}, {"procession", "procession"},
    };

    static TokenMap const tokens {
        {"$(Two)", "Two"},           {"$(heads)", "heads"},
        {"$(diamonds)", "diamonds"}, {"$(Queen)", "Queen"},
        {"$(pattern)", "pattern"},   {"$(Alice)", "Alice"},
        {"$(King)", "King"},         {"$(Knave)", "Knave"},
        {"$(Why)", "Why"},           {"$(glaring)", "glaring"},
        {"$(name)", "name"},         {"$(know)", "know"},
        {"$(Idiot)", "Idiot"},       {"$(children)", "children"},
        {"$(Nonsense)", "Nonsense"}, {"$(procession)", "procession"},
    };

}

NONIUS_BENCHMARK("manual_expand", [](nonius::chronometer cm)     {
    std::string const tmp = testdata::inputWithtokens;
    auto& tokens = testdata::raw_tokens;

    std::string result;
    cm.measure([&](int) {
        result = manual_expand(tmp, tokens);
    });
    assert(result == testdata::expected);
})

#ifdef USE_X3
NONIUS_BENCHMARK("spirit_x3", [](nonius::chronometer cm) {
    auto const symbols = [&] {
        x3::symbols<char const*> symbols;
        for(const auto& token : testdata::tokens) {
            symbols.add(token.first.c_str(), token.second.c_str());
        }
        return symbols;
    }();

    std::string result;
    cm.measure([&](int) {
            result = spirit_x3(testdata::inputWithtokens, symbols);
        });
    //std::cout << "====\n" << result << "\n====\n";
    assert(testdata::expected == result);
})
#else
NONIUS_BENCHMARK("spirit_qi", [](nonius::chronometer cm) {
    auto const symbols = [&] {
        bsq::symbols<char, std::string> symbols;
        for(const auto& token : testdata::tokens) {
            symbols.add(token.first.c_str(), token.second.c_str());
        }
        return symbols;
    }();

    std::string result;
    cm.measure([&](int) {
            result = spirit_qi(testdata::inputWithtokens, symbols);
        });
    assert(testdata::expected == result);
})

NONIUS_BENCHMARK("spirit_qi_old", [](nonius::chronometer cm) {
    std::string result;
    cm.measure([&](int) {
            result = spirit_qi_old(testdata::inputWithtokens, testdata::tokens);
        });
    assert(testdata::expected == result);
})
#endif

NONIUS_BENCHMARK("boyer_more", [](nonius::chronometer cm) {
    cm.measure([&](int) {
        std::string tmp = testdata::inputWithtokens;
        boyer_more(tmp, testdata::tokens);
        assert(tmp == testdata::expected);
    });
})

NONIUS_BENCHMARK("bmh_search", [](nonius::chronometer cm) {
    cm.measure([&](int) {
        std::string tmp = testdata::inputWithtokens;
        bmh_search(tmp, testdata::tokens);
        assert(tmp == testdata::expected);
    });
})

NONIUS_BENCHMARK("kmp_search", [](nonius::chronometer cm) {
    cm.measure([&](int) {
        std::string tmp = testdata::inputWithtokens;
        kmp_search(tmp, testdata::tokens);
        assert(tmp == testdata::expected);
    });
})

NONIUS_BENCHMARK("naive_stl", [](nonius::chronometer cm) {
    cm.measure([&](int) {
            std::string tmp = testdata::inputWithtokens;
            naive_stl(tmp, testdata::tokens);
            assert(tmp == testdata::expected);
        });
})

NONIUS_BENCHMARK("boost_replace_all", [](nonius::chronometer cm)     {
    std::string const tmp = testdata::inputWithtokens;

    std::string result;
    cm.measure([&](int) {
        result = boost_replace_all(testdata::inputWithtokens, testdata::tokens);
    });
    assert(result == testdata::expected);
})

So the question, what takes so long in such a simple task? One can say, ok, simple task, go ahead and implement it better. But the reality is that 15 years old MFC naive implementation does the task order of magnitude faster

其实答案很简单

首先,我使用 apple clang 7.0 在我的 macbook pro 上编译了您的代码:

$ cc --version
Apple LLVM version 7.0.0 (clang-700.1.76)
Target: x86_64-apple-darwin15.2.0
Thread model: posix

结果似乎与 OP 的...

Boost.Spirit symbols result: 3425?=3425
10000 cycles took 8906ms.
Boyer-Moore results:3425?=3425
10000 cycles took 2891ms.
Boyer Moore Hospool result:3425?=3425
10000 cycles took 2392ms.
Knuth Morris Pratt result: 3425?=3425
10000 cycles took 4363ms.
Naive STL search and replace result: 3425?=3425
10000 cycles took 4333ms.
Boost replace_all result:3425?=3425
10000 cycles took 23284ms.
MFCMimicking result:3425?=3425
10000 cycles took 426ms.    <-- seemingly outstanding, no?

然后我添加了 -O3 标志:

Boost.Spirit symbols result: 3425?=3425
10000 cycles took 675ms.
Boyer-Moore results:3425?=3425
10000 cycles took 788ms.
Boyer Moore Hospool result:3425?=3425
10000 cycles took 623ms.
Knuth Morris Pratt result: 3425?=3425
10000 cycles took 1623ms.

Naive STL search and replace result: 3425?=3425
10000 cycles took 562ms.                    <-- pretty good!!!

Boost replace_all result:3425?=3425
10000 cycles took 748ms.
MFCMimicking result:3425?=3425
10000 cycles took 431ms.                    <-- awesome but not as outstanding as it was!

现在结果与 MFC CString 结果处于同一数量级。

Why?

因为当您针对 BOOST and/or STL 进行编译时,您正在扩展模板并且库代码采用与您的编译单元相同的优化设置。

当您 link 反对 MFC 时,您 link 反对的是在启用优化的情况下编译的共享库。

当您使用 strstr 时,您调用的是预编译和优化的 c 库,并且在某些部分是手写的。当然会很快!

已解决:)

10000 cycles not 100000, different machine...

作为参考,这里是 100,000 次循环版本 运行 在笔记本电脑上使用电池供电的结果。全面优化(-O3):

Boost.Spirit symbols result: 3425?=3425
100000 cycles took 6712ms.
Boyer-Moore results:3425?=3425
100000 cycles took 7923ms.
Boyer Moore Hospool result:3425?=3425
100000 cycles took 6091ms.
Knuth Morris Pratt result: 3425?=3425
100000 cycles took 16330ms.

Naive STL search and replace result: 3425?=3425
100000 cycles took 6719ms.

Boost replace_all result:3425?=3425
100000 cycles took 7353ms.

MFCMimicking result:3425?=3425
100000 cycles took 4076ms.

好吧,说来话长。只是提醒您提出的问题。

  1. 为什么使用 C++(各种方法)搜索和替换这么慢?
  2. 为什么 MFC 搜索和替换这么快?

令人惊讶的是,这两个问题的答案是一样的。由于 C++ 开销。 是的。我们闪亮的现代 C++ 有一个开销,我们大多忽略了 灵活性和优雅。

但是,当涉及到亚微秒分辨率时(并不是说 C++ 不是 能够以纳秒分辨率做事)开销变得更多 突出。

让我用我在问题中发布的相同代码来展示,但它更多 与每个职能部门所做的事情保持一致。

Live On Coliru

它使用前面提到的 Nonius(感谢@sehe),交互结果是 here

You can click the legend to show/hide particular series.

结论

有两项成绩优异

  • MFC模仿函数和
  • 我自己手动替换

这些函数至少比其他函数快一个数量级,那有什么区别呢?

所有这些 "slow" 函数都是用 C++ 编写的,而 fast 是用 C 编写的(不是纯 C,我懒得处理 malloc/realloc 输出缓冲区,当输出大小增加时).好吧,我想很明显,有时别无选择,只能求助于纯 C。出于安全原因和缺乏类型安全性,我个人反对使用 C。此外,编写高质量的 C 代码只需要更多的专业知识和注意力。

暂时不会标记为答案,等待对此结论的评论。

感谢所有积极参与讨论、提出想法并指出我例子中不一致的人。

2019更新:
只是为了保留代码:https://github.com/kreuzerkrieg/string_search_replace
Nonius 结果:https://kreuzerkrieg.github.io/string_search_replace/

运行 在 Ubuntu 18.04

上使用 gcc-9

只是一些更新。我 运行 原始的 STL 代码(search)与 MFC-inspired 一个,我通过优化得到了它(-O2) stl-base 给出了 228ms 而MFC-like 给出 285ms。如果不进行优化,我会得到类似 7284ms310ms 的结果。我用 i7-6700HQ CPU @ 2.60GHz 在 macbook2016Pro 上做的。 所以基本上使用 strstr 的代码无法优化,而 STL 代码被大量优化。

然后我 运行 naiveSTL 代码的最终版本使用 find 而不是搜索,它给了我 28ms。所以绝对是赢家。我添加了下面的代码,以防万一@kreuzerkrieg 的 link 有一天会无效。

inline void naiveSTL(std::string& input, const TokenMap& tokens)
{
    input.reserve(input.size() * 2);
    for(const auto& token : tokens)
    {
        auto next = input.find(token.first);
        while(next != std::string::npos)
        {
            input.replace(next, token.first.size(), token.second);
            next = input.find(token.first, next + 1);
        }
    }
}