How to split a string_view into multiple string_view objects without any dynamic allocations

Question

The snippet below is taken from this answer.

#include <string>
#include <vector>

// Assumption: the snippet never defines DELIMITER; a single space is used here.
constexpr char DELIMITER = ' ';

void tokenize(std::string str, std::vector<std::string> &token_v){
    size_t start = str.find_first_not_of(DELIMITER), end = start;

    while (start != std::string::npos){
        // Find the next occurrence of the delimiter
        end = str.find(DELIMITER, start);
        // Push the token found into the vector
        token_v.push_back(str.substr(start, end-start));
        // Skip all occurrences of the delimiter to find the new start
        start = str.find_first_not_of(DELIMITER, end);
    }
}

Now, given a buffer like this:

std::array<char, 150> buffer;

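I want to have a string_view (that points to the buffer) and pass it to the tokenizer function. The tokens should be returned as std::string_views via an out parameter (not a vector), and the function should also return the number of tokens extracted. The interface looks like this: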

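#include <array>
#include <cstddef>
#include <iomanip>
#include <iostream>
#include <span>
#include <string_view>

size_t tokenize( const std::string_view inputStr,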
                 const std::span< std::string_view > foundTokens_OUT,
                 const size_t expectedTokenCount )
{
    // implementation
}

int main( )
{
    std::array<char, 150> buffer { " @a hgs -- " };
    const std::string_view sv { buffer.data( ), buffer.size( ) };
    const size_t expectedTokenCount { 4 };

    std::array< std::string_view, expectedTokenCount > foundTokens; // the span for storing found tokens

    const size_t num_of_found_tokens { tokenize( sv, foundTokens, expectedTokenCount ) };

    if ( num_of_found_tokens == expectedTokenCount )
    {
        // do something
        std::clog << "success\n" << num_of_found_tokens << '\n';
    }

    for ( size_t idx { }; idx < num_of_found_tokens; ++idx )
    {
        std::cout << std::quoted( foundTokens[ idx ] ) << '\n';
    }
}

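I would appreciate it if someone could implement a similar tokenize function, but for a string_view and splitting on both spaces and tabs. I tried to write one myself but it didn't work as expected (didn't support the tab). Also, if the number of tokens found in inputStr exceeds expectedTokenCount, I want the function to stop and return expectedTokenCount + 1. That is obviously more efficient than scanning the rest of the input.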

Here is my dummy version:

size_t tokenize( const std::string_view inputStr,
                 const std::span< std::string_view > foundTokens_OUT,
                 const size_t expectedTokenCount )
{
    if ( inputStr.empty( ) )
    {
        return 0;
    }

    size_t start { inputStr.find_first_not_of( ' ' ) };
    size_t end { start };

    size_t foundTokensCount { };

    while ( start != std::string_view::npos && foundTokensCount < expectedTokenCount )
    {
        end = inputStr.find( ' ', start );
        foundTokens_OUT[ foundTokensCount++ ] = inputStr.substr( start, end - start );
        start = inputStr.find_first_not_of( ' ', end );
    }

    return foundTokensCount;
}

Note: the ranges library does not have proper support yet (at least on GCC), so I am trying to avoid it.

Answer 1

I tried to write one myself but it didn't work as expected (didn't support the tab).

If you want to support splitting on both spaces and tabs, you can use another overload of find_first_not_of:

size_type find_first_not_of(const CharT* s, size_type pos = 0) const;

This finds the first character that is equal to none of the characters in the string pointed to by s.
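To make the difference between the overloads concrete, here is a small compile-time sketch. The sample text and the names sampleText and firstNonBlank are made up for illustration; only the std::string_view member functions come from the standard library.

```cpp
#include <cstddef>
#include <string_view>

// Made-up sample input: leading blanks, then two words separated by a tab.
inline constexpr std::string_view sampleText { " \t foo\tbar" };

// The character-set overload skips spaces AND tabs.
inline constexpr std::size_t firstNonBlank { sampleText.find_first_not_of( " \t" ) };
static_assert( firstNonBlank == 3 ); // index of 'f'

// The single-character overload skips spaces only, so it stops at the tab.
static_assert( sampleText.find_first_not_of( ' ' ) == 1 );

// find_first_of is the counterpart: the first character that IS in the set.
static_assert( sampleText.find_first_of( " \t", firstNonBlank ) == 6 ); // tab after "foo"
```

The second assertion matches the symptom described in the question: with a lone ' ' as the argument, a tab is treated as part of a token rather than as a separator.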

So your implementation only needs to change find_first_not_of(' ') and find(' ') to find_first_not_of(" \t") and find_first_of(" \t").

Demo
