Can I use 2 or more delimiters in C++ function getline?

Question

I'd like to know how I can use 2 or more delimiters with the getline function. Here is my problem:

The program reads a text file... each line looks like:

   New York, Paris, 100
   CityA, CityB, 200

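I'm using getline(file, line), but that gives me the whole line, when what I want is CityA, then CityB, then the number. If I use ',' as the delimiter, I won't know where the next line begins, so I'm trying to find a solution.

But how can I use both the comma and \n as delimiters? By the way, I'm working with the string type, not char, so strtok is not an option. :/

Some scratch code: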

string line;
ifstream file("text.txt");
if(file.is_open())
   while(!file.eof()){
     getline(file, line);
        // here I need to get each string before comma and \n
   }

Answer 1

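No, std::getline() accepts only a single character to override the default delimiter. std::getline() has no option for multiple alternate delimiters.

The correct way to parse this kind of input is to use the default std::getline() to read the entire line into a std::string, then construct a std::istringstream from it, and then parse that further into comma-separated values.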

However, if you are really parsing comma-separated values, you should be using a proper CSV parser.
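As a rough illustration of the line-then-istringstream approach described above, here is a minimal sketch for one line of the asker's format. The names Route, trim_field and parse_route are hypothetical, invented for this example, and the sketch assumes exactly three fields per line with no quoted commas, which is the kind of case a real CSV parser would handle and this does not.

```cpp
#include <exception>
#include <sstream>
#include <string>

// Hypothetical record type for one "CityA, CityB, 200" line.
struct Route {
    std::string from;
    std::string to;
    int distance = 0;
};

// Strip leading and trailing spaces/tabs from a field.
inline std::string trim_field(const std::string& s) {
    const auto first = s.find_first_not_of(" \t");
    if (first == std::string::npos) return "";
    const auto last = s.find_last_not_of(" \t");
    return s.substr(first, last - first + 1);
}

// Parse one line that was already read with std::getline(file, line).
// Returns false if a field is missing or the number is not valid.
inline bool parse_route(const std::string& line, Route& out) {
    std::istringstream fields(line);
    std::string from, to, distance;
    if (!std::getline(fields, from, ',')) return false;
    if (!std::getline(fields, to, ',')) return false;
    if (!std::getline(fields, distance)) return false;  // rest of the line
    try {
        out.distance = std::stoi(distance);  // std::stoi skips leading whitespace
    } catch (const std::exception&) {
        return false;
    }
    out.from = trim_field(from);
    out.to = trim_field(to);
    return true;
}
```

The outer loop would stay `while (std::getline(file, line))`, calling parse_route on each line, so the newline is handled by the first getline and the comma by the second.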

Answer 2

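I don't think you should go about the problem that way (even if you could do it); instead:

1. Use what you already have to read in each line.
2. Then split that line up by the commas to get the pieces that you want.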

If strtok will do the job for #2, you can always convert your string into a char array.
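One possible way to do that conversion is sketched below. The helper name split_with_strtok is made up for this example. It copies the std::string into a mutable, NUL-terminated buffer because std::strtok writes into its input. Keep in mind that std::strtok silently skips empty fields and keeps hidden global state, so this is only reasonable for simple, single-threaded parsing.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Split `line` on any character in `delimiters` using std::strtok.
// Hypothetical helper: works on a private copy so `line` is never modified.
inline std::vector<std::string> split_with_strtok(const std::string& line,
                                                  const char* delimiters) {
    std::vector<char> buffer(line.begin(), line.end());
    buffer.push_back('\0');  // std::strtok needs a NUL-terminated array

    std::vector<std::string> tokens;
    for (char* tok = std::strtok(buffer.data(), delimiters);
         tok != nullptr;
         tok = std::strtok(nullptr, delimiters)) {
        tokens.emplace_back(tok);  // copy the token out of the buffer
    }
    return tokens;
}
```

Passing "," as the delimiter set keeps a name like "New York" in one piece, at the cost of leaving the leading space on each later field; passing ", " would also split on the space inside the name.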

Answer 3

You can use std::getline to read a line, then pass that line to a std::stringstream and read the comma-separated values from it.

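// needs <fstream>, <sstream> and <string>
std::string line;
std::ifstream file("text.txt");
if (file.is_open()) {
    while (std::getline(file, line)) {          // get a whole line
        std::stringstream ss(line);
        std::string field;                      // kept separate from `line`
        while (std::getline(ss, field, ',')) {
            // `field` now holds one comma-separated entity
        }
    }
}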

Answer 4

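In general, it is more intuitive and efficient to parse character input in a hierarchical, tree-like manner: you start by splitting the string into its major blocks, then go on to process each block, splitting it into smaller parts, and so on.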

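An alternative is to tokenize the way strtok does: start at the beginning of the input and handle one token at a time until the end of the input is reached. This may be preferred when parsing simple inputs, because it is straightforward to implement. This style can also be used for input with nested structure, but that requires maintaining some kind of context information, which may grow too complex to maintain inside a single function or a limited region of code.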

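Somebody relying on the C++ std library usually ends up using a std::stringstream along with std::getline to tokenize string input. But this only gives you one delimiter. They would never consider using strtok, because it is a non-reentrant piece of junk from the C runtime library. So they end up using streams, and with only one delimiter, one is obligated to use a hierarchical parsing style.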

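But zneak brought up std::string::find_first_of, which takes a set of characters and returns the position of the first character in the string that belongs to that set. There are other member functions as well, such as find_last_of and find_first_not_of, which seem to exist for the sole purpose of parsing strings. But std::string stops short of providing useful tokenizing functions.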

Another option is the <regex> library, which can do anything you want, but it is new and you will need to get used to its syntax.
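For a feel of what the <regex> route might look like, here is a small sketch that splits on either a comma or a newline in a single pass. The function name split_with_regex and the particular pattern are choices made for this illustration, not something taken from the answer; the pattern also swallows whitespace around each separator.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: split `text` on commas or newlines,
// discarding any whitespace next to the separator.
inline std::vector<std::string> split_with_regex(const std::string& text) {
    static const std::regex separator(R"(\s*[,\n]\s*)");
    std::vector<std::string> tokens;
    // A submatch index of -1 selects the text between separator matches.
    std::sregex_token_iterator it(text.begin(), text.end(), separator, -1);
    const std::sregex_token_iterator end;
    for (; it != end; ++it) {
        tokens.push_back(it->str());
    }
    return tokens;
}
```

Because both separators are treated alike, the result is one flat list of fields; a caller that needs to know where each line ends would still have to read line by line first or count fields.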

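But with very little effort, you can leverage the existing functions in std::string to perform tokenizing tasks without resorting to streams. Here is a simple example. get_to() is the tokenizing function, and tokenize demonstrates how it is used.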

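The code in this example will be slower than strtok, because it constantly erases characters from the beginning of the string being parsed, and it also copies and returns substrings. This makes the code easy to understand, but it does not mean that more efficient tokenizing is impossible. It would not even be much more complicated than this: you would just keep track of your current position, use it as the start argument in the std::string member functions, and never alter the source string. And even better techniques exist, no doubt.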

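To understand the example's code, start at the bottom, where main() is and where you can see how the functions are used. The top of this code is dominated by basic utility functions and dumb comments.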

#include <iostream>
#include <string>
#include <utility>

namespace string_parsing {
// in-place trim whitespace off ends of a std::string
inline void trim(std::string &str) {
    auto space_is_it = [] (char c) {
        // A few asks:
        // * Suppress criticism WRT localization concerns
        // * Avoid jumping to conclusions! And seeing monsters everywhere! 
        //   Things like...ah! Believing "thoughts" that assumptions were made
        //   regarding character encoding.
        // * If an obvious, portable alternative exists within the C++ Standard Library,
        //   you will see it in 2.0, so no new defect tickets, please.
        // * Go ahead and ignore the rumor that using lambdas just to get 
        //   local function definitions is "cheap" or "dumb" or "ignorant."
        //   That's the latest round of FUD from...*mumble*.
        return c > '\0' && c <= ' '; 
    };

    for(auto rit = str.rbegin(); rit != str.rend(); ++rit) {
        if(!space_is_it(*rit)) {
            if(rit != str.rbegin()) {
                str.erase(&*rit - &*str.begin() + 1);
            }
            for(auto fit=str.begin(); fit != str.end(); ++fit) {
                if(!space_is_it(*fit)) {
                    if(fit != str.begin()) {
                        str.erase(str.begin(), fit);
                    }
                    return;
    }   }   }   }
    str.clear();
}

// get_to(string, <delimiter set> [, delimiter])
// The input+output argument "string" is searched for the first occurrence of one 
// from a set of delimiters.  All characters to the left of, and the delimiter itself
// are deleted in-place, and the substring which was to the left of the delimiter is
// returned, with whitespace trimmed.
// <delimiter set> is forwarded to std::string::find_first_of, so its type may match
// whatever this function's overloads accept, but this is usually expressed
// as a string literal: ", \n" matches commas, spaces and linefeeds.
// The optional output argument "found_delimiter" receives the delimiter character just found.
template <typename D>
inline std::string get_to(std::string& str, D&& delimiters, char& found_delimiter) {
    const auto pos = str.find_first_of(std::forward<D>(delimiters));
    if(pos == std::string::npos) {
        // When none of the delimiters are present,
        // clear the string and return its last value.
        // This effectively makes the end of a string an
        // implied delimiter.
        // This behavior is convenient for parsers which
        // consume chunks of a string, looping until
        // the string is empty.
        // Without this feature, it would be possible to 
        // continue looping forever, when an iteration 
        // leaves the string unchanged, usually caused by
        // a syntax error in the source string.
        // So the implied end-of-string delimiter takes
        // away the caller's burden of anticipating and 
        // handling the range of possible errors.
        found_delimiter = '\0';
        std::string result;
        std::swap(result, str);
        trim(result);
        return result;
    }
    found_delimiter = str[pos];
    auto left = str.substr(0, pos);
    trim(left);
    str.erase(0, pos + 1);
    return left;
}

template <typename D>
inline std::string get_to(std::string& str, D&& delimiters) {
    char discarded_delimiter;
    return get_to(str, std::forward<D>(delimiters), discarded_delimiter);
}

inline std::string pad_right(const std::string&     str,
                             std::string::size_type min_length,
                             char                   pad_char=' ')
{
    if(str.length() >= min_length ) return str;
    return str + std::string(min_length - str.length(), pad_char);
}

inline void tokenize(std::string source) {
    std::cout << source << "\n\n";
    bool quote_opened = false;
    while(!source.empty()) {
        // If we just encountered an open-quote, only include the quote character
        // in the delimiter set, so that a quoted token may contain any of the
        // other delimiters.
        const char* delimiter_set = quote_opened ? "'" : ",'{}";
        char delimiter;
        auto token = get_to(source, delimiter_set, delimiter);
        quote_opened = delimiter == '\'' && !quote_opened;
        std::cout << "    " << pad_right('[' + token + ']', 16) 
            << "   " << delimiter << '\n';
    }
    std::cout << '\n';
}
}

int main() {
    string_parsing::tokenize("{1.5, null, 88, 'hi, {there}!'}");
}

This outputs:

{1.5, null, 88, 'hi, {there}!'}

    []                 {
    [1.5]              ,
    [null]             ,
    [88]               ,
    []                 '
    [hi, {there}!]     '
    []                 }
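The answer mentions that a faster variant would track the current position and pass it as the start argument to the std::string member functions, without ever altering the source string. A minimal sketch of that idea might look like the following. Token and split_any are hypothetical names introduced here, tokens are left untrimmed, and substrings are still copied, so this only removes the repeated erasing, not every cost the answer lists.

```cpp
#include <string>
#include <vector>

// Hypothetical result type: one token plus the delimiter that ended it.
struct Token {
    std::string text;  // characters before the delimiter, untrimmed
    char delimiter;    // '\0' when the token ran to the end of the input
};

// Split `source` on any character in `delimiters` without modifying it.
inline std::vector<Token> split_any(const std::string& source,
                                    const std::string& delimiters) {
    std::vector<Token> tokens;
    std::string::size_type start = 0;
    while (start < source.size()) {
        // Search from the current position instead of erasing the prefix.
        const auto pos = source.find_first_of(delimiters, start);
        if (pos == std::string::npos) {
            // No delimiter left: the rest of the string is the last token.
            tokens.push_back({source.substr(start), '\0'});
            break;
        }
        tokens.push_back({source.substr(start, pos - start), source[pos]});
        start = pos + 1;  // continue just past the delimiter
    }
    return tokens;
}
```

With a delimiter set of ",\n", the recorded delimiter tells the caller whether a field ended at a comma or at the end of a line, which is the information the original question was missing when using getline with ',' alone.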
