boost tokenizer / char separator

I have tried both the commented and the uncommented version of this code:

string separator1(""); // don't let quoted arguments escape themselves
string separator2(",\n"); // split on comma and newline
string separator3("\"\'"); // let it have quoted arguments

escaped_list_separator<char> els(separator1, separator2, separator3);
tokenizer<escaped_list_separator<char>> tok(str);//, els);


for (tokenizer<escaped_list_separator<char>>::iterator beg = tok.begin();beg!= tok.end(); ++beg) {
next = *beg;
boost::trim(next);
cout << counter << " " << next << endl;
counter++;
}

to split a file with the following format:

 12345, Test Test, Test
 98765, Test2 test2, Test2

This is the output:

0 12345
1 Test Test
2 Test
98765
3 Test2 test2
4 Test2

I'm not sure where the problem is, but what I need is for the number 3 to appear before 98765.

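You forgot the newline in the separator set: string separator2(",\n");. The separator object also has to be passed to the tokenizer, as in tok(str, els); a default-constructed escaped_list_separator splits on commas only.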

#include <iostream>
#include <boost/tokenizer.hpp>
#include <boost/algorithm/string.hpp>

using namespace std;
using namespace boost;

int main() {
    string str = "TEst,hola\nhola";
    string separator1(""); //dont let quoted arguments escape themselves
    string separator2(",\n"); //split on comma and newline
    string separator3("\""); //let it have quoted arguments

    escaped_list_separator<char> els(separator1, separator2, separator3);
    tokenizer<escaped_list_separator<char>> tok(str, els);

    int counter = 0;
    string next;

    for (tokenizer<escaped_list_separator<char>>::iterator beg = tok.begin(); beg != tok.end(); ++beg) {
        next = *beg;
        boost::trim(next);
        cout << counter << " " << next << endl;
        counter++;

    }
    return 0;
}  
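The effect of the separator set can also be seen without Boost. The following is only a minimal standard-library sketch, not how Boost.Tokenizer is implemented: `split_on_any` is a hypothetical helper that splits at every character in `delims` and trims spaces and tabs, with no quote or escape handling.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not part of Boost): split `text` at every character
// found in `delims`, then trim spaces and tabs from each token.
std::vector<std::string> split_on_any(const std::string& text,
                                      const std::string& delims) {
    std::vector<std::string> tokens;
    std::string::size_type start = 0;
    while (true) {
        const std::string::size_type end = text.find_first_of(delims, start);
        std::string token = text.substr(
            start, end == std::string::npos ? std::string::npos : end - start);
        // Trim leading and trailing blanks only; an embedded '\n' is kept.
        const std::string::size_type first = token.find_first_not_of(" \t");
        const std::string::size_type last = token.find_last_not_of(" \t");
        tokens.push_back(first == std::string::npos
                             ? std::string()
                             : token.substr(first, last - first + 1));
        if (end == std::string::npos) break;
        start = end + 1;
    }
    return tokens;
}
```

Called on the sample file with `","` as the delimiter set, the third token still contains the line break ("Test", newline, "98765"), so it is printed across two lines under a single counter value, which is the symptom in the question. With `",\n"` the same input yields six separate tokens.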

It looks to me like you are parsing, not splitting.

In my view, you would be better off using a parser generator:

Live On Coliru

#include <boost/spirit/include/qi.hpp>
#include <iostream>
#include <string>
#include <vector>
namespace qi = boost::spirit::qi;

int main() {
    boost::spirit::istream_iterator f(std::cin >> std::noskipws), l;

    std::vector<std::string> columns;
    qi::parse(f, l, +~qi::char_(",\r\n") % (qi::eol | ','), columns);

    size_t n = 0;
    for(auto& tok : columns) { std::cout << n++ << "\t" << tok << "\n"; }
}

Prints

0   12345
1    Test Test
2    Test
3   98765
4    Test2 test2
5    Test2

Frankly, I think this is better because it lets you write

phrase_parse(f, l, (qi::int_ >> *(',' >> +~qi::char_("\r\n,"))) % qi::eol, qi::blank...);

and get correct parsing of data types, whitespace skipping, and so on for "free".
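For comparison, here is roughly what the same typed result costs when written by hand with the standard library. This is only a sketch under assumptions: `Row`, `trim_blanks` and `parse_rows` are hypothetical names, each line is assumed to start with an integer, and there is no quoting support or error recovery.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical record type: a leading integer followed by text fields.
struct Row {
    int id;
    std::vector<std::string> fields;
};

// Trim spaces, tabs and carriage returns from both ends of a string.
static std::string trim_blanks(const std::string& s) {
    const std::string::size_type first = s.find_first_not_of(" \t\r");
    if (first == std::string::npos) return std::string();
    const std::string::size_type last = s.find_last_not_of(" \t\r");
    return s.substr(first, last - first + 1);
}

// Parse "int, text, text..." lines into typed rows.
std::vector<Row> parse_rows(const std::string& text) {
    std::vector<Row> rows;
    std::istringstream lines(text);
    std::string line;
    while (std::getline(lines, line)) {
        if (trim_blanks(line).empty()) continue;  // skip blank lines
        std::istringstream cells(line);
        std::string cell;
        Row row{};
        std::getline(cells, cell, ',');
        row.id = std::stoi(cell);  // throws std::invalid_argument on bad input
        while (std::getline(cells, cell, ',')) {
            row.fields.push_back(trim_blanks(cell));
        }
        rows.push_back(row);
    }
    return rows;
}
```

The type conversion and the whitespace trimming each need explicit code here, whereas the grammar states them declaratively through `qi::int_` and the `qi::blank` skipper.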