boost::spirit::x3 phrase_parse doing arithmetic operations before pushing into vector

Question

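I am working on a project for my university studies. My goal is to read double-precision numbers from a large file (2.6 GB) into a vector of doubles.

I am using the boost spirit x3 library with mmap. I found some code online, which I am using: https://github.com/sehe/bench_float_parsing.

I would like to perform some arithmetic operations on these double values before pushing them into the vector, and this is where I am stuck. How can I do some arithmetic on the values before they are pushed?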

    template <typename source_it>
    size_t x3_phrase_parse<data::float_vector, source_it>::parse(source_it f, source_it l, data::float_vector& data) const {
        using namespace x3;
        bool ok = phrase_parse(f, l, *double_ % eol, space, data);
        if (ok)
            std::cout << "parse success\n";
        else
            std::cerr << "parse failed: '" << std::string(f, l) << "'\n";

        if (f != l) std::cerr << "trailing unparsed: '" << std::string(f, l) << "'\n";
        std::cout << "data.size(): " << data.size() << "\n";
        return data.size();
    }

Answer 1

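Sorry for not answering your question exactly, but boost spirit is not the right tool here. Spirit is a parser generator (which, as a subset, of course also does lexical analysis), so it sits one level higher in the Chomsky hierarchy of languages. You do not need a parser but a regular expression: std::regex.

Double values can easily be found with a regular expression. In the attached code I created a simple pattern for doubles, and the regex is then used to search for it.

So we read from an istream (which can be a file, a string stream, console input or anything else), line by line, until the whole input has been consumed.

For each line, we check whether the input matches the expected pattern, i.e. one double.

Then we read this double, do some calculations and push it into the vector.

Please see the very simple code below.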

    #include <iostream>
    #include <fstream>
    #include <sstream>
    #include <string>
    #include <regex>
    #include <vector>

    std::istringstream input{R"(0.0
    1.5
    2.0
    3.0
    4.0
    -5.0
    )"};

    using VectorDouble = std::vector<double>;
    // One double: optional sign, digits with an optional fraction (or ".digits"),
    // optional exponent. At least one digit is required, so an empty match is impossible.
    const std::regex reDouble{R"(([-+]?(?:[0-9]+\.?[0-9]*|\.[0-9]+)(?:[eE][-+]?[0-9]+)?))"};

    std::istream& get(std::istream& is, VectorDouble& dd)
    {
        // Reset vector to empty before reading
        dd.clear();

        // Read all data from istream
        std::string line{};
        while (getline(is, line)) {
            // Search for one double
            std::smatch sm;
            if (std::regex_search(line, sm, reDouble)) {
                // Convert the found string to double
                double d1{std::stod(sm[1].str())};
                // Do some calculations
                d1 = d1 + 10.0;
                // Push back into vector
                dd.emplace_back(d1);
            }
            else
                std::cerr << "Error found in line: " << line << "\n";
        }
        return is;
    }

    int main()
    {
        // Define vector and fill it
        VectorDouble dd{};
        (void)get(input, dd);

        // Some debug output
        for (const double d : dd) {
            std::cout << d << "\n";
        }
        return 0;
    }

Answer 2

Why not use semantic actions to do the arithmetic?

Answer 3

See the code below:

    #include <boost/spirit/home/x3.hpp>
    #include <iostream>
    #include <sstream>
    #include <stdexcept>
    #include <string>
    #include <vector>

    using VectorDouble = std::vector<double>;

    void show(VectorDouble const& dd)
    {
        std::cout << "vector result=\n";
        for (double const& d : dd) {
            std::cout << d << "\n";
        }
    }

    // The arithmetic applied to each value before it is stored.
    auto arith_ops = [](double& x) { x += 10.0; };

    std::string input_err_yes{R"(0.0
    1.5
    2.0xxx
    not double
    4.0
    -5.0
    )"};

    std::string input_err_not{R"(0.0
    1.5
    2.0
    3.0
    4.0
    -5.0
    )"};

    // Use this for graceful error recovery in case the input has syntax errors.
    void stod_error_recov(std::string const& input)
    {
        std::cout << __func__ << ":\n";
        VectorDouble dd;

        std::istringstream is(input);
        std::string line{};
        while (getline(is, line)) {
            try {
                std::size_t eod;
                double d1(std::stod(line, &eod));
                arith_ops(d1);
                dd.emplace_back(d1);
                auto const eol = line.size();
                if (eod != eol) {
                    std::cerr << "Warning: trailing chars after double in line: " << line << "\n";
                }
            }
            catch (const std::invalid_argument&) {
                if (!is.eof())
                    std::cerr << "Error: found in line: " << line << "\n";
            }
            catch (const std::out_of_range&) {
                std::cerr << "Error: value out of range in line: " << line << "\n";
            }
        }
        show(dd);
    }

    // Use this if the input is sure to have correct syntax.
    void stod_error_break(std::string const& input)
    {
        std::cout << __func__ << ":\n";
        VectorDouble dd;

        char const* d = input.data();
        while (true) {
            try {
                std::size_t eod;
                double d1(std::stod(d, &eod));
                d += eod;
                arith_ops(d1);
                dd.emplace_back(d1);
            }
            catch (const std::invalid_argument&) {
                // Either a syntax error or the end of the input.
                break;
            }
        }
        show(dd);
    }

    // boost::spirit::x3 method.
    void x3_error_break(std::string const& input)
    {
        std::cout << __func__ << ":\n";
        VectorDouble dd;

        auto f = input.begin();
        auto l = input.end();
        using namespace boost::spirit::x3;
        // Semantic action: _attr(ctx) refers to the double just parsed; modifying it
        // in place changes the value that is then appended to dd.
        auto arith_action = [](auto& ctx) { arith_ops(_attr(ctx)); };
        phrase_parse(f, l, double_[arith_action] % eol, blank, dd);
        show(dd);
    }

    int main()
    {
        //stod_error_recov(input_err_yes);
        //stod_error_break(input_err_not);
        x3_error_break(input_err_not);
        return 0;
    }

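Unlike Armin's answer, the stod_* functions need no regex, because std::stod does the parsing itself; since no regex is involved, they will probably run somewhat faster.

There are two stod_* functions, and their source comments say which one to use in which situation.

For completeness, a third function shows the boost::spirit::x3 approach. IMHO it is more readable than the others; however, it may take longer to compile.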


c++

math

double

vector

boost-spirit-x3