Parsing strings with value modifiers ('-', '%') at the end

Question

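I'm trying to get the hang of parsing.

I have some data in de-de format, with additional information at the end of the string.

I managed to get the de-de part right, but I'm struggling to parse the - and % correctly. I read up on codecvt, but I don't understand the topic.

Here is code reflecting what I understand so far, with examples of what I need it to do.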

#include <string>
#include <locale>
#include <iostream>
#include <sstream>

using namespace std;

#define EXPECT_EQ(actual, expected) { \
    if (actual != expected) \
    { \
        cout << "expected " << #actual << " to be " << expected << " but was " << actual << endl; \
    } \
}

double parse(wstring numstr)
{
    double value;
    wstringstream is(numstr);
    is.imbue(locale("de-de"));
    is >> value;
    return value;
}

int main()
{
    EXPECT_EQ(parse(L"123"), 123); //ok
    EXPECT_EQ(parse(L"123,45"), 123.45); //ok
    EXPECT_EQ(parse(L"1.000,45"), 1000.45); //ok
    EXPECT_EQ(parse(L"2,390%"), 0.0239); //% sign at the end
    EXPECT_EQ(parse(L"1.234,56-"), -1234.56); //- sign at the end
}

The output is:

expected parse(L"2,390%") to be 0.0239 but was 2.39
expected parse(L"1.234,56-") to be -1234.56 but was 1234.56

How do I imbue my stream so that it reads the - and % signs the way I need?

Answer 1

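The codecvt facet is the wrong place to look here. A codecvt facet only handles converting the external representation of a character to the internal representation of the same character (e.g., UTF-8 in a file, UTF-32/UCS-4 internally).

To parse numbers like this, you're looking for the num_get facet. The basic idea is that you create a class derived from std::num_get that overrides do_get for (at least) the numeric type(s) you care about.

In a typical case, you only do a "real" implementation for a few types (e.g., long long and long double), have the functions for all the smaller types delegate to those, and then convert the result to the target type.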
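As a rough sketch of that delegation pattern (the names pct_num_get and read_with_pct are made up for this illustration, and the "real" logic is deliberately tiny: only an optional trailing '%'), the float overload can forward to the double overload and narrow the result:

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical facet, for illustration only: the double overload holds the
// "real" logic (an optional trailing '%'); the float overload delegates to it.
struct pct_num_get : std::num_get<char> {
protected:
    iter_type do_get(iter_type in, iter_type end, std::ios_base& str,
                     std::ios_base::iostate& err, double& val) const override {
        // Let the standard implementation read the number itself.
        in = std::num_get<char>::do_get(in, end, str, err, val);
        if (!(err & std::ios_base::failbit) && in != end && *in == '%') {
            ++in;
            val /= 100.0;
        }
        if (in == end)
            err |= std::ios_base::eofbit;
        return in;
    }

    iter_type do_get(iter_type in, iter_type end, std::ios_base& str,
                     std::ios_base::iostate& err, float& val) const override {
        double wide = 0.0;
        in = do_get(in, end, str, err, wide);  // resolves to the double overload
        val = static_cast<float>(wide);        // convert to the target type
        return in;
    }
};

// Hypothetical helper: read one value of type T through the facet above.
template <class T>
T read_with_pct(const std::string& text) {
    std::istringstream is(text);
    is.imbue(std::locale(std::locale::classic(), new pct_num_get));
    T value{};
    is >> value;
    return value;
}
```

With this in place, read_with_pct<float>("50%") goes through the float overload, which in turn reuses the double logic, so the suffix handling is written only once.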

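Here's a fairly simple num_get facet. For now, it only attempts to provide special handling for type double. To keep the example from getting too long, I've simplified the processing a bit:

It doesn't attempt to parse an exponent (e.g., the "99" in 1e99).
It doesn't attempt to handle a suffix of %- (but will handle -%).
It hard-codes "," as the decimal point and "." as the thousands separator.
It doesn't attempt to sanity-check thousands separators. For example, 1.2.3 will parse as 123.

Within those limitations, here's some code: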

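#include <ios>
#include <string>
#include <locale>
#include <iostream>
#include <sstream>
#include <iterator>
#include <cctype>
#include <type_traits>

using namespace std;

template <class charT, class InputIterator = istreambuf_iterator<charT> >
class read_num : public std::num_get<charT, InputIterator> {
public:
    typedef charT char_type;
    typedef InputIterator iter_type;
protected:
    iter_type do_get(iter_type in, iter_type end, ios_base& str, ios_base::iostate& err, double& val) const override {
        double ret = 0.0;

        bool negative = false;
        bool got_digit = false;
        // <cctype> functions need a non-negative argument, hence the cast;
        // they are only meaningful here for characters in the basic (ASCII) range.
        using uc = typename std::make_unsigned<charT>::type;

        // Always check in != end before dereferencing the iterator.
        while (in != end && std::isspace((uc)*in))
            ++in;
        // Optional leading minus sign.
        if (in != end && *in == '-') {
            negative = true;
            ++in;
            while (in != end && std::isspace((uc)*in))
                ++in;
        }
        // Integer part; a '.' after a digit is skipped as a thousands separator.
        while (in != end && std::isdigit((uc)*in)) {
            got_digit = true;
            ret *= 10;
            ret += *in - '0';
            ++in;
            if (in != end && *in == '.')
                ++in;
        }
        // Fractional part after the ',' decimal point.
        if (in != end && *in == ',') {
            ++in;
            double place = 10.0;
            while (in != end && std::isdigit((uc)*in)) {
                got_digit = true;
                ret += (*in - '0') / place;
                place *= 10;
                ++in;
            }
        }
        // Optional trailing minus sign, then optional percent sign.
        if (in != end && *in == '-') {
            negative = true;
            ++in;
        }
        if (in != end && *in == '%') {
            ret /= 100.0;
            ++in;
        }
        if (negative)
            ret = -ret;
        if (!got_digit) {
            // No digits at all: report failure instead of silently yielding 0.
            ret = 0.0;
            err |= ios_base::failbit;
        }
        if (in == end)
            err |= ios_base::eofbit;
        val = ret;
        return in;
    }
};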

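In reality, you probably wouldn't want to do things this way in this case. You'd probably want to delegate to the existing facet to read the number correctly, then, at the end of whatever it parsed, look for a - and/or % and react appropriately (e.g., possibly diagnosing an error if you found both a leading and a trailing '-').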
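A minimal sketch of that delegating approach might look like the following. The names de_punct, suffix_num_get and parse_de are hypothetical, and the small numpunct facet only stands in for a German locale so the sketch does not depend on a "de-de" locale being installed. The base num_get does the actual number parsing (including the grouping check); the override only handles the suffix.

```cpp
#include <cassert>
#include <locale>
#include <sstream>
#include <string>

// Hypothetical punctuation facet standing in for a German locale.
struct de_punct : std::numpunct<char> {
protected:
    char do_decimal_point() const override { return ','; }
    char do_thousands_sep() const override { return '.'; }
    std::string do_grouping() const override { return "\3"; }
};

// Hypothetical facet: delegates to the base num_get, then looks for an
// optional trailing '-' followed by an optional trailing '%'.
struct suffix_num_get : std::num_get<char> {
protected:
    iter_type do_get(iter_type in, iter_type end, std::ios_base& str,
                     std::ios_base::iostate& err, double& val) const override {
        const bool leading_minus = (in != end && *in == '-');
        in = std::num_get<char>::do_get(in, end, str, err, val);
        if (err & std::ios_base::failbit)
            return in;
        if (in != end && *in == '-') {
            ++in;
            if (leading_minus) {  // both a leading and a trailing '-'
                err |= std::ios_base::failbit;
                val = 0;
                return in;
            }
            val = -val;
        }
        if (in != end && *in == '%') {
            ++in;
            val /= 100.0;
        }
        if (in == end)
            err |= std::ios_base::eofbit;
        return in;
    }
};

// Hypothetical helper: returns false if extraction failed.
bool parse_de(const std::string& text, double& out) {
    std::istringstream is(text);
    std::locale loc(std::locale(std::locale::classic(), new de_punct),
                    new suffix_num_get);
    is.imbue(loc);
    is >> out;
    return !is.fail();
}
```

Under these assumptions, parse_de("1.234,56-", v) should succeed with a negative result, while an input such as "-5-" is rejected. Like the hand-written facet above, this handles -% but not %-.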

Answer 2

I'd tackle this head-on: let's get parsing here.

You will end up writing that somewhere anyway, so I'd forget about the need to create an (expensive) string stream first.

Weapon of choice: Boost Spirit

Note,

I parse the string using its iterators directly. My code is pretty generic as to the type of floating point number used.

You can pretty much search-and-replace double with e.g. boost::multiprecision::cpp_dec_float (or make it a template argument) and be parsing. I suggest this because I predict that you need to parse decimal floating point numbers, not binary floating point numbers. With double, you're losing accuracy in the conversion.
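To make the accuracy point concrete: most decimal fractions (0.0239, for instance) have no exact binary double representation, so a result computed as 2.39 / 100 is not guaranteed to compare exactly equal to the literal 0.0239. If you stay with double, a tolerance-based comparison is a safer check than != in a test macro. The helper below is a hypothetical sketch, and the default tolerance is an arbitrary choice for illustration:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: true if a and b differ by at most rel_tol, scaled by
// the larger magnitude (but never by less than 1.0, so values near zero are
// effectively compared with an absolute tolerance).
bool nearly_equal(double a, double b, double rel_tol = 1e-12) {
    const double scale = std::max({std::fabs(a), std::fabs(b), 1.0});
    return std::fabs(a - b) <= rel_tol * scale;
}
```

A test could then use nearly_equal(parsed, 0.0239) instead of an exact comparison. Switching to a decimal type, as suggested above, avoids the issue at the source.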

UPDATE: extended sample Live On Coliru

Simple grammar

At its core, the grammar is really simple:

if (parse(numstr.begin(), numstr.end(), mynum >> matches['-'] >> matches['%'],
            value, sign, pct)) 
{
    if (sign) value = -value;
    if (pct)  value /= 100;

    return value;
}

There you go. Of course, we need to define mynum so that it parses unsigned real numbers as expected:

using namespace qi;
real_parser<double, de_numpolicy<double> > mynum;

The magic: `real_policies<>`

The documentation goes a long way toward explaining how to tweak real number parsing using real_policies. Here's the policy I came up with:

template <typename T>
    struct de_numpolicy : qi::ureal_policies<T>
{
    //  No exponent
    template <typename It>                static bool parse_exp(It&, It const&)          { return false; } 
    template <typename It, typename Attr> static bool parse_exp_n(It&, It const&, Attr&) { return false; } 

    //  Thousands separated numbers
    template <typename It, typename Attr>
    static bool parse_n(It& first, It const& last, Attr& attr)
    {
        qi::uint_parser<unsigned, 10, 1, 3> uint3;
        qi::uint_parser<unsigned, 10, 3, 3> uint3_3;

        if (parse(first, last, uint3, attr)) {
            for (T n; qi::parse(first, last, '.' >> uint3_3, n);)
                attr = attr * 1000 + n;

            return true;
        }

        return false;
    }

    template <typename It>
        static bool parse_dot(It& first, It const& last) {
            if (first == last || *first != ',')
                return false;
            ++first;
            return true;
        }
};

Full demo

Live On Coliru

#include <boost/spirit/include/qi.hpp>
#include <cassert>
#include <iostream>
#include <string>


#define EXPECT_EQ(actual, expected) { \
    double v = (actual); \
    if (v != expected) \
    { \
        std::cout << "expected " << #actual << " to be " << expected << " but was " << v << std::endl; \
    } \
}

namespace mylib {
    namespace qi = boost::spirit::qi;

    template <typename T>
        struct de_numpolicy : qi::ureal_policies<T>
    {
        //  No exponent
        template <typename It>                static bool parse_exp(It&, It const&)          { return false; } 
        template <typename It, typename Attr> static bool parse_exp_n(It&, It const&, Attr&) { return false; } 

        //  Thousands separated numbers
        template <typename It, typename Attr>
        static bool parse_n(It& first, It const& last, Attr& attr)
        {
            qi::uint_parser<unsigned, 10, 1, 3> uint3;
            qi::uint_parser<unsigned, 10, 3, 3> uint3_3;

            if (parse(first, last, uint3, attr)) {
                for (T n; qi::parse(first, last, '.' >> uint3_3, n);)
                    attr = attr * 1000 + n;

                return true;
            }

            return false;
        }

        template <typename It>
            static bool parse_dot(It& first, It const& last) {
                if (first == last || *first != ',')
                    return false;
                ++first;
                return true;
            }
    };

    template<typename Char, typename CharT, typename Alloc>
    double parse(std::basic_string<Char, CharT, Alloc> const& numstr)
    {
        using namespace qi;
        real_parser<double, de_numpolicy<double> > mynum;

        double value;
        bool sign, pct;

        if (parse(numstr.begin(), numstr.end(), mynum >> matches['-'] >> matches['%'],
                    value, sign, pct)) 
        {
            // std::cout << "DEBUG: " << std::boolalpha << " '" << numstr << "' -> (" << value << ", " << sign << ", " << pct << ")\n";
            if (sign) value = -value;
            if (pct)  value /= 100;

            return value;
        }

        assert(false); // TODO handle errors
    }

} // namespace mylib

int main()
{
    EXPECT_EQ(mylib::parse(std::string("123")),       123);      // ok
    EXPECT_EQ(mylib::parse(std::string("123,45")),    123.45);   // ok
    EXPECT_EQ(mylib::parse(std::string("1.000,45")),  1000.45);  // ok
    EXPECT_EQ(mylib::parse(std::string("2,390%")),    0.0239);   // %  sign at the end
    EXPECT_EQ(mylib::parse(std::string("1.234,56-")), -1234.56); // -  sign at the end
}

If you uncomment the "DEBUG" line, it prints:

DEBUG:  '123' -> (123, false, false)
DEBUG:  '123,45' -> (123.45, false, false)
DEBUG:  '1.000,45' -> (1000.45, false, false)
DEBUG:  '2,390%' -> (2.39, false, true)
DEBUG:  '1.234,56-' -> (1234.56, true, false)
