Does std::scientific always result in normalized scientific notation for floating-point numbers?

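Scientific notation defines how a number is written using a sign, digits and an exponent, but it does not say that the representation has to be normalized.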

Example: -2.34e-2 (normalized scientific notation) is the same number as -0.234e-1 (scientific notation).
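A small sketch of that example, assuming the default "C" locale and a correctly rounding `std::stod`: both spellings parse to the same `double`, and `std::scientific` prints the normalized one. The helper name `formatScientific` is hypothetical.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: print a double with std::scientific and
// `digitsAfterPoint` digits after the decimal point.
std::string formatScientific(double value, int digitsAfterPoint) {
    std::ostringstream ss;
    ss << std::scientific << std::setprecision(digitsAfterPoint) << value;
    return ss.str();
}
```

For instance, `formatScientific(std::stod("-0.234e-1"), 2)` gives `-2.34e-02`, the same string as for `std::stod("-2.34e-2")`.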

Can I rely on the following code always producing a normalized result? Edit: except for NaN and INF, as pointed out in the answers.

#include <sstream>
#include <string>

template<typename T>
static std::string toScientificNotation(T number, unsigned significantDigits)
{
    if (significantDigits > 0) {
        significantDigits--;
    }
    std::stringstream ss;
    ss.precision(significantDigits);
    ss << std::scientific << number;
    return ss.str();
}

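If so, please point to the section of the C++ documentation/standard which says that this is not platform/implementation-defined. Since the value 0 also has different representations, I am worried that certain very small numbers (denormalized?!) might be displayed differently. On my compiler/platform it currently works for std::numeric_limits::min() and denorm_min().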

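Note: I use this to find the order of magnitude of a number without having to deal with all the quirky details of floating-point analysis. I was hoping the standard library would do this for me :-)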
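For the order-of-magnitude use case, a minimal sketch (hypothetical helper `decimalExponent`, assuming a finite input and the default "C" locale) that reads the exponent back out of the `std::scientific` text. The exponent belongs to the *rounded* value, so it can depend on the precision: 9.99 printed with one digit after the point becomes `1.0e+01`.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: decimal exponent of `value` as printed by
// std::scientific with `digitsAfterPoint` digits after the point.
// Assumes `value` is finite (inf/NaN have no exponent part and would
// make std::stoi throw). Zero yields 0, since %e prints a zero exponent.
int decimalExponent(double value, int digitsAfterPoint) {
    std::ostringstream ss;
    ss.precision(digitsAfterPoint);
    ss << std::scientific << value;
    const std::string s = ss.str();
    const std::string::size_type pos = s.find_last_of("eE");
    // std::stoi accepts the leading '+' or '-' of the exponent.
    return std::stoi(s.substr(pos + 1));
}
```

With this sketch `decimalExponent(-0.0234, 2)` is -2, `decimalExponent(9.99, 2)` is 0 and `decimalExponent(9.99, 1)` is 1.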

Can I rely on the following code always producing the normalized outcome?

No, there is no guarantee. Or, more precisely: the standard does not impose a guarantee as strong as the one you are hoping for.

std::scientific is referenced only in the following relevant places:

  1. [floatfield.manip]:2

    ios_base& scientific(ios_base& str);  
    

    Effects: Calls str.setf(ios_base::scientific, ios_base::floatfield).
    Returns: str.

  2. Table 101 — fmtflags effects

    | Element    | Effect(s) if set                                       |
    | ---------- | ------------------------------------------------------ |
    | ...        | ...                                                    |
    | scientific | generates floating-point output in scientific notation |
    | ...        | ...                                                    |
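As the Effects clause says, the manipulator only switches the `floatfield` bits; neither excerpt says anything about the shape of the digits. A small sketch that inspects those bits (the function name `usesScientificFloatfield` is illustrative):

```cpp
#include <ios>
#include <sstream>

// Illustrative: true if the stream's floatfield group is exactly
// ios_base::scientific, i.e. the effect of
// str.setf(ios_base::scientific, ios_base::floatfield).
bool usesScientificFloatfield(const std::ios_base& str) {
    return (str.flags() & std::ios_base::floatfield) == std::ios_base::scientific;
}
```

Because `setf` is called with the `floatfield` mask, `ss << std::fixed << std::scientific` leaves only the scientific bit set, and a later `std::fixed` clears it again.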
    

Yes, except for zero, infinity and NaN.
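To see the three exceptions, a sketch printing them with the default precision of 6 (`sci` is an illustrative name; the default "C" locale is assumed). Zero keeps the `d.ddde±dd` shape, but its single leading digit is 0 and its exponent is `00`. Infinity and NaN switch to the `f`-style text, whose exact spelling (`inf` or `infinity`, `nan`, `-nan` or `nan(...)`) is left to the C library, so only a prefix or substring is reliable.

```cpp
#include <limits>
#include <sstream>
#include <string>

// Illustrative: format with std::scientific at the default precision (6).
// <limits> is included for std::numeric_limits<double>::infinity() and
// quiet_NaN(), used to produce the special values.
std::string sci(double v) {
    std::ostringstream ss;
    ss << std::scientific << v;
    return ss.str();
}
```

For example, `sci(0.0)` is `0.000000e+00` while `sci(1.0)` is `1.000000e+00`.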

The C++ standard defers to the C standard for the formatting, and the C standard requires normalized scientific notation:

  • [floatfield.manip]/2

    ios_base& scientific(ios_base& str);
    

    Effects: Calls str.setf(ios_base::scientific, ios_base::floatfield).

    Returns: str.

  • [ostream.inserters.arithmetic]/1 (excerpt)

    operator<<(float val);
    operator<<(double val);
    operator<<(long double val);
    

    Effects: The classes num_get<> and num_put<> handle locale-dependent numeric formatting and parsing. These inserter functions use the imbued locale value to perform numeric formatting. When val is of type ..., double, long double, ..., the formatting conversion occurs as if it performed the following code fragment:

    bool failed = use_facet<
      num_put<charT, ostreambuf_iterator<charT, traits>>
        >(getloc()).put(*this, *this, fill(), val).failed();
    

    When val is of type float the formatting conversion occurs as if it performed the following code fragment:

    bool failed = use_facet<
      num_put<charT, ostreambuf_iterator<charT, traits>>
        >(getloc()).put(*this, *this, fill(),
          static_cast<double>(val)).failed();
    
  • [facet.num.put.virtuals]/1:5.1 (excerpt)

    • Stage 1:

      The first action of stage 1 is to determine a conversion specifier. The tables that describe this determination use the following local variables

      fmtflags flags = str.flags();
      fmtflags floatfield = (flags & (ios_base::floatfield));
      

      For conversion from a floating-point type, the function determines the floating-point conversion specifier as indicated in Table 70.

      Table 70 — Floating-point conversions

      | State                                            | stdio equivalent |
      | ------------------------------------------------ | ---------------- |
      | floatfield == ios_base::scientific && !uppercase | %e               |
      | floatfield == ios_base::scientific               | %E               |
      

      The representations at the end of stage 1 consists of the char's that would be printed by a call of printf(s, val) where s is the conversion specifier determined above.

  • C11 n1570 [7.21.6.1]:8.4

    • e,E

      A double argument representing a floating-point number is converted in the style [−]d.ddde±dd, where there is one digit (which is nonzero if the argument is nonzero) before the decimal-point character and the number of digits after it is equal to the precision; if the precision is missing, it is taken as 6; if the precision is zero and the # flag is not specified, no decimal-point character appears. The value is rounded to the appropriate number of digits. The E conversion specifier produces a number with E instead of e introducing the exponent. The exponent always contains at least two digits, and only as many more digits as necessary to represent the exponent. If the value is zero, the exponent is zero.

      A double argument representing an infinity or NaN is converted in the style of an f or F conversion specifier.
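Putting the chain together, a sketch (assuming the default "C" locale on both sides; all function names are illustrative) that formats the same value through an ostream and through `snprintf` with the `%e`/`%E` specifier from Table 70, then checks the result against the `[−]d.ddde±dd` shape quoted from C11. It tests one implementation; it is not a proof of conformance.

```cpp
#include <cctype>
#include <cstdio>
#include <sstream>
#include <string>

// Illustrative: the iostream path (num_put chooses %e or %E per Table 70).
std::string viaStream(double v, int precision, bool upper) {
    std::ostringstream ss;
    ss.precision(precision);
    ss << std::scientific;
    if (upper) {
        ss << std::uppercase;
    }
    ss << v;
    return ss.str();
}

// Illustrative: the same value through printf-style %.*e / %.*E.
// The buffer is large enough for the small precisions used here;
// snprintf truncates rather than overflowing otherwise.
std::string viaPrintf(double v, int precision, bool upper) {
    char buf[64];
    std::snprintf(buf, sizeof buf, upper ? "%.*E" : "%.*e", precision, v);
    return std::string(buf);
}

// Illustrative check of the C11 7.21.6.1 shape [-]d.ddde+-dd for a finite
// value formatted without the # flag.
bool isNormalizedE(const std::string& s, int precision, bool valueIsZero) {
    auto isDigit = [](char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; };
    std::size_t i = 0;
    if (i < s.size() && s[i] == '-') ++i;
    // Exactly one digit before the point; it is 0 only when the value is zero.
    if (i >= s.size() || !isDigit(s[i])) return false;
    if ((s[i] == '0') != valueIsZero) return false;
    ++i;
    // With precision 0 (and no # flag) there is no decimal-point character.
    if (precision > 0) {
        if (i >= s.size() || s[i] != '.') return false;
        ++i;
        for (int k = 0; k < precision; ++k, ++i) {
            if (i >= s.size() || !isDigit(s[i])) return false;
        }
    }
    // Exponent: 'e' or 'E', a sign, then at least two digits.
    if (i >= s.size() || (s[i] != 'e' && s[i] != 'E')) return false;
    ++i;
    if (i >= s.size() || (s[i] != '+' && s[i] != '-')) return false;
    ++i;
    std::size_t expDigits = 0;
    for (; i < s.size(); ++i, ++expDigits) {
        if (!isDigit(s[i])) return false;
    }
    return expDigits >= 2;
}
```

On a conforming implementation the two formatting paths should produce the same string, and a subnormal such as `5e-324` (printed as `4.94e-324` with precision 2) still gets a single nonzero leading digit, while a hand-written `-0.234e-1` fails the check.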