Math Error with Primitive Operators

Question

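I'm having a problem with primitive types when using the built-in operators. All of my operators work for every data type except float and (un)signed long long int.

Why is the result wrong even when multiplying by 1? Also, why do +10 and -10 give the same number as +1, -1, /1 and *1?

I chose the number 461168601 because it fits within both the maximum float and the maximum signed long long int.

Running the following code gives the following output: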

fmax  : 340282346638528859811704183484516925440
imax  : 9223372036854775807
i     : 461168601
f     : 10
f2    : 1

461168601 / 10 = 46116860
461168601 + 10 = 461168608
461168601 - 10 = 461168608

461168601 * 1 = 461168608
461168601 / 1 = 461168608
461168601 + 1 = 461168608
461168601 - 1 = 461168608

The code below can be run here.

#include <iostream>
#include <sstream>
#include <iomanip>
#include <limits>

#define fmax std::numeric_limits<float>::max()
#define imax std::numeric_limits<signed long long int>::max()

int main()
{

    signed long long int i    = 461168601;
    float f = 10;
    float f2 = 1;
    std::cout << std::setprecision(40);
    std::cout <<"fmax  : " << fmax  << std::endl;
    std::cout <<"imax  : " << imax  << std::endl;
    std::cout <<"i     : " << i    << std::endl;
    std::cout <<"f     : " << f    << std::endl;
    std::cout <<"f2    : " << f2   << std::endl;
    std::cout <<std::endl;
    std::cout << i << " / " << f << " = " << i / f << std::endl;
    std::cout << i << " + " << f << " = " << i + f << std::endl;
    std::cout << i << " - " << f << " = " << i - f << std::endl;
    std::cout <<std::endl;
    std::cout << i << " * " << f2 << " = " <<i * f2 << std::endl;
    std::cout << i << " / " << f2 << " = " << i / f2 << std::endl;
    std::cout << i << " + " << f2 << " = " << i + f2 << std::endl;
    std::cout << i << " - " << f2 << " = " << i - f2 << std::endl;
}

Answer 1

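The error occurs because 4611686018427387904 differs too much in magnitude from 1 or 10. You should avoid adding floating-point numbers whose magnitudes differ that much, because the gap between two adjacent floating-point values grows with the exponent.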

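When two floating-point numbers are added, they are first aligned to the same exponent (the larger one). For example, if you start with 1e10 and 1e-10, after alignment you have 1e10 and 0e10, and the result is 1e10.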

Answer 2

I did some digging and found this article.

Casting opens up its own can of worms. You have to be careful, because your float might not have enough precision to preserve an entire integer. A 32-bit integer can represent any 9-digit decimal number, but a 32-bit float only offers about 7 digits of precision. So if you have large integers, making this conversion will clobber them. Thankfully, doubles have enough precision to preserve a whole 32-bit integer (notice, again, the analogy between floating point precision and integer dynamic range). Also, there is some overhead associated with converting between numeric types, going from float to int or between float and double.

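So, basically, once the integer part of a number grows beyond about seven digits, a float can no longer hold every digit: it keeps roughly seven significant digits and rounds away the rest. On typical platforms a float has a 24-bit significand, so not every integer above 2^24 (16777216) can be represented exactly. In the question's code, i is converted to float before each operation, so 461168601 is already rounded to the nearest float, 461168608, before anything is added, subtracted, multiplied or divided.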

c++

casting

operators

unsigned-long-long-int