Which float values cannot be converted to int without undefined behavior? [c++]
I just read this in the C++14 standard (emphasis mine):
4.9 Floating-integral conversions [conv.fpint]
1 A prvalue of a floating point type can be converted to a prvalue of an integer type. The conversion truncates; that is, the fractional part is discarded. The behavior is undefined if the truncated value cannot be represented in the destination type. [...]
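This got me thinking:
- Which float values, if any, cannot be represented as an int after truncation? (Does this depend on the implementation?)
- If there are any, does that mean auto x = static_cast<int>(float) is unsafe?
- What is the proper/safe way to convert a float to an int (assuming you want truncation)?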
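We ran into this problem a while ago, and I hand-built some tables holding the exact floating-point bit patterns at the edges of the various conversions to integers of various sizes. Note that this assumes IEEE 754 4-byte floats and 8-byte doubles, and two's complement signed integers (int32_t with 4 bytes and int64_t with 8 bytes).
If you need to turn the bit patterns into floats or doubles, you will have to either type-pun them (technically UB) or memcpy them.
To answer your question: any value too large to fit in the destination integer is UB on conversion, and the only case where truncation toward zero makes a difference is double -> int32_t. So, using the values below, you can compare the float against the relevant min/max and convert only if it is in range.
Note that cross-converting INT_MIN/INT_MAX (or their modern limits counterparts) to floating point and then comparing does not always work, because floats have very low precision at values of that magnitude.
Converting Inf/NaN is also UB.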
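The precision caveat can be seen directly. This is a minimal sketch, assuming IEEE 754 single-precision floats, a 32-bit int and the default round-to-nearest mode; the helper names int_max_as_float and naive_in_range are made up for illustration. Converting INT_MAX to float rounds up to 2^31, so a check written against the converted limit accepts a value that is actually out of range.

```cpp
#include <climits>
#include <limits>

static_assert(std::numeric_limits<float>::is_iec559, "sketch assumes IEEE 754 floats");
static_assert(sizeof(int) == 4, "sketch assumes a 32-bit int");

// Hypothetical helper: what INT_MAX becomes after conversion to float.
// float has a 24-bit significand, so 2147483647 is not representable and,
// under round-to-nearest, becomes 2147483648.0f (2^31).
inline float int_max_as_float() {
    return static_cast<float>(INT_MAX);
}

// Hypothetical naive range check built on that conversion. It wrongly
// returns true for f == 2147483648.0f, whose conversion to int is undefined.
inline bool naive_in_range(float f) {
    return f >= static_cast<float>(INT_MIN) && f <= static_cast<float>(INT_MAX);
}
```

The lower bound is not affected, because INT_MIN is a power of two and converts exactly.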
// float->int64 edge cases
static const uint32_t FloatbitsMaxFitInt64 = 0x5effffff; // [9223371487098961920] the bit pattern of the largest float which still fits in a signed int64
static const uint32_t FloatbitsMinNofitInt64 = 0x5f000000; // [9223372036854775808] the bit pattern of the smallest float which is too big for a signed int64
static const uint32_t FloatbitsMinFitInt64 = 0xdf000000; // [-9223372036854775808] the bit pattern of the smallest float which still fits in a signed int64
static const uint32_t FloatbitsMaxNotfitInt64 = 0xdf000001; // [-9223373136366403584] the bit pattern of the largest float which is too small for a signed int64
// float->int32 edge cases
static const uint32_t FloatbitsMaxFitInt32 = 0x4effffff; // [2147483520] the bit pattern of the largest float which still fits in a signed int32
static const uint32_t FloatbitsMinNofitInt32 = 0x4f000000; // [2147483648] the bit pattern of the smallest float which is too big for a signed int32
static const uint32_t FloatbitsMinFitInt32 = 0xcf000000; // [-2147483648] the bit pattern of the smallest float which still fits in a signed int32
static const uint32_t FloatbitsMaxNotfitInt32 = 0xcf000001; // [-2147483904] the bit pattern of the largest float which is too small for a signed int32
// double->int64 edgecases
static const uint64_t DoubleBitsMaxFitInt64 = 0x43dfffffffffffff; // [9223372036854774784] Largest double which fits into an int64
static const uint64_t DoubleBitsMinNofitInt64 = 0x43e0000000000000; // [9223372036854775808] Smallest double which is too big for an int64
static const uint64_t DoubleBitsMinFitInt64 = 0xc3e0000000000000; // [-9223372036854775808] Smallest double which fits into an int64
static const uint64_t DoubleBitsMaxNotfitInt64 = 0xc3e0000000000001; // [-9223372036854777856] largest double which is too small to fit into an int64
// double->int32 edge cases [when truncating (round toward zero)]
static const uint64_t DoubleBitsMaxTruncFitInt32 = 0x41dfffffffffffff; // [~2147483647.9999998] Largest double that when truncated will fit into an int32
static const uint64_t DoubleBitsMinTruncNofitInt32 = 0x41e0000000000000; // [2147483648.0000000] Smallest double that when truncated won't fit into an int32
static const uint64_t DoubleBitsMinTruncFitInt32 = 0xc1e00000001fffff; // [~-2147483648.9999995] Smallest double that when truncated will fit into an int32
static const uint64_t DoubleBitsMaxTruncNofitInt32 = 0xc1e0000000200000; // [-2147483649.0000000] Largest double that when truncated won't fit into an int32
// double->int32 edge cases [when rounding via banker's method (round to nearest, ties to even)]
static const uint64_t DoubleBitsMaxRoundFitInt32 = 0x41dfffffffdfffff; // [~2147483647.4999998] Largest double that when rounded will fit into an int32
static const uint64_t DoubleBitsMinRoundNofitInt32 = 0x41dfffffffe00000; // [2147483647.5000000] Smallest double that when rounded won't fit into an int32 (the tie rounds to the even value 2147483648)
static const uint64_t DoubleBitsMinRoundFitInt32 = 0xc1e0000000100000; // [-2147483648.5000000] Smallest double that when rounded will fit into an int32
static const uint64_t DoubleBitsMaxRoundNofitInt32 = 0xc1e0000000100001; // [~-2147483648.5000005] Largest double that when rounded won't fit into an int32
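The double entries can be spot-checked by decoding the bit patterns and comparing them with decimal literals. This is a minimal sketch, assuming IEEE 754 8-byte doubles and matching integer/floating-point byte order; bits_to_double and adjacent are hypothetical helper names, not part of the table.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: reinterpret raw bits as a double via memcpy,
// which avoids the undefined behavior of pointer-based type punning.
inline double bits_to_double(std::uint64_t bits) {
    static_assert(sizeof(double) == sizeof bits, "expects an 8-byte double");
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}

// Hypothetical helper: true if `hi` is the next representable double
// above `lo`, i.e. no other double lies strictly between them.
inline bool adjacent(double lo, double hi) {
    return std::nextafter(lo, hi) == hi;
}
```

For example, bits_to_double(0x41e0000000000000) compares equal to 2147483648.0, and adjacent(bits_to_double(0x41dfffffffffffff), 2147483648.0) holds, which confirms that the two truncation entries on the positive side are neighbours.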
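So, for the example you want:
if (f >= B2F(FloatbitsMinFitInt32) && f <= B2F(FloatbitsMaxFitInt32)) {
    // cast is valid: static_cast<int32_t>(f) is well-defined here
}
where B2F is something like this:
#include <stdint.h> // uint32_t
#include <string.h> // memcpy

float B2F(uint32_t bits)
{
    static_assert(sizeof(float) == sizeof(uint32_t), "Weird arch");
    float f;
    memcpy(&f, &bits, sizeof(float));
    return f;
}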
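Note that this check correctly rejects NaN/Inf (comparisons with NaN are always false, and infinities fall outside the bounds), unless you are using a non-IEEE 754 mode of your compiler (e.g. -ffast-math on gcc or /fp:fast on msvc).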
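The same range check can also be written without bit patterns, because the bounds -2^31 and 2^31 are powers of two and therefore exactly representable as float. This is a sketch under the same IEEE 754 assumption; try_trunc_to_int32 is a hypothetical name, and testing f < 2^31 is equivalent to testing f <= B2F(FloatbitsMaxFitInt32), since those two floats are adjacent.

```cpp
#include <cstdint>
#include <limits>

static_assert(std::numeric_limits<float>::is_iec559, "sketch assumes IEEE 754 floats");

// Hypothetical helper: truncate `f` toward zero into `out` and return true,
// or return false (leaving `out` untouched) if the result would not fit.
// NaN fails both comparisons; each infinity fails one of them.
inline bool try_trunc_to_int32(float f, std::int32_t& out) {
    const float lo = -2147483648.0f;  // -2^31, smallest value that fits
    const float hi = 2147483648.0f;   //  2^31, smallest value that does not fit
    if (!(f >= lo && f < hi)) {
        return false;
    }
    out = static_cast<std::int32_t>(f);  // well-defined: value is in range
    return true;
}
```

Returning a flag rather than throwing keeps the sketch dependency-free; the caller decides whether an out-of-range input is an error or should be clamped.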
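- It should be no surprise that there are float values outside the range of int. Floating-point values were invented to represent very large (and also very small) values. INT_MAX + 1 (usually equal to 2147483648) cannot be represented by an int, but can be represented by a float.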
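- Yes, static_cast<int>(float) is as unsafe as any undefined behavior. However, something as simple as x + y is also UB for large enough integers x and y, so there is no big surprise here either.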
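- The right way to do things depends on the application, as always in C++. Boost has numeric_cast, which throws an exception on overflow; that may be good enough for you. To saturate instead (convert out-of-range values to INT_MIN and INT_MAX), write something like this: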
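float f;
int i;
...
// In range: the truncated value is representable, so the cast is well-defined.
if (static_cast<double>(INT_MIN) <= f && f < static_cast<double>(INT_MAX))
    i = static_cast<int>(f);
else if (f < 0)
    i = INT_MIN; // too small: saturate downward
else
    i = INT_MAX; // too large (or NaN): saturate upward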
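However, this is not ideal. Does your system have a double type that can represent the maximum value of int exactly? If so, this will work. Also, how exactly do you want to round values that are close to the minimum or maximum value of int? If you do not want to think about such questions, use boost::numeric_cast, as described here.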
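The two open questions can be pinned down in code. This is a minimal sketch, not a definitive implementation: a static_assert checks that double has enough significand bits to hold the int limits (and the values one past them) exactly, and the bounds are chosen so that anything whose truncated value fits is converted. The name saturating_trunc and the decision to map NaN to 0 are assumptions of this sketch.

```cpp
#include <climits>
#include <limits>

// double must hold INT_MIN - 1 and INT_MAX + 1 exactly for the bounds below
// to be exact; that needs one more significand bit than int has value bits.
static_assert(std::numeric_limits<double>::radix == 2 &&
              std::numeric_limits<double>::digits > std::numeric_limits<int>::digits,
              "double cannot represent the int limits exactly");

// Hypothetical helper: truncate toward zero, clamping out-of-range input.
inline int saturating_trunc(double d) {
    if (d != d) {
        return 0;  // NaN compares unequal to itself; mapping it to 0 is a choice
    }
    // The truncated value fits exactly when INT_MIN - 1 < d < INT_MAX + 1.
    if (d <= static_cast<double>(INT_MIN) - 1.0) return INT_MIN;
    if (d >= static_cast<double>(INT_MAX) + 1.0) return INT_MAX;
    return static_cast<int>(d);
}
```

A float argument converts to double exactly, so the same function serves both types. On a platform where int is 64 bits wide the static_assert fails at compile time instead of silently using inexact bounds.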