Half-precision floating-point

Question

I have a small question about IEEE-754 half precision.

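1) I have the following exercise: 13,7625 shall be written in 16 bits (half precision).

So I started by converting the number from decimal to binary, and I got 13,7625 = 1101.1100001100₂

Normalized, that is 1.1011100001100 * 2³.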

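The sign bit is 0, because the number is positive.
The mantissa should have ten bits = 101 110 0001
The exponent has five bits = bias (15) + 3 = 18, so the exponent is 10010, and here is the damn problem.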

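My professor gave us the solution, and as far as I can tell my mantissa and my binary conversion are correct, but for the exponent he says it is 19 = 10011, and I don't understand why. Could the bias be 16? According to Wikipedia it is: - 15 for half precision. - 127 for single precision. - 1032 for double precision.

Could you point out what I am doing wrong?

2) One other question: what would the exponent bias be in the following situation: 1 sign bit + 4 mantissa bits + 3 exponent bits? And why?

Thanks.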

Answer 1

1) I have the following exercise: 13,7625 shall be written in 16 bit (half precision)

so I started to convert the number from DEC to Binary and I got this 13,7625 = 1101.1100001100₂

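Your mantissa conversion is correct, and so is your exponent. The exponent bias for half precision is 15 (https://en.wikipedia.org/wiki/Half-precision_floating-point_format), so the stored exponent is 3 + 15 = 18 = 10010, not 19.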
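If you want to check this yourself, here is a minimal sketch using Python's standard `struct` module, whose `e` format packs a value as IEEE-754 binary16. The variable names are mine. Note that `struct` rounds to nearest rather than truncating, so the last two mantissa bits come out as 10 instead of the 01 you get by simply cutting off the extra bits.

```python
import struct

# Pack as IEEE-754 binary16 (big-endian), then read the 16 bits back as an integer.
(bits,) = struct.unpack(">H", struct.pack(">e", 13.7625))

sign = bits >> 15                # 1 sign bit
exponent = (bits >> 10) & 0x1F   # 5-bit biased exponent field
mantissa = bits & 0x3FF          # 10-bit fraction field

print(f"{sign:01b} {exponent:05b} {mantissa:010b}")  # 0 10010 1011100010
print(exponent - 15)                                  # unbiased exponent: 3
```

The exponent field reads 10010 (18), which agrees with a bias of 15.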

one other question what would be the exponent bias if we have the following situation: 1 sign bit + 4 Mantissa bits + 3 exponent bits. and why?

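The rule for IEEE-754 floating-point encodings is that if the exponent is coded on n bits, the bias is 2^(n-1)-1. This holds for single precision (8 bits, bias 2⁷-1=127), double precision (11 bits, bias 2¹⁰-1=1023, not 1032; that was a small typo), and so on.
For a 3-bit exponent field, this gives a bias of 2²-1=3.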
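The rule is easy to tabulate. The helper below is only an illustration (`exponent_bias` is a name I made up, not a library function):

```python
def exponent_bias(exp_bits: int) -> int:
    """Bias for an n-bit exponent field under the 2**(n-1) - 1 rule."""
    return 2 ** (exp_bits - 1) - 1

# Exponent widths: the 3-bit format from the question, then half, single, double.
for name, n in [("3-bit", 3), ("half", 5), ("single", 8), ("double", 11)]:
    print(f"{name}: bias = {exponent_bias(n)}")
# 3-bit: bias = 3
# half: bias = 15
# single: bias = 127
# double: bias = 1023
```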

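For your encoding problem, this gives the exponent code 3+3=6=110. For the mantissa, it depends on the rounding policy. If the mantissa is rounded toward 0, we can encode 1.1011(100001100) by dropping the trailing bits, and the final code is
0.110.1011.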

But the rounding error is then slightly greater than 0.5 ULP (0.1000011 ULP in binary, to be exact). To minimize it, 1.10111000011 should be rounded to 4 bits by adding 1 ULP.

  1.1011 
+      1
= 1.1100

The final code is 0.110.1100
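Both results can be reproduced with a small sketch. `encode_minifloat` is a hypothetical helper written for this answer, and it assumes a positive value in the normal range (no zero, subnormals, overflow, infinities or NaN).

```python
import math

def encode_minifloat(value, exp_bits, man_bits, round_nearest=True):
    """Encode a positive normal value as a 'sign.exponent.mantissa' bit string."""
    bias = 2 ** (exp_bits - 1) - 1
    # math.frexp gives m in [0.5, 1) with value == m * 2**e; rescale to [1, 2).
    m, e = math.frexp(value)
    significand, exponent = m * 2, e - 1
    # Fractional part of the significand, expressed in units of 1 ULP.
    scaled = (significand - 1) * 2 ** man_bits
    frac = round(scaled) if round_nearest else math.floor(scaled)
    if frac == 2 ** man_bits:  # rounding up carried into the next power of two
        frac, exponent = 0, exponent + 1
    return f"0.{exponent + bias:0{exp_bits}b}.{frac:0{man_bits}b}"

print(encode_minifloat(13.7625, 3, 4, round_nearest=False))  # 0.110.1011
print(encode_minifloat(13.7625, 3, 4))                       # 0.110.1100
print(encode_minifloat(13.7625, 5, 10))                      # 0.10010.1011100010
```

The last call applies the same procedure to the half-precision layout (5 exponent bits, 10 mantissa bits) and again yields the exponent field 10010.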
