如何在 Verilog 中执行两个 Q15 值的除法,而不使用“/”(除法)运算符?
How to perform division of two Q15 values in Verilog , with out using '/' (division) Operator?
因为除法运算 (/) 在 FPGA 的情况下很昂贵?是否可以用基本的移位操作对两个 Q15 格式数字(16 位定点数)进行除法?
有人可以提供一些例子来帮助我吗?
提前致谢!
定点运算只是整数运算,加上一些缩放比例。Q15 是一种纯小数格式,存储为比例因子为 215[=44= 的带符号 16 位整数], 能够表示区间 [-1, 1) 中的值。显然,只有当除数的大小超过被除数的大小时,除法在 Q15 中才有意义,否则商的大小超过可表示的范围。
在着手定点除法的自定义 Verilog 实现之前,您需要检查 FPGA 供应商的库产品,因为包含流水线除法的定点库通常可用。还有可能相关的开源项目,比如this one.
当使用整数除法运算符进行定点除法时,我们需要调整除法会去除比例因子,即(a * 2scale) / (b * 2scale) = (a/b),而正确的定点结果是(a/b * 2scale).这很容易通过将红利预乘以 2scale 来解决,如下面的 C 实现:
int16_t div_q15 (int16_t dividend, int16_t divisor)
{
return (int16_t)(((int32_t)dividend << 15) / (int32_t)divisor);
}
Wikipedia gives a reasonable overwiew on how to implement binary division on a bit-by-bit basis using add, subtract, and shift operations. These methods are closely related to the longhand division taught in grade school. For FPGAs, the use of the non-restoring method if often preferred, as pointed out by this paper,例如:
尼古拉·索罗金,"Implementation of high-speed fixed-point dividers on FPGA"。 计算机科学与技术杂志,第 2 卷。 6,第 1 期,2006 年 4 月,第 8-11 页。
这里是 C 代码,展示了非恢复方法如何用于 16 位二进制补码操作数的除法:
/* bit-wise non-restoring two's complement division */
void int16_div (int16_t dividend, int16_t divisor, int16_t *quot, int16_t *rem)
{
const int operand_bits = (int) (sizeof (int16_t) * CHAR_BIT);
uint16_t d = (uint16_t)divisor;
uint16_t nd = 0 - d; /* -divisor */
uint16_t r, q = 0; /* remainder, quotient */
uint32_t dd = (uint32_t)d << operand_bits; /* expanded divisor */
uint32_t pp = dividend; /* partial remainder */
int i;
for (i = operand_bits - 1; i >= 0; i--) {
if ((int32_t)(pp ^ dd) < 0) {
q = (q << 1) + 0; /* record quotient bit -1 (as 0) */
pp = (pp << 1) + dd;
} else {
q = (q << 1) + 1; /* record quotient bit +1 (as 1) */
pp = (pp << 1) - dd;
}
}
/* convert quotient from digit set {-1,1} to plain two's complement */
q = (q << 1) + 1;
/* remainder is upper half of partial remainder */
r = (uint16_t)(pp >> operand_bits);
/* fix up cases where we worked past a partial remainder of zero */
if (r == d) { /* remainder equal to divisor */
q = q + 1;
r = 0;
} else if (r == nd) { /* remainder equal to -divisor */
q = q - 1;
r = 0;
}
/* for truncating division, remainder must have same sign as dividend */
if (r && ((int16_t)(dividend ^ r) < 0)) {
if ((int16_t)q < 0) {
q = q + 1;
r = r - d;
} else {
q = q - 1;
r = r + d;
}
}
*quot = (int16_t)q;
*rem = (int16_t)r;
}
请注意,有多种方法可以处理非还原除法中出现的各种特殊情况。例如,在这种情况下,人们经常看到检测到零部分余数 pp
并提前退出商位循环的代码。在这里,我假设 FPGA 实现将完全展开循环以创建流水线实现,在这种情况下,提前终止没有帮助。相反,对那些因忽略零的部分余数而受到影响的商应用最终修正。
为了从上面创建一个 Q15 部门,我们只需要做一个改变:合并股息的扩大规模。而不是:
uint32_t pp = dividend; /* partial remainder */
我们现在使用这个:
uint32_t pp = dividend << 15; /* partial remainder; incorporate Q15 scaling */
生成的 C 代码(抱歉,我不会提供可读的 Verilog 代码)包括测试框架:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <limits.h>
#include <math.h>
/* bit-wise non-restoring two's complement division */
void q15_div (int16_t dividend, int16_t divisor, int16_t *quot, int16_t *rem)
{
const int operand_bits = (int) (sizeof (int16_t) * CHAR_BIT);
uint16_t d = (uint16_t)divisor;
uint16_t nd = 0 - d; /* -divisor */
uint16_t r, q = 0; /* remainder, quotient */
uint32_t dd = (uint32_t)d << operand_bits; /* expanded divisor */
uint32_t pp = dividend << 15; /* partial remainder, incorporate Q15 scaling */
int i;
for (i = operand_bits - 1; i >= 0; i--) {
if ((int32_t)(pp ^ dd) < 0) {
q = (q << 1) + 0; /* record quotient bit -1 (as 0) */
pp = (pp << 1) + dd;
} else {
q = (q << 1) + 1; /* record quotient bit +1 (as 1) */
pp = (pp << 1) - dd;
}
}
/* convert quotient from digit set {-1,1} to plain two's complement */
q = (q << 1) + 1;
/* remainder is upper half of partial remainder */
r = (uint16_t)(pp >> operand_bits);
/* fix up cases where we worked past a partial remainder of zero */
if (r == d) { /* remainder equal to divisor */
q = q + 1;
r = 0;
} else if (r == nd) { /* remainder equal to -divisor */
q = q - 1;
r = 0;
}
/* for truncating division, remainder must have same sign as dividend */
if (r && ((int16_t)(dividend ^ r) < 0)) {
if ((int16_t)q < 0) {
q = q + 1;
r = r - d;
} else {
q = q - 1;
r = r + d;
}
}
*quot = (int16_t)q;
*rem = (int16_t)r;
}
int main (void)
{
uint16_t dividend, divisor, ref_q, res_q, res_r;
double quot, fxscale = (1 << 15);
dividend = 0;
do {
printf ("\r%04x", dividend);
divisor = 1;
do {
quot = trunc (fxscale * (int16_t)dividend / (int16_t)divisor);
/* Q15 can only represent numbers in [-1, 1) */
if ((quot >= -1.0) && (quot < 1.0)) {
ref_q = (int16_t)((((int32_t)(int16_t)dividend) << 15) /
((int32_t)(int16_t)divisor));
q15_div ((int16_t)dividend, (int16_t)divisor,
(int16_t *)&res_q, (int16_t *)&res_r);
if (res_q != ref_q) {
printf ("!r dividend=%04x (%f) divisor=%04x (%f) res=%04x (%f) ref=%04x (%f)\n",
dividend, (int16_t)dividend / fxscale,
divisor, (int16_t)divisor / fxscale,
res_q, (int16_t)res_q / fxscale,
ref_q, (int16_t)ref_q / fxscale);
}
}
divisor++;
} while (divisor);
dividend++;
} while (dividend);
return EXIT_SUCCESS;
}
因为除法运算 (/) 在 FPGA 的情况下很昂贵?是否可以用基本的移位操作对两个 Q15 格式数字(16 位定点数)进行除法?
有人可以提供一些例子来帮助我吗?
提前致谢!
定点运算只是整数运算,加上一些缩放比例。Q15 是一种纯小数格式,存储为比例因子为 215[=44= 的带符号 16 位整数], 能够表示区间 [-1, 1) 中的值。显然,只有当除数的大小超过被除数的大小时,除法在 Q15 中才有意义,否则商的大小超过可表示的范围。
在着手定点除法的自定义 Verilog 实现之前,您需要检查 FPGA 供应商的库产品,因为包含流水线除法的定点库通常可用。还有可能相关的开源项目,比如this one.
当使用整数除法运算符进行定点除法时,我们需要调整除法会去除比例因子,即(a * 2scale) / (b * 2scale) = (a/b),而正确的定点结果是(a/b * 2scale).这很容易通过将红利预乘以 2scale 来解决,如下面的 C 实现:
int16_t div_q15 (int16_t dividend, int16_t divisor)
{
return (int16_t)(((int32_t)dividend << 15) / (int32_t)divisor);
}
Wikipedia gives a reasonable overwiew on how to implement binary division on a bit-by-bit basis using add, subtract, and shift operations. These methods are closely related to the longhand division taught in grade school. For FPGAs, the use of the non-restoring method if often preferred, as pointed out by this paper,例如:
尼古拉·索罗金,"Implementation of high-speed fixed-point dividers on FPGA"。 计算机科学与技术杂志,第 2 卷。 6,第 1 期,2006 年 4 月,第 8-11 页。
这里是 C 代码,展示了非恢复方法如何用于 16 位二进制补码操作数的除法:
/* bit-wise non-restoring two's complement division */
void int16_div (int16_t dividend, int16_t divisor, int16_t *quot, int16_t *rem)
{
const int operand_bits = (int) (sizeof (int16_t) * CHAR_BIT);
uint16_t d = (uint16_t)divisor;
uint16_t nd = 0 - d; /* -divisor */
uint16_t r, q = 0; /* remainder, quotient */
uint32_t dd = (uint32_t)d << operand_bits; /* expanded divisor */
uint32_t pp = dividend; /* partial remainder */
int i;
for (i = operand_bits - 1; i >= 0; i--) {
if ((int32_t)(pp ^ dd) < 0) {
q = (q << 1) + 0; /* record quotient bit -1 (as 0) */
pp = (pp << 1) + dd;
} else {
q = (q << 1) + 1; /* record quotient bit +1 (as 1) */
pp = (pp << 1) - dd;
}
}
/* convert quotient from digit set {-1,1} to plain two's complement */
q = (q << 1) + 1;
/* remainder is upper half of partial remainder */
r = (uint16_t)(pp >> operand_bits);
/* fix up cases where we worked past a partial remainder of zero */
if (r == d) { /* remainder equal to divisor */
q = q + 1;
r = 0;
} else if (r == nd) { /* remainder equal to -divisor */
q = q - 1;
r = 0;
}
/* for truncating division, remainder must have same sign as dividend */
if (r && ((int16_t)(dividend ^ r) < 0)) {
if ((int16_t)q < 0) {
q = q + 1;
r = r - d;
} else {
q = q - 1;
r = r + d;
}
}
*quot = (int16_t)q;
*rem = (int16_t)r;
}
请注意,有多种方法可以处理非还原除法中出现的各种特殊情况。例如,在这种情况下,人们经常看到检测到零部分余数 pp
并提前退出商位循环的代码。在这里,我假设 FPGA 实现将完全展开循环以创建流水线实现,在这种情况下,提前终止没有帮助。相反,对那些因忽略零的部分余数而受到影响的商应用最终修正。
为了从上面创建一个 Q15 部门,我们只需要做一个改变:合并股息的扩大规模。而不是:
uint32_t pp = dividend; /* partial remainder */
我们现在使用这个:
uint32_t pp = dividend << 15; /* partial remainder; incorporate Q15 scaling */
生成的 C 代码(抱歉,我不会提供可读的 Verilog 代码)包括测试框架:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <limits.h>
#include <math.h>
/* bit-wise non-restoring two's complement division */
void q15_div (int16_t dividend, int16_t divisor, int16_t *quot, int16_t *rem)
{
const int operand_bits = (int) (sizeof (int16_t) * CHAR_BIT);
uint16_t d = (uint16_t)divisor;
uint16_t nd = 0 - d; /* -divisor */
uint16_t r, q = 0; /* remainder, quotient */
uint32_t dd = (uint32_t)d << operand_bits; /* expanded divisor */
uint32_t pp = dividend << 15; /* partial remainder, incorporate Q15 scaling */
int i;
for (i = operand_bits - 1; i >= 0; i--) {
if ((int32_t)(pp ^ dd) < 0) {
q = (q << 1) + 0; /* record quotient bit -1 (as 0) */
pp = (pp << 1) + dd;
} else {
q = (q << 1) + 1; /* record quotient bit +1 (as 1) */
pp = (pp << 1) - dd;
}
}
/* convert quotient from digit set {-1,1} to plain two's complement */
q = (q << 1) + 1;
/* remainder is upper half of partial remainder */
r = (uint16_t)(pp >> operand_bits);
/* fix up cases where we worked past a partial remainder of zero */
if (r == d) { /* remainder equal to divisor */
q = q + 1;
r = 0;
} else if (r == nd) { /* remainder equal to -divisor */
q = q - 1;
r = 0;
}
/* for truncating division, remainder must have same sign as dividend */
if (r && ((int16_t)(dividend ^ r) < 0)) {
if ((int16_t)q < 0) {
q = q + 1;
r = r - d;
} else {
q = q - 1;
r = r + d;
}
}
*quot = (int16_t)q;
*rem = (int16_t)r;
}
int main (void)
{
uint16_t dividend, divisor, ref_q, res_q, res_r;
double quot, fxscale = (1 << 15);
dividend = 0;
do {
printf ("\r%04x", dividend);
divisor = 1;
do {
quot = trunc (fxscale * (int16_t)dividend / (int16_t)divisor);
/* Q15 can only represent numbers in [-1, 1) */
if ((quot >= -1.0) && (quot < 1.0)) {
ref_q = (int16_t)((((int32_t)(int16_t)dividend) << 15) /
((int32_t)(int16_t)divisor));
q15_div ((int16_t)dividend, (int16_t)divisor,
(int16_t *)&res_q, (int16_t *)&res_r);
if (res_q != ref_q) {
printf ("!r dividend=%04x (%f) divisor=%04x (%f) res=%04x (%f) ref=%04x (%f)\n",
dividend, (int16_t)dividend / fxscale,
divisor, (int16_t)divisor / fxscale,
res_q, (int16_t)res_q / fxscale,
ref_q, (int16_t)ref_q / fxscale);
}
}
divisor++;
} while (divisor);
dividend++;
} while (dividend);
return EXIT_SUCCESS;
}