Could anyone help me read a 64-bit integer from the console in 32-bit RISC-V?
I'm new to assembly. Could anyone show me how to read a 64-bit integer from the console in 32-bit RISC-V?
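.eqv SYS_EXITO, 10
.eqv CON_PRTSTR, 4
.eqv CON_PRTINT, 1
.eqv CON_RDINT, 5
.eqv BUFSIZE, 100

.data
prompt: .asciz "Read 64 bit integer:"
        .asciz "Output:"
buf:    .space BUFSIZE

.text
        la a0, prompt
        li a7, CON_PRTSTR
        ecall                   # print the prompt
        la a0, buf
        li a1, BUFSIZE
        li a7, CON_RDINT
        ecall                   # syscall 5 (read integer): the call that raises the exception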
Error in /private/var/folders/bf/t4py6npj0v38grsvrgvq1dx00000gn/T/hsperfdata_sotarosuzuki/riscv1.asm line 24: Runtime exception at 0x00400020: invalid integer input (syscall 5)
Yes, if you can't use the toy system calls, read a string and do total = total*10 + digit on it, where digit = c - '0'. That needs an extended-precision multiply, so it's probably easier to do extended-precision shifts like (total << 3) + (total << 1).
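Check the compiler output on Godbolt. For example, GCC uses shifts, while clang uses mul and mulhu (high unsigned) for the lo * lo 32x32=>64-bit partial product, plus a mul for the high-half cross product (hi * lo). That is fewer instructions, but it depends on the RISC-V CPU having a multiplier fast enough to beat shift/or.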
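To make the partial-product split concrete, here is a minimal C sketch of total * 10 built only from operations that map onto mul and mulhu. The u64pair type and the helper names (mul_lo, mul_hi_u, mul10, check_mul10) are made up for illustration; they are not from the compiler output.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: a 64-bit value as two 32-bit halves,
   like the a1:a0 register pair on RV32. */
typedef struct { uint32_t lo, hi; } u64pair;

/* Models RV32 mul: low 32 bits of a 32x32 product. */
static uint32_t mul_lo(uint32_t a, uint32_t b) { return a * b; }

/* Models RV32 mulhu: high 32 bits of the unsigned 32x32=>64 product. */
static uint32_t mul_hi_u(uint32_t a, uint32_t b) {
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* total * 10, truncated to 64 bits. */
u64pair mul10(u64pair t) {
    u64pair r;
    r.lo = mul_lo(t.lo, 10);                       /* low half of lo * 10 */
    r.hi = mul_hi_u(t.lo, 10) + mul_lo(t.hi, 10);  /* high half of lo * 10, plus hi * 10 cross product */
    return r;
}

/* Compares the split version against the compiler's own 64-bit multiply. */
void check_mul10(uint64_t x) {
    u64pair in = { (uint32_t)x, (uint32_t)(x >> 32) };
    u64pair out = mul10(in);
    uint64_t expect = x * 10;
    assert(out.lo == (uint32_t)expect);
    assert(out.hi == (uint32_t)(expect >> 32));
}
```

The hi * hi term is not needed because it only affects bits 64 and up, which are discarded anyway.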
(Extended-precision addition is inconvenient on RISC-V because there is no carry flag; you have to emulate carry-out as unsigned sum = a+b; carry = sum<a;)
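As a sketch of that carry trick, a 64-bit add from 32-bit halves could look like the following in C. The u64pair type and the add64 name are assumptions for illustration; each statement corresponds roughly to one add or sltu instruction.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: a 64-bit value as two 32-bit halves. */
typedef struct { uint32_t lo, hi; } u64pair;

/* 64-bit addition without a carry flag. */
u64pair add64(u64pair a, u64pair b) {
    u64pair r;
    r.lo = a.lo + b.lo;              /* add: may wrap around modulo 2^32 */
    uint32_t carry = r.lo < a.lo;    /* sltu: 1 exactly when the low add wrapped */
    r.hi = a.hi + b.hi + carry;      /* two adds: high halves plus the carry-in */
    return r;
}

/* Compares against the compiler's own 64-bit addition. */
void check_add64(uint64_t x, uint64_t y) {
    u64pair a = { (uint32_t)x, (uint32_t)(x >> 32) };
    u64pair b = { (uint32_t)y, (uint32_t)(y >> 32) };
    u64pair r = add64(a, b);
    uint64_t expect = x + y;
    assert(r.lo == (uint32_t)expect);
    assert(r.hi == (uint32_t)(expect >> 32));
}
```

The comparison works because an unsigned sum that wrapped is always smaller than either operand.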
#include <stdint.h>

uint64_t strtou64(unsigned char*p){
    uint64_t total = 0;
    unsigned digit = *p - '0';    // peeling the first iteration is usually good in asm
    while (digit < 10) {          // loop until any non-digit character
        total = total*10 + digit;
        p++;                      // *p was checked before the loop or last iteration
        digit = *p - '0';         // get a digit ready for the loop branch
    }
    return total;
}
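Clang's output is shorter, so I'll show it. It of course follows the standard calling convention, taking the pointer in a0 and returning the 64-bit total in the register pair a1:a0.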
# rv32gc clang 14.0 -O3
strtou64:
        mv      a2, a0
        lbu     a0, 0(a0)          # load the first char
        addi    a3, a0, -48        # *p - '0'
        li      a0, 9
        bltu    a0, a3, .LBB0_4    # return 0 if the first char is a non-digit
        li      a0, 0              # total in a1:a0 = 0 ; should have done these before the branch
        li      a1, 0              # so a separate ret wouldn't be needed
        addi    a2, a2, 1          # p++
        li      a6, 10             # multiplier constant
.LBB0_2:                           # do{
        mulhu   a5, a0, a6         # high half of (lo(total) * 10)
        mul     a1, a1, a6         # hi(total) * 10
        add     a1, a1, a5         # add the high-half partial products
        mul     a5, a0, a6         # low half of (lo(total) * 10)
        lbu     a4, 0(a2)          # load *p
        add     a0, a5, a3         # lo(total) = lo(total*10) + digit
        sltu    a3, a0, a5         # carry-out from that
        add     a1, a1, a3         # propagate carry into hi(total)
        addi    a3, a4, -48        # digit = *p - '0'
        addi    a2, a2, 1          # p++ done after the load; clang peeled one pointer increment before the loop
        bltu    a3, a6, .LBB0_2    # }while(digit < 10)
        ret
.LBB0_4:
        li      a0, 0              # return 0 special case
        li      a1, 0              # because clang was dumb and didn't load these regs before branching
        ret
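If you want to go with GCC's shift/or strategy, it should be easy to see how that slots into the same logic clang is using. You can look at the compiler output for a simple function like return u64 << 3.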
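For reference, here is a rough C model of the shift/or approach written with 32-bit halves only. It is a sketch of the idea, not GCC's actual output, and the names u64pair, shl64, add64 and strtou64_shift are invented for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper type: a 64-bit value as two 32-bit halves. */
typedef struct { uint32_t lo, hi; } u64pair;

/* 64-bit left shift by n, valid for 0 < n < 32:
   the bits shifted out of lo are OR'd into the bottom of hi. */
static u64pair shl64(u64pair v, unsigned n) {
    u64pair r;
    r.hi = (v.hi << n) | (v.lo >> (32 - n));   /* sll, srl, or */
    r.lo = v.lo << n;                          /* sll */
    return r;
}

/* 64-bit add with the carry emulated by an unsigned compare. */
static u64pair add64(u64pair a, u64pair b) {
    u64pair r;
    r.lo = a.lo + b.lo;
    r.hi = a.hi + b.hi + (r.lo < a.lo);        /* sltu result used as carry-in */
    return r;
}

/* Same loop shape as the C above, with total*10 done as (total<<3) + (total<<1). */
u64pair strtou64_shift(const unsigned char *p) {
    u64pair total = { 0, 0 };
    unsigned digit = *p - '0';
    while (digit < 10) {                       /* stop at the first non-digit */
        total = add64(shl64(total, 3), shl64(total, 1));
        u64pair d = { digit, 0 };
        total = add64(total, d);
        p++;
        digit = *p - '0';
    }
    return total;
}
```

Like the original function, this does no overflow checking: input above 2^64 - 1 simply wraps.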
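BTW, I wrote the C with compiling to decent asm in mind, so the compiler can easily turn it into a do{}while loop with the condition at the bottom. I based it on the x86 asm in my answer to NASM Assembly convert input to integer?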
So, should I read the integer as a string and convert it to an integer? I have searched for a solution along those lines but couldn't find one.