Optimization flag is not taken into account when compiling with arm-none-eabi-gcc
I want to compile a program with arm-none-eabi-gcc 9.2.1, using the libopencm3 project, and run it on an ARM Cortex-M4 processor. My program consists of two files: main.c
#include "../common/stm32wrapper.h"
#include "test.h"
#include <stdio.h>
#include <string.h>
typedef unsigned char u8;
typedef unsigned int u32;
typedef unsigned long long u64;
int main(void)
{
clock_setup();
gpio_setup();
usart_setup(115200);
flash_setup();
SCS_DEMCR |= SCS_DEMCR_TRCENA;
DWT_CYCCNT = 0;
DWT_CTRL |= DWT_CTRL_CYCCNTENA;
u32 oldcount, newcount;
u32 a = 0x75;
u32 b = 0x14;
char buffer[36];
oldcount = DWT_CYCCNT;
u32 c = test(a,b);
newcount = DWT_CYCCNT-oldcount;
sprintf(buffer, "cycles: %d, %08x", newcount, c);
send_USART_str(buffer);
return 0;
}
and test.c:
uint32_t test(uint32_t a, uint32_t b) {
uint32_t tmp0, tmp1;
uint32_t c;
for(int i = 0; i< 4096; i++) {
tmp0 = a & 0xff;
tmp1 = b & 0xff;
c = tmp0 ^ tmp1 ^ (a>>(i/512)) ^ (b >> (i/1024));
}
return c;
}
To compile my program, I use the following makefile:
.PHONY: all clean
PREFIX ?= arm-none-eabi
CC = $(PREFIX)-gcc -v
LD = $(PREFIX)-gcc -v
OBJCOPY = $(PREFIX)-objcopy
OBJDUMP = $(PREFIX)-objdump
GDB = $(PREFIX)-gdb
OPENCM3DIR = ../libopencm3
ARMNONEEABIDIR = /usr/arm-none-eabi
COMMONDIR = ../common
all: test_m4.bin
test_m4.%: ARCH_FLAGS = -mthumb -mcpu=cortex-m4 -mfloat-abi=hard -mfpu=fpv4-sp-d16
test_m4.o: CFLAGS += -DSTM32F4
$(COMMONDIR)/stm32f4_wrapper.o: CFLAGS += -DSTM32F4
test_m4.elf: LDSCRIPT = $(COMMONDIR)/stm32f4-discovery.ld
test_m4.elf: LDFLAGS += -L$(OPENCM3DIR)/lib/ -lopencm3_stm32f4
test_m4.elf: OBJS += $(COMMONDIR)/stm32f4_wrapper.o
test_m4.elf: $(COMMONDIR)/stm32f4_wrapper.o $(OPENCM3DIR)/lib/libopencm3_stm32f4.a
CFLAGS += -O3 \
-Wall -Wextra -Wimplicit-function-declaration \
-Wredundant-decls -Wmissing-prototypes -Wstrict-prototypes \
-Wundef -Wshadow \
-I$(ARMNONEEABIDIR)/include -I$(OPENCM3DIR)/include \
-fno-common $(ARCH_FLAGS) -MD \
-ftime-report
LDFLAGS += --static -Wl,--start-group -lc -lgcc -lnosys -Wl,--end-group \
-T$(LDSCRIPT) -nostartfiles -Wl,--gc-sections,--no-print-gc-sections \
$(ARCH_FLAGS)
OBJS += test.c
%.bin: %.elf
$(OBJCOPY) -Obinary $^ $@
%.elf: %.o $(OBJS) $(LDSCRIPT)
$(LD) -o $@ $< $(OBJS) $(LDFLAGS)
test%.o: main.c
$(CC) $(CFLAGS) -o $@ -c $^
%.o: %.c
$(CC) $(CFLAGS) -o $@ -c $^
clean:
rm -f *.o *.d *.elf *.bin
I can compile and run my code with this makefile. Running make gives the following output:
arm-none-eabi-gcc -v -O3 -Wall -Wextra -Wimplicit-function-declaration -Wredundant-decls -Wmissing-prototypes -Wstrict-prototypes -Wundef -Wshadow -I/usr/arm-none-eabi/include -I../libopencm3/include -fno-common -mthumb -mcpu=cortex-m4 -mfloat-abi=hard -mfpu=fpv4-sp-d16 -MD -ftime-report -DSTM32F4 -o test_m4.o -c main.c
Using built-in specs.
COLLECT_GCC=arm-none-eabi-gcc
Target: arm-none-eabi
Configured with: /mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/src/gcc/configure --target=arm-none-eabi --prefix=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/install-native --libexecdir=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/install-native/lib --infodir=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/install-native/share/doc/gcc-arm-none-eabi/info --mandir=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/install-native/share/doc/gcc-arm-none-eabi/man --htmldir=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/install-native/share/doc/gcc-arm-none-eabi/html --pdfdir=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/install-native/share/doc/gcc-arm-none-eabi/pdf --enable-languages=c,c++ --enable-plugins --disable-decimal-float --disable-libffi --disable-libgomp --disable-libmudflap --disable-libquadmath --disable-libssp --disable-libstdcxx-pch --disable-nls --disable-shared --disable-threads --disable-tls --with-gnu-as --with-gnu-ld --with-newlib --with-headers=yes --with-python-dir=share/gcc-arm-none-eabi --with-sysroot=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/install-native/arm-none-eabi --build=x86_64-linux-gnu --host=x86_64-linux-gnu --with-gmp=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/build-native/host-libs/usr --with-mpfr=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/build-native/host-libs/usr --with-mpc=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/build-native/host-libs/usr --with-isl=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/build-native/host-libs/usr 
--with-libelf=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/build-native/host-libs/usr --with-host-libstdcxx='-static-libgcc -Wl,-Bstatic,-lstdc++,-Bdynamic -lm' --with-pkgversion='GNU Tools for Arm Embedded Processors 9-2019-q4-major' --with-multilib-list=rmprofile
Thread model: single
gcc version 9.2.1 20191025 (release) [ARM/arm-9-branch revision 277599] (GNU Tools for Arm Embedded Processors 9-2019-q4-major)
COLLECT_GCC_OPTIONS='-v' '-O3' '-Wall' '-Wextra' '-Wimplicit-function-declaration' '-Wredundant-decls' '-Wmissing-prototypes' '-Wstrict-prototypes' '-Wundef' '-Wshadow' '-I' '/usr/arm-none-eabi/include' '-I' '../libopencm3/include' '-fno-common' '-mthumb' '-mcpu=cortex-m4' '-mfloat-abi=hard' '-mfpu=fpv4-sp-d16' '-MD' '-ftime-report' '-D' 'STM32F4' '-o' 'test_m4.o' '-c' '-march=armv7e-m+fp'
/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/cc1 -quiet -v -I /usr/arm-none-eabi/include -I ../libopencm3/include -imultilib thumb/v7e-m+fp/hard -iprefix /usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/ -isysroot /usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../arm-none-eabi -MD test_m4.d -MQ test_m4.o -D__USES_INITFINI__ -D STM32F4 main.c -quiet -dumpbase main.c -mthumb -mcpu=cortex-m4 -mfloat-abi=hard -mfpu=fpv4-sp-d16 -march=armv7e-m+fp -auxbase-strip test_m4.o -O3 -Wall -Wextra -Wimplicit-function-declaration -Wredundant-decls -Wmissing-prototypes -Wstrict-prototypes -Wundef -Wshadow -version -fno-common -ftime-report -o /tmp/ccm5h1i9.s
GNU C17 (GNU Tools for Arm Embedded Processors 9-2019-q4-major) version 9.2.1 20191025 (release) [ARM/arm-9-branch revision 277599] (arm-none-eabi)
compiled by GNU C version 4.8.4, GMP version 6.1.0, MPFR version 3.1.4, MPC version 1.0.3, isl version isl-0.18-GMP
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
ignoring duplicate directory "/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/../../lib/gcc/arm-none-eabi/9.2.1/include"
ignoring nonexistent directory "/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../arm-none-eabi/usr/local/include"
ignoring duplicate directory "/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/../../lib/gcc/arm-none-eabi/9.2.1/include-fixed"
ignoring duplicate directory "/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/../../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/include"
ignoring nonexistent directory "/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../arm-none-eabi/usr/include"
ignoring nonexistent directory "/usr/arm-none-eabi/include"
#include "..." search starts here:
#include <...> search starts here:
../libopencm3/include
/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/include
/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/include-fixed
/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/include
End of search list.
GNU C17 (GNU Tools for Arm Embedded Processors 9-2019-q4-major) version 9.2.1 20191025 (release) [ARM/arm-9-branch revision 277599] (arm-none-eabi)
compiled by GNU C version 4.8.4, GMP version 6.1.0, MPFR version 3.1.4, MPC version 1.0.3, isl version isl-0.18-GMP
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 4381e146d4f016ae8e44a645dba65184
Time variable usr sys wall GGC
phase setup : 0.01 ( 8%) 0.01 ( 20%) 0.03 ( 17%) 3569 kB ( 62%)
phase parsing : 0.10 ( 83%) 0.04 ( 80%) 0.14 ( 78%) 2069 kB ( 36%)
phase opt and generate : 0.01 ( 8%) 0.00 ( 0%) 0.01 ( 6%) 120 kB ( 2%)
preprocessing : 0.03 ( 25%) 0.03 ( 60%) 0.03 ( 17%) 889 kB ( 15%)
lexical analysis : 0.04 ( 33%) 0.00 ( 0%) 0.05 ( 28%) 0 kB ( 0%)
parser (global) : 0.02 ( 17%) 0.00 ( 0%) 0.04 ( 22%) 1063 kB ( 18%)
parser struct body : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 6%) 41 kB ( 1%)
parser enumerator list : 0.01 ( 8%) 0.01 ( 20%) 0.01 ( 6%) 54 kB ( 1%)
tree gimplify : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 6%) 8 kB ( 0%)
initialize rtl : 0.01 ( 8%) 0.00 ( 0%) 0.00 ( 0%) 7 kB ( 0%)
TOTAL : 0.12 0.05 0.18 5767 kB
COLLECT_GCC_OPTIONS='-v' '-O3' '-Wall' '-Wextra' '-Wimplicit-function-declaration' '-Wredundant-decls' '-Wmissing-prototypes' '-Wstrict-prototypes' '-Wundef' '-Wshadow' '-I' '/usr/arm-none-eabi/include' '-I' '../libopencm3/include' '-fno-common' '-mthumb' '-mcpu=cortex-m4' '-mfloat-abi=hard' '-mfpu=fpv4-sp-d16' '-MD' '-ftime-report' '-D' 'STM32F4' '-o' 'test_m4.o' '-c' '-march=armv7e-m+fp'
/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/bin/as -v -I /usr/arm-none-eabi/include -I ../libopencm3/include -march=armv7e-m -mfloat-abi=hard -mfpu=fpv4-sp-d16 -meabi=5 -o test_m4.o /tmp/ccm5h1i9.s
GNU assembler version 2.33.1 (arm-none-eabi) using BFD version (GNU Tools for Arm Embedded Processors 9-2019-q4-major) 2.33.1.20191025
COMPILER_PATH=/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/bin/
LIBRARY_PATH=/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/thumb/v7e-m+fp/hard/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/hard/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../arm-none-eabi/lib/thumb/v7e-m+fp/hard/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/lib/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../arm-none-eabi/lib/
COLLECT_GCC_OPTIONS='-v' '-O3' '-Wall' '-Wextra' '-Wimplicit-function-declaration' '-Wredundant-decls' '-Wmissing-prototypes' '-Wstrict-prototypes' '-Wundef' '-Wshadow' '-I' '/usr/arm-none-eabi/include' '-I' '../libopencm3/include' '-fno-common' '-mthumb' '-mcpu=cortex-m4' '-mfloat-abi=hard' '-mfpu=fpv4-sp-d16' '-MD' '-ftime-report' '-D' 'STM32F4' '-o' 'test_m4.o' '-c' '-march=armv7e-m+fp'
arm-none-eabi-gcc -v -o test_m4.elf test_m4.o test.c ../common/stm32f4_wrapper.o --static -Wl,--start-group -lc -lgcc -lnosys -Wl,--end-group -T../common/stm32f4-discovery.ld -nostartfiles -Wl,--gc-sections,--no-print-gc-sections -mthumb -mcpu=cortex-m4 -mfloat-abi=hard -mfpu=fpv4-sp-d16 -L../libopencm3/lib/ -lopencm3_stm32f4
Using built-in specs.
COLLECT_GCC=arm-none-eabi-gcc
COLLECT_LTO_WRAPPER=/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/lto-wrapper
Target: arm-none-eabi
Configured with: /mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/src/gcc/configure --target=arm-none-eabi --prefix=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/install-native --libexecdir=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/install-native/lib --infodir=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/install-native/share/doc/gcc-arm-none-eabi/info --mandir=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/install-native/share/doc/gcc-arm-none-eabi/man --htmldir=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/install-native/share/doc/gcc-arm-none-eabi/html --pdfdir=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/install-native/share/doc/gcc-arm-none-eabi/pdf --enable-languages=c,c++ --enable-plugins --disable-decimal-float --disable-libffi --disable-libgomp --disable-libmudflap --disable-libquadmath --disable-libssp --disable-libstdcxx-pch --disable-nls --disable-shared --disable-threads --disable-tls --with-gnu-as --with-gnu-ld --with-newlib --with-headers=yes --with-python-dir=share/gcc-arm-none-eabi --with-sysroot=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/install-native/arm-none-eabi --build=x86_64-linux-gnu --host=x86_64-linux-gnu --with-gmp=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/build-native/host-libs/usr --with-mpfr=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/build-native/host-libs/usr --with-mpc=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/build-native/host-libs/usr --with-isl=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/build-native/host-libs/usr 
--with-libelf=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/build-native/host-libs/usr --with-host-libstdcxx='-static-libgcc -Wl,-Bstatic,-lstdc++,-Bdynamic -lm' --with-pkgversion='GNU Tools for Arm Embedded Processors 9-2019-q4-major' --with-multilib-list=rmprofile
Thread model: single
gcc version 9.2.1 20191025 (release) [ARM/arm-9-branch revision 277599] (GNU Tools for Arm Embedded Processors 9-2019-q4-major)
COLLECT_GCC_OPTIONS='-v' '-o' 'test_m4.elf' '-static' '-T' '../common/stm32f4-discovery.ld' '-nostartfiles' '-mthumb' '-mcpu=cortex-m4' '-mfloat-abi=hard' '-mfpu=fpv4-sp-d16' '-L../libopencm3/lib/' '-march=armv7e-m+fp'
/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/cc1 -quiet -v -imultilib thumb/v7e-m+fp/hard -iprefix /usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/ -isysroot /usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../arm-none-eabi -D__USES_INITFINI__ test.c -quiet -dumpbase test.c -mthumb -mcpu=cortex-m4 -mfloat-abi=hard -mfpu=fpv4-sp-d16 -march=armv7e-m+fp -auxbase test -version -o /tmp/cc3yny6o.s
GNU C17 (GNU Tools for Arm Embedded Processors 9-2019-q4-major) version 9.2.1 20191025 (release) [ARM/arm-9-branch revision 277599] (arm-none-eabi)
compiled by GNU C version 4.8.4, GMP version 6.1.0, MPFR version 3.1.4, MPC version 1.0.3, isl version isl-0.18-GMP
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
ignoring duplicate directory "/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/../../lib/gcc/arm-none-eabi/9.2.1/include"
ignoring nonexistent directory "/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../arm-none-eabi/usr/local/include"
ignoring duplicate directory "/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/../../lib/gcc/arm-none-eabi/9.2.1/include-fixed"
ignoring duplicate directory "/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/../../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/include"
ignoring nonexistent directory "/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../arm-none-eabi/usr/include"
#include "..." search starts here:
#include <...> search starts here:
/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/include
/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/include-fixed
/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/include
End of search list.
GNU C17 (GNU Tools for Arm Embedded Processors 9-2019-q4-major) version 9.2.1 20191025 (release) [ARM/arm-9-branch revision 277599] (arm-none-eabi)
compiled by GNU C version 4.8.4, GMP version 6.1.0, MPFR version 3.1.4, MPC version 1.0.3, isl version isl-0.18-GMP
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 4381e146d4f016ae8e44a645dba65184
COLLECT_GCC_OPTIONS='-v' '-o' 'test_m4.elf' '-static' '-T' '../common/stm32f4-discovery.ld' '-nostartfiles' '-mthumb' '-mcpu=cortex-m4' '-mfloat-abi=hard' '-mfpu=fpv4-sp-d16' '-L../libopencm3/lib/' '-march=armv7e-m+fp'
/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/bin/as -v -march=armv7e-m -mfloat-abi=hard -mfpu=fpv4-sp-d16 -meabi=5 -o /tmp/ccfflDpW.o /tmp/cc3yny6o.s
GNU assembler version 2.33.1 (arm-none-eabi) using BFD version (GNU Tools for Arm Embedded Processors 9-2019-q4-major) 2.33.1.20191025
COMPILER_PATH=/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/bin/
LIBRARY_PATH=/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/thumb/v7e-m+fp/hard/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/hard/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../arm-none-eabi/lib/thumb/v7e-m+fp/hard/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/lib/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../arm-none-eabi/lib/
COLLECT_GCC_OPTIONS='-v' '-o' 'test_m4.elf' '-static' '-T' '../common/stm32f4-discovery.ld' '-nostartfiles' '-mthumb' '-mcpu=cortex-m4' '-mfloat-abi=hard' '-mfpu=fpv4-sp-d16' '-L../libopencm3/lib/' '-march=armv7e-m+fp'
/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/collect2 -plugin /usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/liblto_plugin.so -plugin-opt=/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/lto-wrapper -plugin-opt=-fresolution=/tmp/cc4qN1Kt.res -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lc --sysroot=/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../arm-none-eabi -Bstatic -X -o test_m4.elf -L../libopencm3/lib/ -L/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/thumb/v7e-m+fp/hard -L/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/hard -L/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../arm-none-eabi/lib/thumb/v7e-m+fp/hard -L/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1 -L/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc -L/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/lib -L/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../arm-none-eabi/lib test_m4.o /tmp/ccfflDpW.o ../common/stm32f4_wrapper.o --start-group -lc -lgcc -lnosys --end-group --gc-sections --no-print-gc-sections -lopencm3_stm32f4 --start-group -lgcc -lc --end-group -T ../common/stm32f4-discovery.ld
COLLECT_GCC_OPTIONS='-v' '-o' 'test_m4.elf' '-static' '-T' '../common/stm32f4-discovery.ld' '-nostartfiles' '-mthumb' '-mcpu=cortex-m4' '-mfloat-abi=hard' '-mfpu=fpv4-sp-d16' '-L../libopencm3/lib/' '-march=armv7e-m+fp'
arm-none-eabi-objcopy -Obinary test_m4.elf test_m4.bin
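The optimization flag does not seem to be taken into account: whatever I put in, the generated binary is always the same, and the program always prints cycles: 196645, 00000063. Disassembling the binary gives the following output for both -Os and -O3: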
080001ac <main>:
80001ac: b570 push {r4, r5, r6, lr}
80001ae: b08a sub sp, #40 ; 0x28
80001b0: f006 fc06 bl 80069c0 <clock_setup>
80001b4: f006 fc1c bl 80069f0 <gpio_setup>
80001b8: f44f 30e1 mov.w r0, #115200 ; 0x1c200
80001bc: f006 fc32 bl 8006a24 <usart_setup>
80001c0: f006 fc52 bl 8006a68 <flash_setup>
80001c4: 490e ldr r1, [pc, #56] ; (8000200 <main+0x54>)
80001c6: 4c0f ldr r4, [pc, #60] ; (8000204 <main+0x58>)
80001c8: 680b ldr r3, [r1, #0]
80001ca: 4a0f ldr r2, [pc, #60] ; (8000208 <main+0x5c>)
80001cc: 2500 movs r5, #0
80001ce: f043 7380 orr.w r3, r3, #16777216 ; 0x1000000
80001d2: 600b str r3, [r1, #0]
80001d4: 6025 str r5, [r4, #0]
80001d6: 6813 ldr r3, [r2, #0]
80001d8: f043 0301 orr.w r3, r3, #1
80001dc: 6013 str r3, [r2, #0]
80001de: 6826 ldr r6, [r4, #0]
80001e0: f000 f816 bl 8000210 <test>
80001e4: 6822 ldr r2, [r4, #0]
80001e6: 4909 ldr r1, [pc, #36] ; (800020c <main+0x60>)
80001e8: 4603 mov r3, r0
80001ea: 1b92 subs r2, r2, r6
80001ec: a801 add r0, sp, #4
80001ee: f006 fca5 bl 8006b3c <sprintf>
80001f2: a801 add r0, sp, #4
80001f4: f006 fc48 bl 8006a88 <send_USART_str>
80001f8: 4628 mov r0, r5
80001fa: b00a add sp, #40 ; 0x28
80001fc: bd70 pop {r4, r5, r6, pc}
80001fe: bf00 nop
8000200: e000edfc .word 0xe000edfc
8000204: e0001004 .word 0xe0001004
8000208: e0001000 .word 0xe0001000
800020c: 0800c1e8 .word 0x0800c1e8
08000210 <test>:
8000210: b480 push {r7}
8000212: b087 sub sp, #28
8000214: af00 add r7, sp, #0
8000216: 2375 movs r3, #117 ; 0x75
8000218: 60fb str r3, [r7, #12]
800021a: 2314 movs r3, #20
800021c: 60bb str r3, [r7, #8]
800021e: 2300 movs r3, #0
8000220: 613b str r3, [r7, #16]
8000222: e020 b.n 8000266 <test+0x56>
8000224: 68fb ldr r3, [r7, #12]
8000226: b2db uxtb r3, r3
8000228: 607b str r3, [r7, #4]
800022a: 68bb ldr r3, [r7, #8]
800022c: b2db uxtb r3, r3
800022e: 603b str r3, [r7, #0]
8000230: 687a ldr r2, [r7, #4]
8000232: 683b ldr r3, [r7, #0]
8000234: 405a eors r2, r3
8000236: 693b ldr r3, [r7, #16]
8000238: 2b00 cmp r3, #0
800023a: da01 bge.n 8000240 <test+0x30>
800023c: f203 13ff addw r3, r3, #511 ; 0x1ff
8000240: 125b asrs r3, r3, #9
8000242: 4619 mov r1, r3
8000244: 68fb ldr r3, [r7, #12]
8000246: 40cb lsrs r3, r1
8000248: 405a eors r2, r3
800024a: 693b ldr r3, [r7, #16]
800024c: 2b00 cmp r3, #0
800024e: da01 bge.n 8000254 <test+0x44>
8000250: f203 33ff addw r3, r3, #1023 ; 0x3ff
8000254: 129b asrs r3, r3, #10
8000256: 4619 mov r1, r3
8000258: 68bb ldr r3, [r7, #8]
800025a: 40cb lsrs r3, r1
800025c: 4053 eors r3, r2
800025e: 617b str r3, [r7, #20]
8000260: 693b ldr r3, [r7, #16]
8000262: 3301 adds r3, #1
8000264: 613b str r3, [r7, #16]
8000266: 693b ldr r3, [r7, #16]
8000268: f5b3 5f80 cmp.w r3, #4096 ; 0x1000
800026c: dbda blt.n 8000224 <test+0x14>
800026e: 697b ldr r3, [r7, #20]
8000270: 4618 mov r0, r3
8000272: 371c adds r7, #28
8000274: 46bd mov sp, r7
8000276: f85d 7b04 ldr.w r7, [sp], #4
800027a: 4770 bx lr
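I find this strange, because the code could clearly be made faster. For example, a single uxtb could be computed instead of two (if it were performed after the eor), so I think something is wrong here. Why is the optimization flag not taken into account here? Is something wrong with my makefile?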
typedef unsigned int uint32_t;
uint32_t test(uint32_t a, uint32_t b) {
uint32_t tmp0, tmp1;
uint32_t c;
for(int i = 0; i< 4096; i++) {
tmp0 = a & 0xff;
tmp1 = b & 0xff;
c = tmp0 ^ tmp1 ^ (a>>(i/512)) ^ (b >> (i/1024));
}
return c;
}
unsigned int hello ( void )
{
return(test(0x75,0x14));
}
9.3.0 and 9.2.1 will not differ much. I can get a 9.2.1 specifically if you want to see it, but you can check this for yourself.
arm-none-eabi-gcc --version
arm-none-eabi-gcc (GCC) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
-O0
arm-none-eabi-gcc -O0 so.c -c -mthumb -mcpu=cortex-m4 -o so.o
Disassembly of section .text:
00000000 <test>:
0: b480 push {r7}
2: b087 sub sp, #28
4: af00 add r7, sp, #0
6: 6078 str r0, [r7, #4]
8: 6039 str r1, [r7, #0]
a: 2300 movs r3, #0
c: 613b str r3, [r7, #16]
e: e020 b.n 52 <test+0x52>
10: 687b ldr r3, [r7, #4]
12: b2db uxtb r3, r3
14: 60fb str r3, [r7, #12]
16: 683b ldr r3, [r7, #0]
18: b2db uxtb r3, r3
1a: 60bb str r3, [r7, #8]
1c: 68fa ldr r2, [r7, #12]
1e: 68bb ldr r3, [r7, #8]
20: 405a eors r2, r3
22: 693b ldr r3, [r7, #16]
24: 2b00 cmp r3, #0
26: da01 bge.n 2c <test+0x2c>
28: f203 13ff addw r3, r3, #511 ; 0x1ff
2c: 125b asrs r3, r3, #9
2e: 4619 mov r1, r3
30: 687b ldr r3, [r7, #4]
32: 40cb lsrs r3, r1
34: 405a eors r2, r3
36: 693b ldr r3, [r7, #16]
38: 2b00 cmp r3, #0
3a: da01 bge.n 40 <test+0x40>
3c: f203 33ff addw r3, r3, #1023 ; 0x3ff
40: 129b asrs r3, r3, #10
42: 4619 mov r1, r3
44: 683b ldr r3, [r7, #0]
46: 40cb lsrs r3, r1
48: 4053 eors r3, r2
4a: 617b str r3, [r7, #20]
4c: 693b ldr r3, [r7, #16]
4e: 3301 adds r3, #1
50: 613b str r3, [r7, #16]
52: 693b ldr r3, [r7, #16]
54: f5b3 5f80 cmp.w r3, #4096 ; 0x1000
58: dbda blt.n 10 <test+0x10>
5a: 697b ldr r3, [r7, #20]
5c: 4618 mov r0, r3
5e: 371c adds r7, #28
60: 46bd mov sp, r7
62: bc80 pop {r7}
64: 4770 bx lr
00000066 <hello>:
66: b580 push {r7, lr}
68: af00 add r7, sp, #0
6a: 2114 movs r1, #20
6c: 2075 movs r0, #117 ; 0x75
6e: f7ff fffe bl 0 <test>
72: 4603 mov r3, r0
74: 4618 mov r0, r3
76: bd80 pop {r7, pc}
-O1
arm-none-eabi-gcc -O1 so.c -c -mthumb -mcpu=cortex-m4 -o so.o
arm-none-eabi-objdump -D so.o
so.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <test>:
0: f44f 5380 mov.w r3, #4096 ; 0x1000
4: 3b01 subs r3, #1
6: d1fd bne.n 4 <test+0x4>
8: 08ca lsrs r2, r1, #3
a: ea82 12d0 eor.w r2, r2, r0, lsr #7
e: ea80 0301 eor.w r3, r0, r1
12: b2db uxtb r3, r3
14: ea82 0003 eor.w r0, r2, r3
18: 4770 bx lr
0000001a <hello>:
1a: b508 push {r3, lr}
1c: 2114 movs r1, #20
1e: 2075 movs r0, #117 ; 0x75
20: f7ff fffe bl 0 <test>
24: bd08 pop {r3, pc}
-O2
Disassembly of section .text:
00000000 <test>:
0: ea80 0301 eor.w r3, r0, r1
4: 08ca lsrs r2, r1, #3
6: ea82 10d0 eor.w r0, r2, r0, lsr #7
a: b2db uxtb r3, r3
c: 4058 eors r0, r3
e: 4770 bx lr
00000010 <hello>:
10: 2063 movs r0, #99 ; 0x63
12: 4770 bx lr
-O3
00000000 <test>:
0: ea80 0301 eor.w r3, r0, r1
4: 08ca lsrs r2, r1, #3
6: ea82 10d0 eor.w r0, r2, r0, lsr #7
a: b2db uxtb r3, r3
c: 4058 eors r0, r3
e: 4770 bx lr
00000010 <hello>:
10: 2063 movs r0, #99 ; 0x63
12: 4770 bx lr
-Os
00000000 <test>:
0: 08cb lsrs r3, r1, #3
2: ea83 13d0 eor.w r3, r3, r0, lsr #7
6: 4048 eors r0, r1
8: b2c0 uxtb r0, r0
a: 4058 eors r0, r3
c: 4770 bx lr
0000000e <hello>:
e: 2114 movs r1, #20
10: 2075 movs r0, #117 ; 0x75
12: f7ff bffe b.w 0 <test>
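If all of these execute in the same amount of time, then clearly yes, you have either a build problem or a test problem. If you are claiming that -O1, -O2, -O3 and so on all produce the same output, then you are not actually using those optimization levels.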
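The optimized listings above no longer contain the loop: only the last iteration (i = 4095) determines the return value, so the shifts become the constants 7 and 3, and the two masks merge into a single uxtb after the eor. The following sketch checks that reading in C. The names test_loop and test_folded are introduced here purely for illustration and do not appear in the original code.

```c
#include <stdint.h>

/* The loop from the question, with the same behavior. */
uint32_t test_loop(uint32_t a, uint32_t b) {
    uint32_t c = 0;
    for (int i = 0; i < 4096; i++) {
        uint32_t tmp0 = a & 0xff;
        uint32_t tmp1 = b & 0xff;
        c = tmp0 ^ tmp1 ^ (a >> (i / 512)) ^ (b >> (i / 1024));
    }
    return c;
}

/* Hypothetical closed form matching the -O2 listing: each iteration
 * overwrites c, so only i = 4095 matters (4095/512 = 7, 4095/1024 = 3),
 * and (a & 0xff) ^ (b & 0xff) equals (a ^ b) & 0xff. */
uint32_t test_folded(uint32_t a, uint32_t b) {
    return ((a ^ b) & 0xff) ^ (a >> 7) ^ (b >> 3);
}
```

For the inputs in the question, 0x75 and 0x14, both functions return 0x63, which is the constant the -O2 and -O3 builds load directly in hello and the value the program prints.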
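There is no reason to assume that -Os produces a smaller binary than -O2 or -O3. You are only hinting at that desire, and exceptions can be constructed.
Nor is there any reason to assume that compiling for size, or with -O3 and so on, will execute faster. That is especially true on a platform like this one (and on all modern platforms), where some percentage of the performance has nothing to do with the number or sequence of instructions but with the system as a whole.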
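You are using an stm32, a Cortex-M4, so you have the ST flash cache, which you cannot turn off. That will help all of the tests, but it will also hide things. You have a clock init followed by a flash setup, which makes me wonder what is happening there: if you are raising the clock, you have to slow the flash down first, not afterwards, or you may crash. For a test like this there is normally no reason to raise the clock. Ideally you measure system (CPU) clock cycles with a timer that runs from that same clock. At the slower clock speeds (on some parts this covers the full range) you can use the minimum flash wait states and then simply raise the wait states for different tests, without raising the clock, to see how the flash affects the result. Unfortunately this is an stm32. To work around that, you can run the test from SRAM.
Depending on the compile-time options of the core, some cores have different fetch features and other features. You may have a core feature that interacts badly with a tight loop like this one, where a simple alignment change can have a dramatic effect. The same machine code starting at a different address lines up differently within fetch lines and cache lines, and that affects benchmark results.
Note that you can get the same results with the systick timer as with the debugger timer. You can also wrap the time collection inside the code under test (not in a function: once you lift the code under test into assembly language, you can add the time collection before and after it without incurring function-call overhead, which can itself vary from test to test).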
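If you see the compiler generating the same machine code for different settings, then you are not really building with those settings, not really rebuilding the application, or making some other form of user error (building here and using a binary from there). In that case the same binary should ideally give the same time, plus or minus a clock. But that also depends on how you run or re-run the test: whether you want to see cache effects, whether you fill the cache first and then run the test, and so on.
If you start seeing different machine code, or if you did see different machine code all along but get the same time, then the error is in the time measurement, which is the often-overlooked problem in benchmarking. Your approach seems fine, provided you have actually looked at that timer and tested that it is counting, and counting in the direction you expect. If it is some kind of instruction counter rather than an execution timer, you can still test it to see whether it behaves the way you think. I have no use for those debug facilities, so I do not dabble in them and do not know them as well as I know other parts of these systems.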
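As a small illustration of checking the counter arithmetic, here is a minimal sketch of how two readings of a free-running 32-bit up-counter (such as DWT_CYCCNT in the question) can be turned into an elapsed count. The helper names elapsed_cycles and net_cycles, and the idea of subtracting a separately measured read overhead, are assumptions made for this example and are not part of the original code.

```c
#include <stdint.h>

/* Hypothetical helper: difference between two readings of a 32-bit
 * counter that counts upwards. Unsigned subtraction is modulo 2^32,
 * so the result is still correct if the counter wrapped once. */
uint32_t elapsed_cycles(uint32_t start, uint32_t end) {
    return end - start;
}

/* Hypothetical helper: remove a fixed overhead (for example the cost of
 * two back-to-back counter reads, measured separately) from a raw
 * measurement, clamping at zero. */
uint32_t net_cycles(uint32_t start, uint32_t end, uint32_t overhead) {
    uint32_t raw = elapsed_cycles(start, end);
    return raw > overhead ? raw - overhead : 0;
}
```

A counter that counts downwards, as the SysTick current-value register does, needs the operands swapped (start - end), which is one reason to confirm the direction before trusting the numbers.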
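As an M4, you may have other features you can turn on and off to see performance differences for the generated code: branch prediction, caches, MMU-like things, and so on.
It may be the order of the flags you used relative to -O3 (every one of those flags is suspect as a cause of the first problem); some may negate other optimization features.
I am curious what the real goal is. Understand that benchmarks are nonsense because they are so easily manipulated; for all sorts of reasons, the same high-level code is not expected to produce the same results on the same target, whether it is built with the same tools or with different ones. Trim down the command line and try clang/llvm vs gnu, or try gcc 4.x.x, 5.x.x, and so on. After 4.x.x the output started to get bloated and the compiler did not do as good a job, although for something like this the versions should be very close. Still, with one instruction fewer or more, a simple alignment difference can make the results of two tests differ widely.
Then when you put the clock settings back, that changes how things work. Say, as an example, that you can run with no wait states up to 25 MHz (the flash probably runs at the CPU rate, so there is a built-in wait), then add one wait state up to 50, and so on. This varies by design, and the flash on some newer parts can run much faster than on older parts. At 25 MHz versus 8 MHz, the same number of clocks is a smaller amount of wall-clock time overall. At a boundary, if you create or modify the clock init code, you get a performance gain as long as you do not have to add a wait state, but once you cross the boundary you take a performance hit from the added flash wait state. So there is a performance balance here.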
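To make the clocks-versus-wall-clock point concrete, the following sketch converts a cycle count into microseconds for a given core clock. The function name cycles_to_microseconds is hypothetical, and the 8 MHz and 25 MHz figures are only the illustrative frequencies mentioned above, not the clock the question's board actually runs at.

```c
#include <stdint.h>

/* Hypothetical helper: wall-clock time in microseconds for a number of
 * CPU cycles at a given core clock frequency in Hz. */
double cycles_to_microseconds(uint32_t cycles, uint32_t clock_hz) {
    return (double)cycles * 1e6 / (double)clock_hz;
}
```

Under these assumptions, the 196645 cycles reported in the question would take about 24581 microseconds at 8 MHz and about 7866 microseconds at 25 MHz. The cycle count itself can also change once extra flash wait states are needed, which is the balance described above.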
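Summary
If the same code comes out of the compiler, then it is your command line, and you can easily simplify the command line to see that the tools will generate different code. If your comparison was wrong and the code does differ, then the problem is how you are timing the code, which is usually where benchmarking goes wrong, along with other factors unrelated to the compiler command line. Benchmarks are generally meaningless because they can be manipulated to show different results (even without changing the high-level source of the test).
Try simplifying the command line: examine every option there and justify why it applies to your particular application. Validate the timer or instruction counter as far as you can (and understand that the count of instructions executed is not directly tied to performance; a solution that executes 100 times more instructions can still run faster than another one).
There is no reason to expect -Os to produce smaller code; one hopes so, but there are exceptions. Likewise, -Os may execute faster than -O2 or -O3; there is no reason to expect that a higher optimization level produces "faster" code.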
You are compiling the code with the -O0 flag.
You can see it clearly here:
https://godbolt.org/z/qZPYqJ
So the compiler is right, as always. No missed optimization was found.
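This matches the build log in the question: the cc1 invocation for main.c lists -O3, while the cc1 invocation for test.c, which is compiled as part of the link command, lists no -O option at all. One way to make such a slip visible from inside the code is to check the macros that GCC predefines when optimizing. The sketch below relies on __OPTIMIZE__ and __OPTIMIZE_SIZE__, which GCC documents; the function name optimization_mode is hypothetical.

```c
/* Hypothetical helper reporting how this translation unit was compiled.
 * GCC predefines __OPTIMIZE__ in any optimizing compilation and
 * additionally __OPTIMIZE_SIZE__ when optimizing for size (-Os). */
const char *optimization_mode(void) {
#if defined(__OPTIMIZE_SIZE__)
    return "size";
#elif defined(__OPTIMIZE__)
    return "speed";
#else
    return "none";
#endif
}
```

Printing this string over the USART from test.c, or turning the last branch into an #error, would show whether the file containing the benchmarked function was really built with the intended -O level.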
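Well, the real answer is not easy, but before disassembling anything, one should understand what optimization actually is and how the compiler achieves its goals.
With gcc, there is hardly any difference between Os and O3, because they turn on almost the same internal flags, apart from loop unrolling for Os.
Also, nowadays CPUs are faster when everything fits in the cache.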
ignoring nonexistent directory "/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../arm-none-eabi/usr/local/include"
ignoring duplicate directory "/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/../../lib/gcc/arm-none-eabi/9.2.1/include-fixed"
ignoring duplicate directory "/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/../../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/include"
ignoring nonexistent directory "/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../arm-none-eabi/usr/include"
ignoring nonexistent directory "/usr/arm-none-eabi/include"
#include "..." search starts here:
#include <...> search starts here:
../libopencm3/include
/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/include
/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/include-fixed
/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/include
End of search list.
GNU C17 (GNU Tools for Arm Embedded Processors 9-2019-q4-major) version 9.2.1 20191025 (release) [ARM/arm-9-branch revision 277599] (arm-none-eabi)
compiled by GNU C version 4.8.4, GMP version 6.1.0, MPFR version 3.1.4, MPC version 1.0.3, isl version isl-0.18-GMP
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 4381e146d4f016ae8e44a645dba65184
Time variable usr sys wall GGC
phase setup : 0.01 ( 8%) 0.01 ( 20%) 0.03 ( 17%) 3569 kB ( 62%)
phase parsing : 0.10 ( 83%) 0.04 ( 80%) 0.14 ( 78%) 2069 kB ( 36%)
phase opt and generate : 0.01 ( 8%) 0.00 ( 0%) 0.01 ( 6%) 120 kB ( 2%)
preprocessing : 0.03 ( 25%) 0.03 ( 60%) 0.03 ( 17%) 889 kB ( 15%)
lexical analysis : 0.04 ( 33%) 0.00 ( 0%) 0.05 ( 28%) 0 kB ( 0%)
parser (global) : 0.02 ( 17%) 0.00 ( 0%) 0.04 ( 22%) 1063 kB ( 18%)
parser struct body : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 6%) 41 kB ( 1%)
parser enumerator list : 0.01 ( 8%) 0.01 ( 20%) 0.01 ( 6%) 54 kB ( 1%)
tree gimplify : 0.00 ( 0%) 0.00 ( 0%) 0.01 ( 6%) 8 kB ( 0%)
initialize rtl : 0.01 ( 8%) 0.00 ( 0%) 0.00 ( 0%) 7 kB ( 0%)
TOTAL : 0.12 0.05 0.18 5767 kB
COLLECT_GCC_OPTIONS='-v' '-O3' '-Wall' '-Wextra' '-Wimplicit-function-declaration' '-Wredundant-decls' '-Wmissing-prototypes' '-Wstrict-prototypes' '-Wundef' '-Wshadow' '-I' '/usr/arm-none-eabi/include' '-I' '../libopencm3/include' '-fno-common' '-mthumb' '-mcpu=cortex-m4' '-mfloat-abi=hard' '-mfpu=fpv4-sp-d16' '-MD' '-ftime-report' '-D' 'STM32F4' '-o' 'test_m4.o' '-c' '-march=armv7e-m+fp'
/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/bin/as -v -I /usr/arm-none-eabi/include -I ../libopencm3/include -march=armv7e-m -mfloat-abi=hard -mfpu=fpv4-sp-d16 -meabi=5 -o test_m4.o /tmp/ccm5h1i9.s
GNU assembler version 2.33.1 (arm-none-eabi) using BFD version (GNU Tools for Arm Embedded Processors 9-2019-q4-major) 2.33.1.20191025
COMPILER_PATH=/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/bin/
LIBRARY_PATH=/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/thumb/v7e-m+fp/hard/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/hard/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../arm-none-eabi/lib/thumb/v7e-m+fp/hard/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/lib/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../arm-none-eabi/lib/
COLLECT_GCC_OPTIONS='-v' '-O3' '-Wall' '-Wextra' '-Wimplicit-function-declaration' '-Wredundant-decls' '-Wmissing-prototypes' '-Wstrict-prototypes' '-Wundef' '-Wshadow' '-I' '/usr/arm-none-eabi/include' '-I' '../libopencm3/include' '-fno-common' '-mthumb' '-mcpu=cortex-m4' '-mfloat-abi=hard' '-mfpu=fpv4-sp-d16' '-MD' '-ftime-report' '-D' 'STM32F4' '-o' 'test_m4.o' '-c' '-march=armv7e-m+fp'
arm-none-eabi-gcc -v -o test_m4.elf test_m4.o test.c ../common/stm32f4_wrapper.o --static -Wl,--start-group -lc -lgcc -lnosys -Wl,--end-group -T../common/stm32f4-discovery.ld -nostartfiles -Wl,--gc-sections,--no-print-gc-sections -mthumb -mcpu=cortex-m4 -mfloat-abi=hard -mfpu=fpv4-sp-d16 -L../libopencm3/lib/ -lopencm3_stm32f4
Using built-in specs.
COLLECT_GCC=arm-none-eabi-gcc
COLLECT_LTO_WRAPPER=/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/lto-wrapper
Target: arm-none-eabi
Configured with: /mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/src/gcc/configure --target=arm-none-eabi --prefix=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/install-native --libexecdir=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/install-native/lib --infodir=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/install-native/share/doc/gcc-arm-none-eabi/info --mandir=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/install-native/share/doc/gcc-arm-none-eabi/man --htmldir=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/install-native/share/doc/gcc-arm-none-eabi/html --pdfdir=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/install-native/share/doc/gcc-arm-none-eabi/pdf --enable-languages=c,c++ --enable-plugins --disable-decimal-float --disable-libffi --disable-libgomp --disable-libmudflap --disable-libquadmath --disable-libssp --disable-libstdcxx-pch --disable-nls --disable-shared --disable-threads --disable-tls --with-gnu-as --with-gnu-ld --with-newlib --with-headers=yes --with-python-dir=share/gcc-arm-none-eabi --with-sysroot=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/install-native/arm-none-eabi --build=x86_64-linux-gnu --host=x86_64-linux-gnu --with-gmp=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/build-native/host-libs/usr --with-mpfr=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/build-native/host-libs/usr --with-mpc=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/build-native/host-libs/usr --with-isl=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/build-native/host-libs/usr 
--with-libelf=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/build-native/host-libs/usr --with-host-libstdcxx='-static-libgcc -Wl,-Bstatic,-lstdc++,-Bdynamic -lm' --with-pkgversion='GNU Tools for Arm Embedded Processors 9-2019-q4-major' --with-multilib-list=rmprofile
Thread model: single
gcc version 9.2.1 20191025 (release) [ARM/arm-9-branch revision 277599] (GNU Tools for Arm Embedded Processors 9-2019-q4-major)
COLLECT_GCC_OPTIONS='-v' '-o' 'test_m4.elf' '-static' '-T' '../common/stm32f4-discovery.ld' '-nostartfiles' '-mthumb' '-mcpu=cortex-m4' '-mfloat-abi=hard' '-mfpu=fpv4-sp-d16' '-L../libopencm3/lib/' '-march=armv7e-m+fp'
/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/cc1 -quiet -v -imultilib thumb/v7e-m+fp/hard -iprefix /usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/ -isysroot /usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../arm-none-eabi -D__USES_INITFINI__ test.c -quiet -dumpbase test.c -mthumb -mcpu=cortex-m4 -mfloat-abi=hard -mfpu=fpv4-sp-d16 -march=armv7e-m+fp -auxbase test -version -o /tmp/cc3yny6o.s
GNU C17 (GNU Tools for Arm Embedded Processors 9-2019-q4-major) version 9.2.1 20191025 (release) [ARM/arm-9-branch revision 277599] (arm-none-eabi)
compiled by GNU C version 4.8.4, GMP version 6.1.0, MPFR version 3.1.4, MPC version 1.0.3, isl version isl-0.18-GMP
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
ignoring duplicate directory "/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/../../lib/gcc/arm-none-eabi/9.2.1/include"
ignoring nonexistent directory "/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../arm-none-eabi/usr/local/include"
ignoring duplicate directory "/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/../../lib/gcc/arm-none-eabi/9.2.1/include-fixed"
ignoring duplicate directory "/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/../../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/include"
ignoring nonexistent directory "/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../arm-none-eabi/usr/include"
#include "..." search starts here:
#include <...> search starts here:
/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/include
/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/include-fixed
/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/include
End of search list.
GNU C17 (GNU Tools for Arm Embedded Processors 9-2019-q4-major) version 9.2.1 20191025 (release) [ARM/arm-9-branch revision 277599] (arm-none-eabi)
compiled by GNU C version 4.8.4, GMP version 6.1.0, MPFR version 3.1.4, MPC version 1.0.3, isl version isl-0.18-GMP
GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 4381e146d4f016ae8e44a645dba65184
COLLECT_GCC_OPTIONS='-v' '-o' 'test_m4.elf' '-static' '-T' '../common/stm32f4-discovery.ld' '-nostartfiles' '-mthumb' '-mcpu=cortex-m4' '-mfloat-abi=hard' '-mfpu=fpv4-sp-d16' '-L../libopencm3/lib/' '-march=armv7e-m+fp'
/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/bin/as -v -march=armv7e-m -mfloat-abi=hard -mfpu=fpv4-sp-d16 -meabi=5 -o /tmp/ccfflDpW.o /tmp/cc3yny6o.s
GNU assembler version 2.33.1 (arm-none-eabi) using BFD version (GNU Tools for Arm Embedded Processors 9-2019-q4-major) 2.33.1.20191025
COMPILER_PATH=/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/bin/
LIBRARY_PATH=/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/thumb/v7e-m+fp/hard/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/hard/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../arm-none-eabi/lib/thumb/v7e-m+fp/hard/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/lib/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../arm-none-eabi/lib/
COLLECT_GCC_OPTIONS='-v' '-o' 'test_m4.elf' '-static' '-T' '../common/stm32f4-discovery.ld' '-nostartfiles' '-mthumb' '-mcpu=cortex-m4' '-mfloat-abi=hard' '-mfpu=fpv4-sp-d16' '-L../libopencm3/lib/' '-march=armv7e-m+fp'
/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/collect2 -plugin /usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/liblto_plugin.so -plugin-opt=/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/lto-wrapper -plugin-opt=-fresolution=/tmp/cc4qN1Kt.res -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lc --sysroot=/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../arm-none-eabi -Bstatic -X -o test_m4.elf -L../libopencm3/lib/ -L/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/thumb/v7e-m+fp/hard -L/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/hard -L/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../arm-none-eabi/lib/thumb/v7e-m+fp/hard -L/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1 -L/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc -L/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/lib -L/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../arm-none-eabi/lib test_m4.o /tmp/ccfflDpW.o ../common/stm32f4_wrapper.o --start-group -lc -lgcc -lnosys --end-group --gc-sections --no-print-gc-sections -lopencm3_stm32f4 --start-group -lgcc -lc --end-group -T ../common/stm32f4-discovery.ld
COLLECT_GCC_OPTIONS='-v' '-o' 'test_m4.elf' '-static' '-T' '../common/stm32f4-discovery.ld' '-nostartfiles' '-mthumb' '-mcpu=cortex-m4' '-mfloat-abi=hard' '-mfpu=fpv4-sp-d16' '-L../libopencm3/lib/' '-march=armv7e-m+fp'
arm-none-eabi-objcopy -Obinary test_m4.elf test_m4.bin
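The optimization flag does not seem to be taken into account: whatever I pass, the generated binary is always the same, and the program always prints cycles: 196645, 00000063.
Disassembling the binary gives the following output for both -Os and -O3: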
080001ac <main>:
80001ac: b570 push {r4, r5, r6, lr}
80001ae: b08a sub sp, #40 ; 0x28
80001b0: f006 fc06 bl 80069c0 <clock_setup>
80001b4: f006 fc1c bl 80069f0 <gpio_setup>
80001b8: f44f 30e1 mov.w r0, #115200 ; 0x1c200
80001bc: f006 fc32 bl 8006a24 <usart_setup>
80001c0: f006 fc52 bl 8006a68 <flash_setup>
80001c4: 490e ldr r1, [pc, #56] ; (8000200 <main+0x54>)
80001c6: 4c0f ldr r4, [pc, #60] ; (8000204 <main+0x58>)
80001c8: 680b ldr r3, [r1, #0]
80001ca: 4a0f ldr r2, [pc, #60] ; (8000208 <main+0x5c>)
80001cc: 2500 movs r5, #0
80001ce: f043 7380 orr.w r3, r3, #16777216 ; 0x1000000
80001d2: 600b str r3, [r1, #0]
80001d4: 6025 str r5, [r4, #0]
80001d6: 6813 ldr r3, [r2, #0]
80001d8: f043 0301 orr.w r3, r3, #1
80001dc: 6013 str r3, [r2, #0]
80001de: 6826 ldr r6, [r4, #0]
80001e0: f000 f816 bl 8000210 <test>
80001e4: 6822 ldr r2, [r4, #0]
80001e6: 4909 ldr r1, [pc, #36] ; (800020c <main+0x60>)
80001e8: 4603 mov r3, r0
80001ea: 1b92 subs r2, r2, r6
80001ec: a801 add r0, sp, #4
80001ee: f006 fca5 bl 8006b3c <sprintf>
80001f2: a801 add r0, sp, #4
80001f4: f006 fc48 bl 8006a88 <send_USART_str>
80001f8: 4628 mov r0, r5
80001fa: b00a add sp, #40 ; 0x28
80001fc: bd70 pop {r4, r5, r6, pc}
80001fe: bf00 nop
8000200: e000edfc .word 0xe000edfc
8000204: e0001004 .word 0xe0001004
8000208: e0001000 .word 0xe0001000
800020c: 0800c1e8 .word 0x0800c1e8
08000210 <test>:
8000210: b480 push {r7}
8000212: b087 sub sp, #28
8000214: af00 add r7, sp, #0
8000216: 2375 movs r3, #117 ; 0x75
8000218: 60fb str r3, [r7, #12]
800021a: 2314 movs r3, #20
800021c: 60bb str r3, [r7, #8]
800021e: 2300 movs r3, #0
8000220: 613b str r3, [r7, #16]
8000222: e020 b.n 8000266 <test+0x56>
8000224: 68fb ldr r3, [r7, #12]
8000226: b2db uxtb r3, r3
8000228: 607b str r3, [r7, #4]
800022a: 68bb ldr r3, [r7, #8]
800022c: b2db uxtb r3, r3
800022e: 603b str r3, [r7, #0]
8000230: 687a ldr r2, [r7, #4]
8000232: 683b ldr r3, [r7, #0]
8000234: 405a eors r2, r3
8000236: 693b ldr r3, [r7, #16]
8000238: 2b00 cmp r3, #0
800023a: da01 bge.n 8000240 <test+0x30>
800023c: f203 13ff addw r3, r3, #511 ; 0x1ff
8000240: 125b asrs r3, r3, #9
8000242: 4619 mov r1, r3
8000244: 68fb ldr r3, [r7, #12]
8000246: 40cb lsrs r3, r1
8000248: 405a eors r2, r3
800024a: 693b ldr r3, [r7, #16]
800024c: 2b00 cmp r3, #0
800024e: da01 bge.n 8000254 <test+0x44>
8000250: f203 33ff addw r3, r3, #1023 ; 0x3ff
8000254: 129b asrs r3, r3, #10
8000256: 4619 mov r1, r3
8000258: 68bb ldr r3, [r7, #8]
800025a: 40cb lsrs r3, r1
800025c: 4053 eors r3, r2
800025e: 617b str r3, [r7, #20]
8000260: 693b ldr r3, [r7, #16]
8000262: 3301 adds r3, #1
8000264: 613b str r3, [r7, #16]
8000266: 693b ldr r3, [r7, #16]
8000268: f5b3 5f80 cmp.w r3, #4096 ; 0x1000
800026c: dbda blt.n 8000224 <test+0x14>
800026e: 697b ldr r3, [r7, #20]
8000270: 4618 mov r0, r3
8000272: 371c adds r7, #28
8000274: 46bd mov sp, r7
8000276: f85d 7b04 ldr.w r7, [sp], #4
800027a: 4770 bx lr
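I find this strange, because the code could obviously be made faster. For example, a single uxtb could be computed instead of two (if it were done after the eor), so I think something is wrong here. Why is the optimization flag not taken into account? Is something wrong with my makefile?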
typedef unsigned int uint32_t;
uint32_t test(uint32_t a, uint32_t b) {
uint32_t tmp0, tmp1;
uint32_t c;
for(int i = 0; i< 4096; i++) {
tmp0 = a & 0xff;
tmp1 = b & 0xff;
c = tmp0 ^ tmp1 ^ (a>>(i/512)) ^ (b >> (i/1024));
}
return c;
}
unsigned int hello ( void )
{
return(test(0x75,0x14));
}
9.3.0 and 9.2.1 will not differ much. I can get a 9.2.1 specifically if you want to see it, but you can check for yourself.
arm-none-eabi-gcc --version
arm-none-eabi-gcc (GCC) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
-O0
arm-none-eabi-gcc -O0 so.c -c -mthumb -mcpu=cortex-m4 -o so.o
Disassembly of section .text:
00000000 <test>:
0: b480 push {r7}
2: b087 sub sp, #28
4: af00 add r7, sp, #0
6: 6078 str r0, [r7, #4]
8: 6039 str r1, [r7, #0]
a: 2300 movs r3, #0
c: 613b str r3, [r7, #16]
e: e020 b.n 52 <test+0x52>
10: 687b ldr r3, [r7, #4]
12: b2db uxtb r3, r3
14: 60fb str r3, [r7, #12]
16: 683b ldr r3, [r7, #0]
18: b2db uxtb r3, r3
1a: 60bb str r3, [r7, #8]
1c: 68fa ldr r2, [r7, #12]
1e: 68bb ldr r3, [r7, #8]
20: 405a eors r2, r3
22: 693b ldr r3, [r7, #16]
24: 2b00 cmp r3, #0
26: da01 bge.n 2c <test+0x2c>
28: f203 13ff addw r3, r3, #511 ; 0x1ff
2c: 125b asrs r3, r3, #9
2e: 4619 mov r1, r3
30: 687b ldr r3, [r7, #4]
32: 40cb lsrs r3, r1
34: 405a eors r2, r3
36: 693b ldr r3, [r7, #16]
38: 2b00 cmp r3, #0
3a: da01 bge.n 40 <test+0x40>
3c: f203 33ff addw r3, r3, #1023 ; 0x3ff
40: 129b asrs r3, r3, #10
42: 4619 mov r1, r3
44: 683b ldr r3, [r7, #0]
46: 40cb lsrs r3, r1
48: 4053 eors r3, r2
4a: 617b str r3, [r7, #20]
4c: 693b ldr r3, [r7, #16]
4e: 3301 adds r3, #1
50: 613b str r3, [r7, #16]
52: 693b ldr r3, [r7, #16]
54: f5b3 5f80 cmp.w r3, #4096 ; 0x1000
58: dbda blt.n 10 <test+0x10>
5a: 697b ldr r3, [r7, #20]
5c: 4618 mov r0, r3
5e: 371c adds r7, #28
60: 46bd mov sp, r7
62: bc80 pop {r7}
64: 4770 bx lr
00000066 <hello>:
66: b580 push {r7, lr}
68: af00 add r7, sp, #0
6a: 2114 movs r1, #20
6c: 2075 movs r0, #117 ; 0x75
6e: f7ff fffe bl 0 <test>
72: 4603 mov r3, r0
74: 4618 mov r0, r3
76: bd80 pop {r7, pc}
-O1
arm-none-eabi-gcc -O1 so.c -c -mthumb -mcpu=cortex-m4 -o so.o
arm-none-eabi-objdump -D so.o
so.o: file format elf32-littlearm
Disassembly of section .text:
00000000 <test>:
0: f44f 5380 mov.w r3, #4096 ; 0x1000
4: 3b01 subs r3, #1
6: d1fd bne.n 4 <test+0x4>
8: 08ca lsrs r2, r1, #3
a: ea82 12d0 eor.w r2, r2, r0, lsr #7
e: ea80 0301 eor.w r3, r0, r1
12: b2db uxtb r3, r3
14: ea82 0003 eor.w r0, r2, r3
18: 4770 bx lr
0000001a <hello>:
1a: b508 push {r3, lr}
1c: 2114 movs r1, #20
1e: 2075 movs r0, #117 ; 0x75
20: f7ff fffe bl 0 <test>
24: bd08 pop {r3, pc}
-O2
Disassembly of section .text:
00000000 <test>:
0: ea80 0301 eor.w r3, r0, r1
4: 08ca lsrs r2, r1, #3
6: ea82 10d0 eor.w r0, r2, r0, lsr #7
a: b2db uxtb r3, r3
c: 4058 eors r0, r3
e: 4770 bx lr
00000010 <hello>:
10: 2063 movs r0, #99 ; 0x63
12: 4770 bx lr
-O3
00000000 <test>:
0: ea80 0301 eor.w r3, r0, r1
4: 08ca lsrs r2, r1, #3
6: ea82 10d0 eor.w r0, r2, r0, lsr #7
a: b2db uxtb r3, r3
c: 4058 eors r0, r3
e: 4770 bx lr
00000010 <hello>:
10: 2063 movs r0, #99 ; 0x63
12: 4770 bx lr
-Os
00000000 <test>:
0: 08cb lsrs r3, r1, #3
2: ea83 13d0 eor.w r3, r3, r0, lsr #7
6: 4048 eors r0, r1
8: b2c0 uxtb r0, r0
a: 4058 eors r0, r3
c: 4770 bx lr
0000000e <hello>:
e: 2114 movs r1, #20
10: 2075 movs r0, #117 ; 0x75
12: f7ff bffe b.w 0 <test>
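The optimized listings contain no loop because every iteration overwrites c, so only the last one (i = 4095, where i/512 is 7 and i/1024 is 3) survives. The sketch below checks that reading in plain host C; the helper name test_closed_form is made up for illustration and does not appear in the question.

```c
#include <stdint.h>

/* The loop from the question, unchanged in behavior. */
uint32_t test(uint32_t a, uint32_t b)
{
    uint32_t c = 0;
    for (int i = 0; i < 4096; i++) {
        uint32_t tmp0 = a & 0xff;
        uint32_t tmp1 = b & 0xff;
        c = tmp0 ^ tmp1 ^ (a >> (i / 512)) ^ (b >> (i / 1024));
    }
    return c;
}

/* Hypothetical helper: only i = 4095 matters, where the shifts are 7 and 3.
   This mirrors the eor / lsrs / uxtb sequence in the -O2 listing. */
uint32_t test_closed_form(uint32_t a, uint32_t b)
{
    return ((a ^ b) & 0xff) ^ (a >> 7) ^ (b >> 3);
}
```

For a = 0x75 and b = 0x14 both functions return 0x63, which is the constant that hello loads directly at -O2 and -O3.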
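If all of these execute in the same amount of time, then yes, you clearly have either a build problem or a test problem. If you claim that -O1, -O2, -O3, and so on all produce the same output, then you are not actually using those optimization levels.
There is no reason to assume that -Os produces a smaller binary than -O2 or -O3. The flag only expresses that wish, and exceptions can be constructed. In the listings above, for instance, the -Os code for test plus hello occupies 22 bytes against 20 bytes at -O2.
Nor is there any reason to assume that compiling for size will execute faster, or that -O3 and the like will. This is especially true on a platform like this one (and on all modern platforms), where a good share of the performance depends not on the number or sequence of instructions but on the system as a whole.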
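You are using an STM32 with a Cortex-M4, so you have ST's flash cache, which cannot be turned off. That helps every test but also hides some things. You have a clock init followed by a flash setup, which makes me wonder what happens there: if you are raising the clock, you have to slow the flash down first, not afterwards, or you may crash. For tests like this there is generally no reason to raise the clock. Ideally you measure in system (CPU) clock cycles, using timer clock cycles. At slower clock speeds (some parts cover the full range, but) you can use the minimum number of flash wait states and then simply raise the wait states for different tests, without raising the clock, to see how flash affects the result. Unfortunately this is an STM32. To get around that, you can run the test from SRAM.
Depending on the core's compile-time options, some cores have different fetch features and other features that you may be able to play with. For a tight loop like this, a simple alignment change can have a dramatic effect: the same machine code starting at a different address lines up differently in fetch lines and cache lines, and that affects benchmark results.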
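Note that the SysTick timer gives you the same results as the debug timer. You can wrap the time collection around the code under test (not in a function: when you lift the assembly language to build the code under test, you can add the time collection directly before and after it, so you do not incur function-call overhead that can itself vary from test to test).
If you see the compiler generating the same machine code for different settings, then you are not actually building with those settings, not actually rebuilding the application, or making some other form of user error (building here and using a binary from there). In that case the same binary should ideally give the same time, plus or minus a few clocks. But that also depends on how you run or re-run the test: whether you want to see cache effects, whether you fill the cache first and then run the test, and so on.
If you start to see different machine code, or if you do see different machine code but get the same times, then the error is in the time measurement, which is an often overlooked problem in benchmarking. Your approach looks fine as long as you are really reading that timer and have tested that it is counting, and counting in the direction you expect. If it is an instruction counter rather than a measure of elapsed time, you can still test it to see whether it behaves the way you think. I have no use for those debug facilities, so I do not dabble in them and do not know them as well as I know other parts of these systems.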
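The timer checks described here (is the counter moving, and in which direction) can be sketched in host-testable form, together with the wraparound-safe subtraction that main already relies on. Everything named below is hypothetical: on the target, the reader function would return DWT_CYCCNT or the SysTick value, and fake_counter exists only so the helpers can be exercised off-target.

```c
#include <stdint.h>

/* Hypothetical: any function that returns the current counter value. */
typedef uint32_t (*read_counter_fn)(void);

/* Elapsed ticks for an up-counting 32-bit counter. Unsigned subtraction
   stays correct across a single wraparound. */
uint32_t elapsed_up(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Returns +1 if the counter counts up, -1 if it counts down, and 0 if it
   did not move between two back-to-back reads. */
int counter_direction(read_counter_fn read_counter)
{
    uint32_t first = read_counter();
    uint32_t second = read_counter();
    if (second == first)
        return 0;
    /* Read the difference as signed so a wrap between reads keeps its sign. */
    return ((int32_t)(second - first) > 0) ? 1 : -1;
}

/* Cost of the measurement itself: two back-to-back reads. */
uint32_t read_overhead(read_counter_fn read_counter)
{
    uint32_t start = read_counter();
    uint32_t end = read_counter();
    return elapsed_up(start, end);
}

/* Host-side stand-in for a hardware counter: advances 3 ticks per read. */
static uint32_t fake_ticks;
uint32_t fake_counter(void)
{
    fake_ticks += 3;
    return fake_ticks;
}
```

SysTick counts down, so for that timer the operands of the subtraction would be swapped; the direction check is there to catch exactly that kind of wrong assumption.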
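Being an M4, it may also have other features you can switch on and off to see performance differences that depend on the generated code: branch prediction, caches, MMU-like things, and so on.
It could be the order of the flags you are using relative to -O3 (why each of them is there is the first question); some may negate other optimization features.
I am curious what the real goal is. Understand that benchmarks are largely nonsense because they are so easy to manipulate: the same high-level code is not expected to produce the same results on the same target, with the same or different tools, for all sorts of reasons. Cut the command line down and try clang/llvm versus gnu, or try gcc 4.x.x, 5.x.x, and so on. After 4.x.x the output started to become bloated and the compiler has not done as good a job. For something like this the versions should be very close, but one instruction fewer or more, or a simple alignment difference, can make the results of two tests differ dramatically.
Then, when you put back the clock setup, that changes how things work. As an example, you might use no wait states (the flash may run at a rate tied to the CPU rate, so there is a built-in wait) up to 25 MHz, then add one wait state up to 50 MHz, and so on. This varies by design; on some newer parts the flash can run much faster than on older parts. At 25 MHz versus 8 MHz, the same number of clocks is a smaller amount of wall-clock time overall. If you create or modify the clock init code, you gain performance up to the point just before you have to increase the wait states, but past that boundary you take a performance hit from the extra flash wait state. So there is a performance balance here.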
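The wait-state trade-off can be put into numbers. This is only a sketch built on the illustrative figure used above (one more wait state per 25 MHz step), not on values from any datasheet, and both function names are made up.

```c
#include <stdint.h>

/* Wait states needed at clock_hz if the flash tolerates step_hz per
   wait-state step: 0 up to step_hz, 1 up to 2 * step_hz, and so on.
   step_hz is an illustrative parameter, not a datasheet value. */
uint32_t flash_wait_states(uint32_t clock_hz, uint32_t step_hz)
{
    if (clock_hz == 0 || step_hz == 0)
        return 0;
    return (clock_hz - 1) / step_hz;
}

/* Wall-clock time in nanoseconds for a given cycle count. */
uint64_t cycles_to_ns(uint64_t cycles, uint32_t clock_hz)
{
    if (clock_hz == 0)
        return 0;
    return (cycles * 1000000000ull) / clock_hz;
}
```

With these helpers, the 196645 cycles reported in the question correspond to about 24.6 ms at 8 MHz and about 7.9 ms at 25 MHz, while the wait-state count under the assumed 25 MHz step is 0 in both cases and becomes 1 just above 25 MHz.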
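Summary
If the same code comes out of the compiler, then the problem is your command line; you can easily simplify the command line to see that these tools do generate different code. If your comparison was wrong and the code does differ, then the problem is how you time the code, which is usually where benchmarks go wrong, along with other factors unrelated to the compiler command line. Benchmarks are generally meaningless because they can be manipulated to show different results (even without changing the high-level source code under test).
Try simplifying the command line: examine every option on it and justify why it applies to your particular application. Validate the timer or instruction counter as best you can (and understand that the count of instructions executed has no direct relation to performance; one solution can execute 100 times more instructions than another and still run faster).
There is no reason to expect that -Os will produce smaller code; one hopes so, but there are exceptions. Likewise, -Os may execute faster than -O2 or -O3; there is no reason to expect that a higher optimization level will produce "faster" code.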
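You are compiling the code at -O0. In the build log, test.c is compiled as part of the link command, and its cc1 invocation carries no -O option, so GCC falls back to its default, -O0.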
You can see it clearly here: https://godbolt.org/z/qZPYqJ
So the compiler is right, as always. No missed optimization was found.
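One way to catch this kind of mistake from inside the program is GCC's predefined macros: __OPTIMIZE__ is defined whenever optimization is enabled, and __OPTIMIZE_SIZE__ when optimizing for size. A minimal sketch follows, with a made-up function name; placed in test.c, it would report how that particular file was built rather than how main.c was built.

```c
/* Reports how this translation unit was compiled, based on GCC's
   predefined macros. The function name is hypothetical. */
const char *build_optimization(void)
{
#if defined(__OPTIMIZE_SIZE__)
    return "optimized for size";
#elif defined(__OPTIMIZE__)
    return "optimized";
#else
    return "not optimized";
#endif
}
```

Sending this string over the USART next to the cycle count would make a file that silently fell back to -O0 visible in the program output.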
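Well, the real answer is not easy, but before disassembling something you should understand what optimization actually is and how the compiler achieves its goals. For gcc there is almost no difference between -Os and -O3, since they turn on almost the same internal flags, except for loop unrolling under -Os. Also, on today's CPUs everything is faster when it fits in the cache.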