Optimization flag is not taken into account when compiling with arm-none-eabi-gcc

I want to compile a program with arm-none-eabi-gcc 9.2.1, using the libopencm3 project, and run it on an ARM Cortex-M4 processor. My program consists of two files, main.c:

#include "../common/stm32wrapper.h"
#include "test.h"
#include <stdio.h>
#include <string.h>

typedef unsigned char u8;
typedef unsigned int  u32;
typedef unsigned long long u64;

int main(void)
{
    clock_setup();
    gpio_setup();
    usart_setup(115200);
    flash_setup();

    SCS_DEMCR |= SCS_DEMCR_TRCENA;
    DWT_CYCCNT = 0;
    DWT_CTRL |= DWT_CTRL_CYCCNTENA;

    u32 oldcount, newcount;
    u32 a = 0x75;
    u32 b = 0x14;
    char buffer[36];
    oldcount = DWT_CYCCNT;
    u32 c = test(a,b);
    newcount = DWT_CYCCNT-oldcount;
    sprintf(buffer, "cycles: %d, %08x", newcount, c);
    send_USART_str(buffer);
    return 0;
}

and test.c:

uint32_t test(uint32_t a, uint32_t b) {
    uint32_t tmp0, tmp1;
    uint32_t c;

    for(int i = 0; i< 4096; i++) {
        tmp0 = a & 0xff;
        tmp1 = b & 0xff;
        c = tmp0 ^ tmp1 ^ (a>>(i/512)) ^ (b >> (i/1024));
    }
    return c;
}

To compile my program, I use the following makefile:

.PHONY: all clean

PREFIX  ?= arm-none-eabi
CC      = $(PREFIX)-gcc -v
LD      = $(PREFIX)-gcc -v
OBJCOPY = $(PREFIX)-objcopy
OBJDUMP = $(PREFIX)-objdump
GDB     = $(PREFIX)-gdb

OPENCM3DIR = ../libopencm3
ARMNONEEABIDIR = /usr/arm-none-eabi
COMMONDIR = ../common

all: test_m4.bin

test_m4.%: ARCH_FLAGS = -mthumb -mcpu=cortex-m4 -mfloat-abi=hard -mfpu=fpv4-sp-d16
test_m4.o: CFLAGS += -DSTM32F4
$(COMMONDIR)/stm32f4_wrapper.o: CFLAGS += -DSTM32F4
test_m4.elf: LDSCRIPT = $(COMMONDIR)/stm32f4-discovery.ld
test_m4.elf: LDFLAGS += -L$(OPENCM3DIR)/lib/ -lopencm3_stm32f4
test_m4.elf: OBJS += $(COMMONDIR)/stm32f4_wrapper.o 
test_m4.elf: $(COMMONDIR)/stm32f4_wrapper.o $(OPENCM3DIR)/lib/libopencm3_stm32f4.a

CFLAGS      += -O3 \
           -Wall -Wextra -Wimplicit-function-declaration \
           -Wredundant-decls -Wmissing-prototypes -Wstrict-prototypes \
           -Wundef -Wshadow \
           -I$(ARMNONEEABIDIR)/include -I$(OPENCM3DIR)/include \
           -fno-common $(ARCH_FLAGS) -MD \
           -ftime-report
LDFLAGS     += --static -Wl,--start-group -lc -lgcc -lnosys -Wl,--end-group \
           -T$(LDSCRIPT) -nostartfiles -Wl,--gc-sections,--no-print-gc-sections \
           $(ARCH_FLAGS)

OBJS        += test.c

%.bin: %.elf
    $(OBJCOPY) -Obinary $^ $@

%.elf: %.o $(OBJS) $(LDSCRIPT)
    $(LD) -o $@ $< $(OBJS) $(LDFLAGS)

test%.o: main.c
    $(CC) $(CFLAGS) -o $@ -c $^

%.o: %.c 
    $(CC) $(CFLAGS) -o $@ -c $^

clean:
    rm -f *.o *.d *.elf *.bin

I can compile and run my code with this makefile. Running make gives the following output:

arm-none-eabi-gcc -v -O3 -Wall -Wextra -Wimplicit-function-declaration -Wredundant-decls -Wmissing-prototypes -Wstrict-prototypes -Wundef -Wshadow -I/usr/arm-none-eabi/include -I../libopencm3/include -fno-common -mthumb -mcpu=cortex-m4 -mfloat-abi=hard -mfpu=fpv4-sp-d16 -MD -ftime-report -DSTM32F4 -o test_m4.o -c main.c
Using built-in specs.
COLLECT_GCC=arm-none-eabi-gcc
Target: arm-none-eabi
Configured with: /mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/src/gcc/configure --target=arm-none-eabi --prefix=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/install-native --libexecdir=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/install-native/lib --infodir=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/install-native/share/doc/gcc-arm-none-eabi/info --mandir=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/install-native/share/doc/gcc-arm-none-eabi/man --htmldir=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/install-native/share/doc/gcc-arm-none-eabi/html --pdfdir=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/install-native/share/doc/gcc-arm-none-eabi/pdf --enable-languages=c,c++ --enable-plugins --disable-decimal-float --disable-libffi --disable-libgomp --disable-libmudflap --disable-libquadmath --disable-libssp --disable-libstdcxx-pch --disable-nls --disable-shared --disable-threads --disable-tls --with-gnu-as --with-gnu-ld --with-newlib --with-headers=yes --with-python-dir=share/gcc-arm-none-eabi --with-sysroot=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/install-native/arm-none-eabi --build=x86_64-linux-gnu --host=x86_64-linux-gnu --with-gmp=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/build-native/host-libs/usr --with-mpfr=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/build-native/host-libs/usr --with-mpc=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/build-native/host-libs/usr --with-isl=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/build-native/host-libs/usr 
--with-libelf=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/build-native/host-libs/usr --with-host-libstdcxx='-static-libgcc -Wl,-Bstatic,-lstdc++,-Bdynamic -lm' --with-pkgversion='GNU Tools for Arm Embedded Processors 9-2019-q4-major' --with-multilib-list=rmprofile
Thread model: single
gcc version 9.2.1 20191025 (release) [ARM/arm-9-branch revision 277599] (GNU Tools for Arm Embedded Processors 9-2019-q4-major) 
COLLECT_GCC_OPTIONS='-v' '-O3' '-Wall' '-Wextra' '-Wimplicit-function-declaration' '-Wredundant-decls' '-Wmissing-prototypes' '-Wstrict-prototypes' '-Wundef' '-Wshadow' '-I' '/usr/arm-none-eabi/include' '-I' '../libopencm3/include' '-fno-common' '-mthumb' '-mcpu=cortex-m4' '-mfloat-abi=hard' '-mfpu=fpv4-sp-d16' '-MD' '-ftime-report' '-D' 'STM32F4' '-o' 'test_m4.o' '-c' '-march=armv7e-m+fp'
 /usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/cc1 -quiet -v -I /usr/arm-none-eabi/include -I ../libopencm3/include -imultilib thumb/v7e-m+fp/hard -iprefix /usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/ -isysroot /usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../arm-none-eabi -MD test_m4.d -MQ test_m4.o -D__USES_INITFINI__ -D STM32F4 main.c -quiet -dumpbase main.c -mthumb -mcpu=cortex-m4 -mfloat-abi=hard -mfpu=fpv4-sp-d16 -march=armv7e-m+fp -auxbase-strip test_m4.o -O3 -Wall -Wextra -Wimplicit-function-declaration -Wredundant-decls -Wmissing-prototypes -Wstrict-prototypes -Wundef -Wshadow -version -fno-common -ftime-report -o /tmp/ccm5h1i9.s
GNU C17 (GNU Tools for Arm Embedded Processors 9-2019-q4-major) version 9.2.1 20191025 (release) [ARM/arm-9-branch revision 277599] (arm-none-eabi)
    compiled by GNU C version 4.8.4, GMP version 6.1.0, MPFR version 3.1.4, MPC version 1.0.3, isl version isl-0.18-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
ignoring duplicate directory "/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/../../lib/gcc/arm-none-eabi/9.2.1/include"
ignoring nonexistent directory "/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../arm-none-eabi/usr/local/include"
ignoring duplicate directory "/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/../../lib/gcc/arm-none-eabi/9.2.1/include-fixed"
ignoring duplicate directory "/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/../../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/include"
ignoring nonexistent directory "/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../arm-none-eabi/usr/include"
ignoring nonexistent directory "/usr/arm-none-eabi/include"
#include "..." search starts here:
#include <...> search starts here:
 ../libopencm3/include
 /usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/include
 /usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/include-fixed
 /usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/include
End of search list.
GNU C17 (GNU Tools for Arm Embedded Processors 9-2019-q4-major) version 9.2.1 20191025 (release) [ARM/arm-9-branch revision 277599] (arm-none-eabi)
    compiled by GNU C version 4.8.4, GMP version 6.1.0, MPFR version 3.1.4, MPC version 1.0.3, isl version isl-0.18-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 4381e146d4f016ae8e44a645dba65184

Time variable                                   usr           sys          wall               GGC
 phase setup                        :   0.01 (  8%)   0.01 ( 20%)   0.03 ( 17%)    3569 kB ( 62%)
 phase parsing                      :   0.10 ( 83%)   0.04 ( 80%)   0.14 ( 78%)    2069 kB ( 36%)
 phase opt and generate             :   0.01 (  8%)   0.00 (  0%)   0.01 (  6%)     120 kB (  2%)
 preprocessing                      :   0.03 ( 25%)   0.03 ( 60%)   0.03 ( 17%)     889 kB ( 15%)
 lexical analysis                   :   0.04 ( 33%)   0.00 (  0%)   0.05 ( 28%)       0 kB (  0%)
 parser (global)                    :   0.02 ( 17%)   0.00 (  0%)   0.04 ( 22%)    1063 kB ( 18%)
 parser struct body                 :   0.00 (  0%)   0.00 (  0%)   0.01 (  6%)      41 kB (  1%)
 parser enumerator list             :   0.01 (  8%)   0.01 ( 20%)   0.01 (  6%)      54 kB (  1%)
 tree gimplify                      :   0.00 (  0%)   0.00 (  0%)   0.01 (  6%)       8 kB (  0%)
 initialize rtl                     :   0.01 (  8%)   0.00 (  0%)   0.00 (  0%)       7 kB (  0%)
 TOTAL                              :   0.12          0.05          0.18           5767 kB
COLLECT_GCC_OPTIONS='-v' '-O3' '-Wall' '-Wextra' '-Wimplicit-function-declaration' '-Wredundant-decls' '-Wmissing-prototypes' '-Wstrict-prototypes' '-Wundef' '-Wshadow' '-I' '/usr/arm-none-eabi/include' '-I' '../libopencm3/include' '-fno-common' '-mthumb' '-mcpu=cortex-m4' '-mfloat-abi=hard' '-mfpu=fpv4-sp-d16' '-MD' '-ftime-report' '-D' 'STM32F4' '-o' 'test_m4.o' '-c' '-march=armv7e-m+fp'
 /usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/bin/as -v -I /usr/arm-none-eabi/include -I ../libopencm3/include -march=armv7e-m -mfloat-abi=hard -mfpu=fpv4-sp-d16 -meabi=5 -o test_m4.o /tmp/ccm5h1i9.s
GNU assembler version 2.33.1 (arm-none-eabi) using BFD version (GNU Tools for Arm Embedded Processors 9-2019-q4-major) 2.33.1.20191025
COMPILER_PATH=/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/bin/
LIBRARY_PATH=/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/thumb/v7e-m+fp/hard/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/hard/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../arm-none-eabi/lib/thumb/v7e-m+fp/hard/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/lib/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../arm-none-eabi/lib/
COLLECT_GCC_OPTIONS='-v' '-O3' '-Wall' '-Wextra' '-Wimplicit-function-declaration' '-Wredundant-decls' '-Wmissing-prototypes' '-Wstrict-prototypes' '-Wundef' '-Wshadow' '-I' '/usr/arm-none-eabi/include' '-I' '../libopencm3/include' '-fno-common' '-mthumb' '-mcpu=cortex-m4' '-mfloat-abi=hard' '-mfpu=fpv4-sp-d16' '-MD' '-ftime-report' '-D' 'STM32F4' '-o' 'test_m4.o' '-c' '-march=armv7e-m+fp'
arm-none-eabi-gcc -v -o test_m4.elf test_m4.o test.c ../common/stm32f4_wrapper.o  --static -Wl,--start-group -lc -lgcc -lnosys -Wl,--end-group -T../common/stm32f4-discovery.ld -nostartfiles -Wl,--gc-sections,--no-print-gc-sections -mthumb -mcpu=cortex-m4 -mfloat-abi=hard -mfpu=fpv4-sp-d16 -L../libopencm3/lib/ -lopencm3_stm32f4
Using built-in specs.
COLLECT_GCC=arm-none-eabi-gcc
COLLECT_LTO_WRAPPER=/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/lto-wrapper
Target: arm-none-eabi
Configured with: /mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/src/gcc/configure --target=arm-none-eabi --prefix=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/install-native --libexecdir=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/install-native/lib --infodir=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/install-native/share/doc/gcc-arm-none-eabi/info --mandir=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/install-native/share/doc/gcc-arm-none-eabi/man --htmldir=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/install-native/share/doc/gcc-arm-none-eabi/html --pdfdir=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/install-native/share/doc/gcc-arm-none-eabi/pdf --enable-languages=c,c++ --enable-plugins --disable-decimal-float --disable-libffi --disable-libgomp --disable-libmudflap --disable-libquadmath --disable-libssp --disable-libstdcxx-pch --disable-nls --disable-shared --disable-threads --disable-tls --with-gnu-as --with-gnu-ld --with-newlib --with-headers=yes --with-python-dir=share/gcc-arm-none-eabi --with-sysroot=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/install-native/arm-none-eabi --build=x86_64-linux-gnu --host=x86_64-linux-gnu --with-gmp=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/build-native/host-libs/usr --with-mpfr=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/build-native/host-libs/usr --with-mpc=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/build-native/host-libs/usr --with-isl=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/build-native/host-libs/usr 
--with-libelf=/mnt/workspace/workspace/GCC-9-pipeline/jenkins-GCC-9-pipeline-100_20191030_1572397542/build-native/host-libs/usr --with-host-libstdcxx='-static-libgcc -Wl,-Bstatic,-lstdc++,-Bdynamic -lm' --with-pkgversion='GNU Tools for Arm Embedded Processors 9-2019-q4-major' --with-multilib-list=rmprofile
Thread model: single
gcc version 9.2.1 20191025 (release) [ARM/arm-9-branch revision 277599] (GNU Tools for Arm Embedded Processors 9-2019-q4-major) 
COLLECT_GCC_OPTIONS='-v' '-o' 'test_m4.elf' '-static' '-T' '../common/stm32f4-discovery.ld' '-nostartfiles' '-mthumb' '-mcpu=cortex-m4' '-mfloat-abi=hard' '-mfpu=fpv4-sp-d16' '-L../libopencm3/lib/' '-march=armv7e-m+fp'
 /usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/cc1 -quiet -v -imultilib thumb/v7e-m+fp/hard -iprefix /usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/ -isysroot /usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../arm-none-eabi -D__USES_INITFINI__ test.c -quiet -dumpbase test.c -mthumb -mcpu=cortex-m4 -mfloat-abi=hard -mfpu=fpv4-sp-d16 -march=armv7e-m+fp -auxbase test -version -o /tmp/cc3yny6o.s
GNU C17 (GNU Tools for Arm Embedded Processors 9-2019-q4-major) version 9.2.1 20191025 (release) [ARM/arm-9-branch revision 277599] (arm-none-eabi)
    compiled by GNU C version 4.8.4, GMP version 6.1.0, MPFR version 3.1.4, MPC version 1.0.3, isl version isl-0.18-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
ignoring duplicate directory "/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/../../lib/gcc/arm-none-eabi/9.2.1/include"
ignoring nonexistent directory "/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../arm-none-eabi/usr/local/include"
ignoring duplicate directory "/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/../../lib/gcc/arm-none-eabi/9.2.1/include-fixed"
ignoring duplicate directory "/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/../../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/include"
ignoring nonexistent directory "/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../arm-none-eabi/usr/include"
#include "..." search starts here:
#include <...> search starts here:
 /usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/include
 /usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/include-fixed
 /usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/include
End of search list.
GNU C17 (GNU Tools for Arm Embedded Processors 9-2019-q4-major) version 9.2.1 20191025 (release) [ARM/arm-9-branch revision 277599] (arm-none-eabi)
    compiled by GNU C version 4.8.4, GMP version 6.1.0, MPFR version 3.1.4, MPC version 1.0.3, isl version isl-0.18-GMP

GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072
Compiler executable checksum: 4381e146d4f016ae8e44a645dba65184
COLLECT_GCC_OPTIONS='-v' '-o' 'test_m4.elf' '-static' '-T' '../common/stm32f4-discovery.ld' '-nostartfiles' '-mthumb' '-mcpu=cortex-m4' '-mfloat-abi=hard' '-mfpu=fpv4-sp-d16' '-L../libopencm3/lib/' '-march=armv7e-m+fp'
 /usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/bin/as -v -march=armv7e-m -mfloat-abi=hard -mfpu=fpv4-sp-d16 -meabi=5 -o /tmp/ccfflDpW.o /tmp/cc3yny6o.s
GNU assembler version 2.33.1 (arm-none-eabi) using BFD version (GNU Tools for Arm Embedded Processors 9-2019-q4-major) 2.33.1.20191025
COMPILER_PATH=/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/bin/
LIBRARY_PATH=/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/thumb/v7e-m+fp/hard/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/hard/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../arm-none-eabi/lib/thumb/v7e-m+fp/hard/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/lib/:/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../arm-none-eabi/lib/
COLLECT_GCC_OPTIONS='-v' '-o' 'test_m4.elf' '-static' '-T' '../common/stm32f4-discovery.ld' '-nostartfiles' '-mthumb' '-mcpu=cortex-m4' '-mfloat-abi=hard' '-mfpu=fpv4-sp-d16' '-L../libopencm3/lib/' '-march=armv7e-m+fp'
 /usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/collect2 -plugin /usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/liblto_plugin.so -plugin-opt=/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/lto-wrapper -plugin-opt=-fresolution=/tmp/cc4qN1Kt.res -plugin-opt=-pass-through=-lgcc -plugin-opt=-pass-through=-lc --sysroot=/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../arm-none-eabi -Bstatic -X -o test_m4.elf -L../libopencm3/lib/ -L/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/thumb/v7e-m+fp/hard -L/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/lib/thumb/v7e-m+fp/hard -L/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../arm-none-eabi/lib/thumb/v7e-m+fp/hard -L/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1 -L/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc -L/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../lib/gcc/arm-none-eabi/9.2.1/../../../../arm-none-eabi/lib -L/usr/gcc-arm-none-eabi-9-2019-q4-major/bin/../arm-none-eabi/lib test_m4.o /tmp/ccfflDpW.o ../common/stm32f4_wrapper.o --start-group -lc -lgcc -lnosys --end-group --gc-sections --no-print-gc-sections -lopencm3_stm32f4 --start-group -lgcc -lc --end-group -T ../common/stm32f4-discovery.ld
COLLECT_GCC_OPTIONS='-v' '-o' 'test_m4.elf' '-static' '-T' '../common/stm32f4-discovery.ld' '-nostartfiles' '-mthumb' '-mcpu=cortex-m4' '-mfloat-abi=hard' '-mfpu=fpv4-sp-d16' '-L../libopencm3/lib/' '-march=armv7e-m+fp'
arm-none-eabi-objcopy -Obinary test_m4.elf test_m4.bin

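The optimization flag does not seem to be taken into account: whatever I specify, the resulting binary is always the same, and the program always prints cycles: 196645, 00000063. Disassembling the binary gives the following output for both -Os and -O3: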

080001ac <main>:
 80001ac:   b570        push    {r4, r5, r6, lr}
 80001ae:   b08a        sub sp, #40 ; 0x28
 80001b0:   f006 fc06   bl  80069c0 <clock_setup>
 80001b4:   f006 fc1c   bl  80069f0 <gpio_setup>
 80001b8:   f44f 30e1   mov.w   r0, #115200 ; 0x1c200
 80001bc:   f006 fc32   bl  8006a24 <usart_setup>
 80001c0:   f006 fc52   bl  8006a68 <flash_setup>
 80001c4:   490e        ldr r1, [pc, #56]   ; (8000200 <main+0x54>)
 80001c6:   4c0f        ldr r4, [pc, #60]   ; (8000204 <main+0x58>)
 80001c8:   680b        ldr r3, [r1, #0]
 80001ca:   4a0f        ldr r2, [pc, #60]   ; (8000208 <main+0x5c>)
 80001cc:   2500        movs    r5, #0
 80001ce:   f043 7380   orr.w   r3, r3, #16777216   ; 0x1000000
 80001d2:   600b        str r3, [r1, #0]
 80001d4:   6025        str r5, [r4, #0]
 80001d6:   6813        ldr r3, [r2, #0]
 80001d8:   f043 0301   orr.w   r3, r3, #1
 80001dc:   6013        str r3, [r2, #0]
 80001de:   6826        ldr r6, [r4, #0]
 80001e0:   f000 f816   bl  8000210 <test>
 80001e4:   6822        ldr r2, [r4, #0]
 80001e6:   4909        ldr r1, [pc, #36]   ; (800020c <main+0x60>)
 80001e8:   4603        mov r3, r0
 80001ea:   1b92        subs    r2, r2, r6
 80001ec:   a801        add r0, sp, #4
 80001ee:   f006 fca5   bl  8006b3c <sprintf>
 80001f2:   a801        add r0, sp, #4
 80001f4:   f006 fc48   bl  8006a88 <send_USART_str>
 80001f8:   4628        mov r0, r5
 80001fa:   b00a        add sp, #40 ; 0x28
 80001fc:   bd70        pop {r4, r5, r6, pc}
 80001fe:   bf00        nop
 8000200:   e000edfc    .word   0xe000edfc
 8000204:   e0001004    .word   0xe0001004
 8000208:   e0001000    .word   0xe0001000
 800020c:   0800c1e8    .word   0x0800c1e8

08000210 <test>:
 8000210:   b480        push    {r7}
 8000212:   b087        sub sp, #28
 8000214:   af00        add r7, sp, #0
 8000216:   2375        movs    r3, #117    ; 0x75
 8000218:   60fb        str r3, [r7, #12]
 800021a:   2314        movs    r3, #20
 800021c:   60bb        str r3, [r7, #8]
 800021e:   2300        movs    r3, #0
 8000220:   613b        str r3, [r7, #16]
 8000222:   e020        b.n 8000266 <test+0x56>
 8000224:   68fb        ldr r3, [r7, #12]
 8000226:   b2db        uxtb    r3, r3
 8000228:   607b        str r3, [r7, #4]
 800022a:   68bb        ldr r3, [r7, #8]
 800022c:   b2db        uxtb    r3, r3
 800022e:   603b        str r3, [r7, #0]
 8000230:   687a        ldr r2, [r7, #4]
 8000232:   683b        ldr r3, [r7, #0]
 8000234:   405a        eors    r2, r3
 8000236:   693b        ldr r3, [r7, #16]
 8000238:   2b00        cmp r3, #0
 800023a:   da01        bge.n   8000240 <test+0x30>
 800023c:   f203 13ff   addw    r3, r3, #511    ; 0x1ff
 8000240:   125b        asrs    r3, r3, #9
 8000242:   4619        mov r1, r3
 8000244:   68fb        ldr r3, [r7, #12]
 8000246:   40cb        lsrs    r3, r1
 8000248:   405a        eors    r2, r3
 800024a:   693b        ldr r3, [r7, #16]
 800024c:   2b00        cmp r3, #0
 800024e:   da01        bge.n   8000254 <test+0x44>
 8000250:   f203 33ff   addw    r3, r3, #1023   ; 0x3ff
 8000254:   129b        asrs    r3, r3, #10
 8000256:   4619        mov r1, r3
 8000258:   68bb        ldr r3, [r7, #8]
 800025a:   40cb        lsrs    r3, r1
 800025c:   4053        eors    r3, r2
 800025e:   617b        str r3, [r7, #20]
 8000260:   693b        ldr r3, [r7, #16]
 8000262:   3301        adds    r3, #1
 8000264:   613b        str r3, [r7, #16]
 8000266:   693b        ldr r3, [r7, #16]
 8000268:   f5b3 5f80   cmp.w   r3, #4096   ; 0x1000
 800026c:   dbda        blt.n   8000224 <test+0x14>
 800026e:   697b        ldr r3, [r7, #20]
 8000270:   4618        mov r0, r3
 8000272:   371c        adds    r7, #28
 8000274:   46bd        mov sp, r7
 8000276:   f85d 7b04   ldr.w   r7, [sp], #4
 800027a:   4770        bx  lr

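I find this strange, because the code could clearly be made faster. For example, a single uxtb would be enough instead of two (if it were performed after the eor), so I think something is wrong here. Why is the optimization flag not taken into account? Is there something wrong with my makefile?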

typedef unsigned int uint32_t;

uint32_t test(uint32_t a, uint32_t b) {
    uint32_t tmp0, tmp1;
    uint32_t c;

    for(int i = 0; i< 4096; i++) {
        tmp0 = a & 0xff;
        tmp1 = b & 0xff;
        c = tmp0 ^ tmp1 ^ (a>>(i/512)) ^ (b >> (i/1024));
    }
    return c;
}

unsigned int hello ( void )
{
    return(test(0x75,0x14));
}

There won't be much difference between 9.3.0 and 9.2.1. I can get a 9.2.1 specifically if you want to see it, but you can check this for yourself.

arm-none-eabi-gcc --version
arm-none-eabi-gcc (GCC) 9.3.0
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

-O0

arm-none-eabi-gcc -O0 so.c -c -mthumb -mcpu=cortex-m4 -o so.o

Disassembly of section .text:

00000000 <test>:
   0:   b480        push    {r7}
   2:   b087        sub sp, #28
   4:   af00        add r7, sp, #0
   6:   6078        str r0, [r7, #4]
   8:   6039        str r1, [r7, #0]
   a:   2300        movs    r3, #0
   c:   613b        str r3, [r7, #16]
   e:   e020        b.n 52 <test+0x52>
  10:   687b        ldr r3, [r7, #4]
  12:   b2db        uxtb    r3, r3
  14:   60fb        str r3, [r7, #12]
  16:   683b        ldr r3, [r7, #0]
  18:   b2db        uxtb    r3, r3
  1a:   60bb        str r3, [r7, #8]
  1c:   68fa        ldr r2, [r7, #12]
  1e:   68bb        ldr r3, [r7, #8]
  20:   405a        eors    r2, r3
  22:   693b        ldr r3, [r7, #16]
  24:   2b00        cmp r3, #0
  26:   da01        bge.n   2c <test+0x2c>
  28:   f203 13ff   addw    r3, r3, #511    ; 0x1ff
  2c:   125b        asrs    r3, r3, #9
  2e:   4619        mov r1, r3
  30:   687b        ldr r3, [r7, #4]
  32:   40cb        lsrs    r3, r1
  34:   405a        eors    r2, r3
  36:   693b        ldr r3, [r7, #16]
  38:   2b00        cmp r3, #0
  3a:   da01        bge.n   40 <test+0x40>
  3c:   f203 33ff   addw    r3, r3, #1023   ; 0x3ff
  40:   129b        asrs    r3, r3, #10
  42:   4619        mov r1, r3
  44:   683b        ldr r3, [r7, #0]
  46:   40cb        lsrs    r3, r1
  48:   4053        eors    r3, r2
  4a:   617b        str r3, [r7, #20]
  4c:   693b        ldr r3, [r7, #16]
  4e:   3301        adds    r3, #1
  50:   613b        str r3, [r7, #16]
  52:   693b        ldr r3, [r7, #16]
  54:   f5b3 5f80   cmp.w   r3, #4096   ; 0x1000
  58:   dbda        blt.n   10 <test+0x10>
  5a:   697b        ldr r3, [r7, #20]
  5c:   4618        mov r0, r3
  5e:   371c        adds    r7, #28
  60:   46bd        mov sp, r7
  62:   bc80        pop {r7}
  64:   4770        bx  lr

00000066 <hello>:
  66:   b580        push    {r7, lr}
  68:   af00        add r7, sp, #0
  6a:   2114        movs    r1, #20
  6c:   2075        movs    r0, #117    ; 0x75
  6e:   f7ff fffe   bl  0 <test>
  72:   4603        mov r3, r0
  74:   4618        mov r0, r3
  76:   bd80        pop {r7, pc}

-O1

arm-none-eabi-gcc -O1 so.c -c -mthumb -mcpu=cortex-m4 -o so.o
arm-none-eabi-objdump -D so.o

so.o:     file format elf32-littlearm


Disassembly of section .text:

00000000 <test>:
   0:   f44f 5380   mov.w   r3, #4096   ; 0x1000
   4:   3b01        subs    r3, #1
   6:   d1fd        bne.n   4 <test+0x4>
   8:   08ca        lsrs    r2, r1, #3
   a:   ea82 12d0   eor.w   r2, r2, r0, lsr #7
   e:   ea80 0301   eor.w   r3, r0, r1
  12:   b2db        uxtb    r3, r3
  14:   ea82 0003   eor.w   r0, r2, r3
  18:   4770        bx  lr

0000001a <hello>:
  1a:   b508        push    {r3, lr}
  1c:   2114        movs    r1, #20
  1e:   2075        movs    r0, #117    ; 0x75
  20:   f7ff fffe   bl  0 <test>
  24:   bd08        pop {r3, pc}

-O2

Disassembly of section .text:

00000000 <test>:
   0:   ea80 0301   eor.w   r3, r0, r1
   4:   08ca        lsrs    r2, r1, #3
   6:   ea82 10d0   eor.w   r0, r2, r0, lsr #7
   a:   b2db        uxtb    r3, r3
   c:   4058        eors    r0, r3
   e:   4770        bx  lr

00000010 <hello>:
  10:   2063        movs    r0, #99 ; 0x63
  12:   4770        bx  lr

-O3

00000000 <test>:
   0:   ea80 0301   eor.w   r3, r0, r1
   4:   08ca        lsrs    r2, r1, #3
   6:   ea82 10d0   eor.w   r0, r2, r0, lsr #7
   a:   b2db        uxtb    r3, r3
   c:   4058        eors    r0, r3
   e:   4770        bx  lr

00000010 <hello>:
  10:   2063        movs    r0, #99 ; 0x63
  12:   4770        bx  lr

-Os

00000000 <test>:
   0:   08cb        lsrs    r3, r1, #3
   2:   ea83 13d0   eor.w   r3, r3, r0, lsr #7
   6:   4048        eors    r0, r1
   8:   b2c0        uxtb    r0, r0
   a:   4058        eors    r0, r3
   c:   4770        bx  lr

0000000e <hello>:
   e:   2114        movs    r1, #20
  10:   2075        movs    r0, #117    ; 0x75
  12:   f7ff bffe   b.w 0 <test>
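
Why the optimized listings above contain no real loop (at -O1 only an empty countdown survives): each iteration overwrites c without reading its previous value, so only the last iteration, i = 4095, contributes, and there i/512 is 7 and i/1024 is 3. The following host-side sketch illustrates that reading of the listings. The names test_loop and test_closed_form are introduced here for illustration only; test_loop is the function from the question, with c initialized purely to silence warnings.

```c
#include <stdint.h>

/* The loop from the question (renamed; c is initialized only to avoid a warning). */
uint32_t test_loop(uint32_t a, uint32_t b) {
    uint32_t tmp0, tmp1;
    uint32_t c = 0;

    for (int i = 0; i < 4096; i++) {
        tmp0 = a & 0xff;
        tmp1 = b & 0xff;
        c = tmp0 ^ tmp1 ^ (a >> (i / 512)) ^ (b >> (i / 1024));
    }
    return c;
}

/* Hypothetical helper: the value of the last iteration (i = 4095),
   which is what the eor / lsrs #3 / lsr #7 / uxtb sequence computes. */
uint32_t test_closed_form(uint32_t a, uint32_t b) {
    return ((a ^ b) & 0xff) ^ (a >> 7) ^ (b >> 3);
}
```

For the inputs used in the question, both functions return 0x63, which matches the `movs r0, #99` that -O2 and -O3 emit for hello and the 00000063 printed by the program.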

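If all of these executed in the same amount of time, then yes, clearly you have either a build problem or a test problem. If you are claiming that -O1, -O2, -O3 and so on all produce the same output, then you are not actually using those optimization levels.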

There is no reason to assume that -Os produces a smaller binary than -O2 or -O3. The flag only hints at that desire; you can construct exceptions.

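Nor is there any reason to assume that compiling for size will execute faster, or that -O3 and so on will. That is especially true on a platform like this (and on all modern platforms), where some percentage of the performance comes not from the number or sequence of instructions but from the system as a whole.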

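You are on an stm32, a cortex-m4, so you have the ST flash cache, which cannot be turned off. That will help all of the tests, but it will also hide things. You have a clock init followed by a flash setup, and I wonder what is happening there: if you are raising the clock, you have to slow the flash down first, not afterwards, or you may crash.

For a test like this there is usually no reason to raise the clock. Ideally you measure in system (i.e. cpu) clock cycles, using a timer clocked from that same clock. At a slower clock speed (some parts are full range, but) you can use the minimum flash wait states and then simply raise the wait states for different tests, without raising the clock, to see how the flash affects the result. Unfortunately this is an stm32. To work around that, you can run the test from sram.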

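Depending on the compile-time options chosen for the core, some cores have different fetch capabilities and other features, and you may have some core features you can experiment with. For a tight loop like this, a simple alignment change can have a dramatic effect: the same machine code starting at a different address lines up differently in fetch lines and cache lines, and that affects the benchmark result.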

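Note that you can get the same results with the systick timer, without needing the debug timer. You can also wrap the time collection around the code under test (not in a function; once you drop to assembly language for the code under test, you can add the time collection immediately before and after it), without incurring function-call overhead, which may itself vary from test to test.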

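If you see the compiler producing identical machine code for different settings, then you are not really building with those settings, you are not really rebuilding the application, or there is some other form of user error (building here and using a binary from over there). In that case the identical binary should ideally give the same time, plus or minus a clock. But that also depends on how you run or re-run the test: whether you want to see cache effects, whether you fill the cache first and then run the test, and so on.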

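If you start seeing different machine code, or if you already see different machine code but get the same time, then the error is in the time measurement, which is the often-overlooked problem in benchmarking. Your approach seems fine, as long as you have really looked at that timer and tested that it is counting, and counting in the direction you expect. If it is some kind of executed-instruction counter rather than a timer, you can still test it to see whether it behaves the way you think. I have no use for those debug facilities, so I don't dabble in them and don't know them the way I know other things about these systems.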
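
The direction of the counter matters because the subtraction differs. A minimal sketch, assuming a 32-bit counter that counts up (as DWT_CYCCNT does in the question's code) and a 24-bit counter that counts down, as SysTick does, with its reload value assumed to be set to the maximum 0x00FFFFFF. The function names are introduced here for illustration only.

```c
#include <stdint.h>

/* 32-bit counter that counts up: unsigned subtraction remains
   correct across a single wraparound. */
uint32_t elapsed_up32(uint32_t start, uint32_t end) {
    return end - start;
}

/* 24-bit counter that counts down (assumes reload = 0x00FFFFFF):
   subtract the other way round and mask to the counter width. */
uint32_t elapsed_down24(uint32_t start, uint32_t end) {
    return (start - end) & 0x00FFFFFFu;
}
```

Either helper only handles a measured interval shorter than one full counter period; a longer interval is indistinguishable from a short one.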

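Since this is an m4, you may also have other features you can turn on/off to see how they change performance for the generated code: branch prediction, caches, mmu-like things, and so on.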

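It may be the order of the flags you are using relative to -O3 (every flag is a candidate cause of the original problem); some may negate other optimization features.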

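I am curious what the real goal is. Understand that benchmarks are nonsense because they are so easy to manipulate: for many reasons, the same high-level code is not expected to give the same results on the same target, whether you use the same tools or different ones. Cut down the command line and try clang/llvm vs gnu, or try gcc 4.x.x, 5.x.x, and so on. After 4.x.x the output started to get bloated and the compiler did not do as good a job. For something like this they should come out very close, but one instruction fewer or more, or a simple alignment difference, can make the results of two tests differ dramatically.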

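Then, when you put the clock settings back, that changes how things work. Say, as an example, you can use no wait states up to 25mhz (the flash likely runs at a rate tied to the cpu clock, so there is a built-in wait), then add one wait state up to 50, and so on. This varies by design; the flash on some newer parts can run much faster than on older parts. At 25mhz versus 8mhz, the same number of clocks is less time overall, in wall-clock terms. Near a boundary, if you create/modify the clock init code and raise the clock without having to add a wait state, you get a performance gain, but once you cross that boundary you take a performance hit from the added flash wait state. So there is a performance balance here.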

Summary

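If the same code is coming out of the compiler, then it is your command line; you can easily simplify the command line and see that the tools do produce different code. If your comparison was wrong and the code does differ, then the problem is how you are timing the code, which is where benchmarks usually go wrong, along with other factors that have nothing to do with the compiler command line. Benchmarks are in general meaningless, because they can be manipulated to show different results (even without changing the high-level source code under test).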

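Try simplifying the command line: examine every option on it and justify why it applies to your particular application. Validate the timer or instruction counter as far as you can (and understand that the number of instructions executed is not directly related to performance; a solution that executes 100 times as many instructions can still run faster than another one).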

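There is no reason to expect -Os to produce smaller code; one hopes so, but there are exceptions. Likewise, -Os may execute faster than -O2 or -O3; there is no reason to expect a higher optimization level to produce "faster" code.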

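You are compiling the code with -O0. In your build log, test.c is passed directly to the link command (arm-none-eabi-gcc -v -o test_m4.elf test_m4.o test.c ...), which carries no -O3, and the cc1 invocation for test.c shows no -O option, so test() is built at the default -O0 while main() is optimized.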

This can be seen clearly here: https://godbolt.org/z/qZPYqJ

So the compiler is right, as always. No missed optimization was found.
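
One way to confirm from inside the program which source files were actually optimized is GCC's predefined macros: `__OPTIMIZE__` is defined in all optimizing compilations, and `__OPTIMIZE_SIZE__` is additionally defined when optimizing for size. The following is only a sketch; built_with_optimization and optimization_label are hypothetical helper names, and because the macros reflect the flags of the file being compiled, each source file of interest needs its own copy of the check.

```c
/* Report how this translation unit was compiled, based on
   GCC's predefined macros. Returns 1 if any optimizing level
   was in effect, 0 otherwise. */
int built_with_optimization(void) {
#ifdef __OPTIMIZE__
    return 1;
#else
    return 0;
#endif
}

/* Human-readable variant of the same check. */
const char *optimization_label(void) {
#if defined(__OPTIMIZE_SIZE__)
    return "optimized for size";
#elif defined(__OPTIMIZE__)
    return "optimized";
#else
    return "not optimized";
#endif
}
```

In a setup like the one in the question, the label could be sent over the serial port from both main.c and test.c; if the two files report different labels, they were not built with the same flags.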

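Well, the real answer is not easy, but before disassembling anything you should understand what optimization actually is and how the compiler achieves its goals. As far as gcc is concerned, there is almost no difference between Os and O3, since they turn on almost the same internal flags, apart from loop unrolling, which Os leaves out.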

Besides, cpus today are faster anyway, with everything sitting in cache.