openssl speed test using the cryptodev engine with a hardware accelerator gives spurious timing results

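I am trying to benchmark the crypto performance of my hardware, and I am using the openssl speed command to do it.

The first test I ran was without the hardware accelerator enabled: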

$ openssl speed -evp aes-128-cbc -engine cryptodev
Doing aes-128-cbc for 3s on 16 size blocks: 4437806 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 64 size blocks: 1244528 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 256 size blocks: 322780 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 1024 size blocks: 81429 aes-128-cbc's in 3.00s
Doing aes-128-cbc for 3s on 8192 size blocks: 10215 aes-128-cbc's in 3.00s

OpenSSL 1.0.1j 15 Oct 2014
built on: Thu Jul 23 18:58:46 CDT 2015
options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) idea(int) blowfish(ptr)
compiler: arm-poky-linux-gnueabi-gcc  -march=armv7-a -mthumb-interwork -mfloat-abi=hard
-mfpu=neon -mtune=cortex-a9 --sysroot=... -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS
-D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN   -DTERMIO  -O2 -pipe -g 
-feliminate-unused-debug-types -Wall -Wa,--noexecstack -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS
-DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM
-DAES_ASM -DGHASH_ASM

The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
aes-128-cbc      23668.30k    26549.93k    27543.89k    27794.43k    27893.76k

Then I enabled the hardware accelerator and got the following results:

$ openssl speed -evp aes-128-cbc -engine cryptodev
engine "cryptodev" set.
Doing aes-128-cbc for 3s on 16 size blocks: 39552 aes-128-cbc's in ***0.08s***
Doing aes-128-cbc for 3s on 64 size blocks: 37060 aes-128-cbc's in ***0.05s***
Doing aes-128-cbc for 3s on 256 size blocks: 32674 aes-128-cbc's in ***0.07s***
Doing aes-128-cbc for 3s on 1024 size blocks: 26101 aes-128-cbc's in ***0.06s***
Doing aes-128-cbc for 3s on 8192 size blocks: 8286 aes-128-cbc's in ***0.02s***

OpenSSL 1.0.1j 15 Oct 2014
built on: Thu Jul 23 18:58:46 CDT 2015
options:bn(64,32) rc4(ptr,char) des(idx,cisc,16,long) aes(partial) idea(int) blowfish(ptr)
compiler: arm-poky-linux-gnueabi-gcc  -march=armv7-a -mthumb-interwork -mfloat-abi=hard
-mfpu=neon -mtune=cortex-a9 --sysroot=.... -fPIC -DOPENSSL_PIC -DOPENSSL_THREADS
-D_REENTRANT -DDSO_DLFCN -DHAVE_DLFCN_H -DL_ENDIAN   -DTERMIO  -O2 -pipe -g 
-feliminate-unused-debug-types -Wall -Wa,--noexecstack -DHAVE_CRYPTODEV -DUSE_CRYPTODEV_DIGESTS
-DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_GF2m -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM
-DAES_ASM -DGHASH_ASM

The 'numbers' are in 1000s of bytes per second processed.
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes

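My question is: why does each test run last only tens of milliseconds, when the test should run for 3 seconds for each block size? Is this a bug in OpenSSL?

I looked at the code in OpenSSL 1.0.1j and found the following: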

#ifndef OPENSSL_NO_AES
        if (doit[D_CBC_128_AES])
                {
                for (j=0; j<SIZE_NUM; j++)
                        {
                        print_message(names[D_CBC_128_AES],c[D_CBC_128_AES][j],lengths[j]);
                        Time_F(START);
                        for (count=0,run=1; COND(c[D_CBC_128_AES][j]); count++)
                                AES_cbc_encrypt(buf,buf,
                                        (unsigned long)lengths[j],&aes_ks1,
                                        iv,AES_ENCRYPT);
                        d=Time_F(STOP);
                        print_result(D_CBC_128_AES,j,count,d);
                        }
                }

Here Time_F is defined as follows:

static double Time_F(int s)
        {
        return app_tminterval(s,usertime);
        }
#endif

and app_tminterval is defined as:

#else
#include <sys/time.h>
#include <sys/resource.h>

double app_tminterval(int stop,int usertime)
        {
        double          ret = 0;
        struct rusage   rus;
        struct timeval  now;
        static struct timeval tmstart;

        if (usertime)           getrusage(RUSAGE_SELF,&rus), now = rus.ru_utime;
        else                    gettimeofday(&now,NULL);

        if (stop==TM_START)     tmstart = now;
        else                    ret = ( (now.tv_sec+now.tv_usec*1e-6)
                                        - (tmstart.tv_sec+tmstart.tv_usec*1e-6) );

        return ret;
        }
#endif

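What confuses me is that the measured interval stops at less than 10 ms, whereas the actual test without the hardware accelerator runs for 3 s.

Thanks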

Use -elapsed to measure the total (wall-clock) time.

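By default, openssl speed measures CPU time (the getrusage branch of app_tminterval quoted above), not wall-clock time. When you offload the algorithm to a hardware accelerator, the CPU is freed up while the hardware does the work. As a result, openssl does not count most of the time, because it is spent outside the CPU.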