Serving multiple TensorFlow models in the same process with Intel MKL

I serve multiple models in one process and create one TensorFlow session per model. Say there are 8 models, so 8 tf.Session objects are created.

I then followed Optimizing for CPU and Tuning_mkl_for_the_best_performance to enable MKL. My machine has 8 cores and 2 threads. I configure each tf.Session as follows.

import tensorflow as tf  # TensorFlow 1.x API (tf.ConfigProto / tf.Session)

config = tf.ConfigProto()
config.intra_op_parallelism_threads = 8
config.inter_op_parallelism_threads = 1
sess = tf.Session(config=config)

I also set:

OMP_NUM_THREADS=8
KMP_BLOCKTIME=1
KMP_AFFINITY='granularity=fine,verbose,compact,1,0'
KMP_SETTINGS=1

However, this can overload the CPU, and the Golang process creates 87 threads. Is something wrong with my settings?
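To check a thread count like the 87 reported here, the kernel's own number can be read directly. The following is a minimal sketch, assuming a Linux host where /proc/<pid>/status exposes a "Threads:" field; `count_os_threads` is a hypothetical helper name, and on other systems it simply returns None.

```python
import os


def count_os_threads(pid=None):
    """Return the OS-level thread count of a process, or None if unavailable.

    Reads the "Threads:" field of /proc/<pid>/status (Linux only).
    Defaults to the current process when no pid is given.
    """
    pid = os.getpid() if pid is None else pid
    try:
        with open(f"/proc/{pid}/status") as status:
            for line in status:
                if line.startswith("Threads:"):
                    return int(line.split()[1])
    except OSError:
        # No /proc entry: not Linux, or the process does not exist.
        return None
    return None


print(count_os_threads())
```

Passing the pid of the serving process (832 in the log) instead of the default gives the figure to compare before and after changing the thread settings.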

Here is the log from OMP.

OMP: Info #212: KMP_AFFINITY: decoding x2APIC ids.
OMP: Info #210: KMP_AFFINITY: Affinity capable, using global cpuid leaf 11 info
OMP: Info #154: KMP_AFFINITY: Initial OS proc set respected: 2,4,5,12,17,27,32,47
OMP: Info #156: KMP_AFFINITY: 8 available OS procs
OMP: Info #157: KMP_AFFINITY: Uniform topology
OMP: Info #179: KMP_AFFINITY: 2 packages x 4 cores/pkg x 1 threads/core (8 total cores)
OMP: Info #214: KMP_AFFINITY: OS proc to physical thread map:
OMP: Info #171: KMP_AFFINITY: OS proc 4 maps to package 0 core 1
OMP: Info #171: KMP_AFFINITY: OS proc 32 maps to package 0 core 2
OMP: Info #171: KMP_AFFINITY: OS proc 2 maps to package 0 core 5
OMP: Info #171: KMP_AFFINITY: OS proc 12 maps to package 0 core 8
OMP: Info #171: KMP_AFFINITY: OS proc 5 maps to package 1 core 1
OMP: Info #171: KMP_AFFINITY: OS proc 27 maps to package 1 core 5
OMP: Info #171: KMP_AFFINITY: OS proc 17 maps to package 1 core 9
OMP: Info #171: KMP_AFFINITY: OS proc 47 maps to package 1 core 11
OMP: Info #250: KMP_AFFINITY: pid 832 tid 946 thread 0 bound to OS proc set 4
OMP: Info #250: KMP_AFFINITY: pid 832 tid 946 thread 1 bound to OS proc set 5
OMP: Info #250: KMP_AFFINITY: pid 832 tid 945 thread 2 bound to OS proc set 32
OMP: Info #250: KMP_AFFINITY: pid 832 tid 1583 thread 3 bound to OS proc set 27
OMP: Info #250: KMP_AFFINITY: pid 832 tid 1584 thread 4 bound to OS proc set 2
OMP: Info #250: KMP_AFFINITY: pid 832 tid 1585 thread 5 bound to OS proc set 17
OMP: Info #250: KMP_AFFINITY: pid 832 tid 1586 thread 6 bound to OS proc set 12
OMP: Info #250: KMP_AFFINITY: pid 832 tid 1587 thread 7 bound to OS proc set 47
OMP: Info #250: KMP_AFFINITY: pid 832 tid 1590 thread 10 bound to OS proc set 32
OMP: Info #250: KMP_AFFINITY: pid 832 tid 1589 thread 9 bound to OS proc set 5
OMP: Info #250: KMP_AFFINITY: pid 832 tid 1591 thread 11 bound to OS proc set 27
OMP: Info #250: KMP_AFFINITY: pid 832 tid 1588 thread 8 bound to OS proc set 4
OMP: Info #250: KMP_AFFINITY: pid 832 tid 2120 thread 13 bound to OS proc set 17
OMP: Info #250: KMP_AFFINITY: pid 832 tid 2122 thread 15 bound to OS proc set 47
OMP: Info #250: KMP_AFFINITY: pid 832 tid 2123 thread 16 bound to OS proc set 4
OMP: Info #250: KMP_AFFINITY: pid 832 tid 2121 thread 14 bound to OS proc set 12
OMP: Info #250: KMP_AFFINITY: pid 832 tid 2119 thread 12 bound to OS proc set 2
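The "bound to OS proc set" lines can be summarized to see how many OpenMP threads were started and how they are spread over the processors. The sketch below assumes the exact line format shown in this log; `summarize_bindings` is a hypothetical helper, and the sample is a four-line excerpt of the log above.

```python
import re
from collections import Counter

# Matches the "Info #250" binding lines in the format shown in the log above.
BOUND = re.compile(
    r"KMP_AFFINITY: pid (\d+) tid (\d+) thread (\d+) bound to OS proc set (\d+)"
)


def summarize_bindings(log_text):
    """Return (number of distinct OpenMP threads, Counter of threads per OS proc)."""
    proc_by_thread = {}
    for match in BOUND.finditer(log_text):
        _pid, _tid, thread, proc = map(int, match.groups())
        proc_by_thread[thread] = proc
    return len(proc_by_thread), Counter(proc_by_thread.values())


sample = """\
OMP: Info #250: KMP_AFFINITY: pid 832 tid 946 thread 0 bound to OS proc set 4
OMP: Info #250: KMP_AFFINITY: pid 832 tid 1588 thread 8 bound to OS proc set 4
OMP: Info #250: KMP_AFFINITY: pid 832 tid 2123 thread 16 bound to OS proc set 4
OMP: Info #250: KMP_AFFINITY: pid 832 tid 946 thread 1 bound to OS proc set 5
"""

total, per_proc = summarize_bindings(sample)
print(total, dict(per_proc))
```

Run over the full log above, this would count threads 0 through 16, that is 17 OpenMP threads sharing 8 OS procs, even though OMP_NUM_THREADS is 8.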

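Based on the available information, you appear to have 16 (8x2) physical cores and 32 (8x2x2) logical cores. The recommended setting is 'intra_op_parallelism_threads' equal to the number of physical cores and 'inter_op_parallelism_threads' equal to the number of sockets.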

In your case, assuming 8 models are served at a time, I suggest trying the following configuration.

import tensorflow as tf  # TensorFlow 1.x API (tf.ConfigProto / tf.Session)

config = tf.ConfigProto()
config.intra_op_parallelism_threads = 2
config.inter_op_parallelism_threads = 2
sess = tf.Session(config=config)

OMP_NUM_THREADS=2
KMP_BLOCKTIME=1
KMP_AFFINITY='granularity=fine,verbose,compact,1,0'
KMP_SETTINGS=1
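If the host process is Python, these variables can also be set from code instead of the shell. This is a minimal sketch, assuming the OpenMP runtime reads them when it is first loaded, so they are set before TensorFlow is imported; `mkl_env` is a hypothetical helper. For a Go host process they would be exported in the environment that launches the binary.

```python
import os


def mkl_env(threads_per_model):
    """Build the OpenMP/MKL environment settings listed above as a dict of strings."""
    return {
        "OMP_NUM_THREADS": str(threads_per_model),
        "KMP_BLOCKTIME": "1",
        "KMP_AFFINITY": "granularity=fine,verbose,compact,1,0",
        "KMP_SETTINGS": "1",
    }


# Apply the settings first; only then would TensorFlow be imported
# (import tensorflow as tf) and the sessions created.
os.environ.update(mkl_env(2))
print({name: os.environ[name] for name in mkl_env(2)})
```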

You can also try 'config.intra_op_parallelism_threads=1' and 'OMP_NUM_THREADS=1'.
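One way to read the two suggestions is as an even split of cores among the models. This is only a rough sketch of that arithmetic, under the assumption that an even split is the intent; `threads_per_model` is a hypothetical helper. The value 16 is the physical-core estimate given above, and 8 is the "8 available OS procs" line in the OMP log.

```python
def threads_per_model(available_cores, num_models):
    """Split cores evenly across models, never going below one thread each."""
    if available_cores < 1 or num_models < 1:
        raise ValueError("both arguments must be positive")
    return max(1, available_cores // num_models)


NUM_MODELS = 8
for cores in (16, 8):
    per_model = threads_per_model(cores, NUM_MODELS)
    print(cores, "cores ->", per_model, "thread(s) per model,",
          per_model * NUM_MODELS, "in total")
```

With 16 cores the split gives 2 threads per model, and with 8 cores it gives 1, which matches the two settings suggested here.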

For more details, see https://software.intel.com/en-us/articles/maximize-tensorflow-performance-on-cpu-considerations-and-recommendations-for-inference

Hope this helps.