Interpreting the results of the TensorFlow benchmark tool
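TensorFlow has a few benchmark tools: one for .pb models and one for .tflite models.
I have a few questions about the parameters of the .pb benchmark tool:
- Is num_threads related to the number of parallel runs of single-threaded experiments, or to the internal threads used by TensorFlow?
- Is it possible to use the GPU when building the tool for desktop, i.e. not for mobile? If so, how can I make sure the GPU is not used?
I also have a few questions about interpreting the results:
- What is count in the results output? How is "Timings (microseconds): count=" related to the --max_num_runs parameter?
Example: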
Run --num_threads=-1 --max_num_runs=1000:
2019-03-20 14:30:33.253584: I tensorflow/core/util/stat_summarizer.cc:85] Timings (microseconds): count=1000 first=3608 curr=3873 min=3566 max=8009 avg=3766.49 std=202
2019-03-20 14:30:33.253584: I tensorflow/core/util/stat_summarizer.cc:85] Memory (bytes): count=1000 curr=3301344(all same)
2019-03-20 14:30:33.253591: I tensorflow/core/util/stat_summarizer.cc:85] 207 nodes observed
2019-03-20 14:30:33.253597: I tensorflow/core/util/stat_summarizer.cc:85]
2019-03-20 14:30:33.378352: I tensorflow/tools/benchmark/benchmark_model.cc:636] FLOPs estimate: 116.65M
2019-03-20 14:30:33.378390: I tensorflow/tools/benchmark/benchmark_model.cc:638] FLOPs/second: 46.30B
Run --num_threads=1 --max_num_runs=1000:
2019-03-20 14:32:25.591915: I tensorflow/core/util/stat_summarizer.cc:85] Timings (microseconds): count=1000 first=7502 curr=7543 min=7495 max=7716 avg=7607.22 std=34
2019-03-20 14:32:25.591934: I tensorflow/core/util/stat_summarizer.cc:85] Memory (bytes): count=1000 curr=3301344(all same)
2019-03-20 14:32:25.591952: I tensorflow/core/util/stat_summarizer.cc:85] 207 nodes observed
2019-03-20 14:32:25.591970: I tensorflow/core/util/stat_summarizer.cc:85]
2019-03-20 14:32:25.805970: I tensorflow/tools/benchmark/benchmark_model.cc:636] FLOPs estimate: 116.65M
2019-03-20 14:32:25.806007: I tensorflow/tools/benchmark/benchmark_model.cc:638] FLOPs/second: 15.46B
Run --num_threads=-1 --max_num_runs=10000:
2019-03-20 14:38:48.045824: I tensorflow/core/util/stat_summarizer.cc:85] Timings (microseconds): count=3570 first=3961 curr=3899 min=3558 max=6997 avg=3841.2 std=175
2019-03-20 14:38:48.045829: I tensorflow/core/util/stat_summarizer.cc:85] Memory (bytes): count=3570 curr=3301344(all same)
2019-03-20 14:38:48.045833: I tensorflow/core/util/stat_summarizer.cc:85] 207 nodes observed
2019-03-20 14:38:48.045837: I tensorflow/core/util/stat_summarizer.cc:85]
2019-03-20 14:38:48.169368: I tensorflow/tools/benchmark/benchmark_model.cc:636] FLOPs estimate: 116.65M
2019-03-20 14:38:48.169412: I tensorflow/tools/benchmark/benchmark_model.cc:638] FLOPs/second: 48.66B
Run --num_threads=1 --max_num_runs=10000:
2019-03-20 14:35:50.826722: I tensorflow/core/util/stat_summarizer.cc:85] Timings (microseconds): count=1254 first=7496 curr=7518 min=7475 max=7838 avg=7577.23 std=50
2019-03-20 14:35:50.826735: I tensorflow/core/util/stat_summarizer.cc:85] Memory (bytes): count=1254 curr=3301344(all same)
2019-03-20 14:35:50.826746: I tensorflow/core/util/stat_summarizer.cc:85] 207 nodes observed
2019-03-20 14:35:50.826757: I tensorflow/core/util/stat_summarizer.cc:85]
2019-03-20 14:35:51.053143: I tensorflow/tools/benchmark/benchmark_model.cc:636] FLOPs estimate: 116.65M
2019-03-20 14:35:51.053180: I tensorflow/tools/benchmark/benchmark_model.cc:638] FLOPs/second: 15.55B
That is, with --max_num_runs=10000 the reported counts are count=3570 and count=1254. What do these mean?
For the .tflite benchmark tool:
--num_threads=1 --num_runs=10000
Initialized session in 0.682ms
Running benchmark for at least 1 iterations and at least 0.5 seconds
count=54 first=23463 curr=8019 min=7911 max=23463 avg=9268.5 std=2995
Running benchmark for at least 1000 iterations and at least 1 seconds
count=1000 first=8022 curr=6703 min=6613 max=10333 avg=6766.23 std=337
Average inference timings in us: Warmup: 9268.5, Init: 682, no stats: 6766.23
What does "no stats: 6766.23" mean?
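When comparing several runs like the ones above, it can help to pull the key=value fields out of a summary line instead of reading them by eye. The following is a minimal sketch, not part of either benchmark tool; the helper name parse_summary and the regular expression are assumptions made for illustration.

```python
import re

# Matches key=value pairs such as "count=1000" or "avg=3766.49".
_PAIR = re.compile(r"(\w+)=(-?\d+(?:\.\d+)?)")


def parse_summary(line):
    """Return the numeric key=value fields of one benchmark summary line."""
    return {key: float(value) for key, value in _PAIR.findall(line)}


# One of the summary lines quoted above, without the log prefix.
line = ("Timings (microseconds): count=3570 first=3961 curr=3899 "
        "min=3558 max=6997 avg=3841.2 std=175")
stats = parse_summary(line)
print(int(stats["count"]), stats["avg"])
```

The same helper works on the .tflite lines, since they use the same count=... first=... layout.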
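After digging into the code, I found the following (all times are in microseconds):
- count: the number of runs actually performed
- first: the time the first iteration took
- curr: the time the most recent iteration took
- min: the shortest time an iteration took
- max: the longest time an iteration took
- avg: the average time per iteration
- std: the standard deviation of the timings across all runs
- Warmup: the average of the warmup runs
- Init: the startup time (should always match the "Initialized session in" line)
- no stats: a very poorly named average run time (matches avg= on the line above it)
- num_threads: this is used to set intra_op_parallelism_threads and inter_op_parallelism_threads (more info here)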
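To make the timing fields above concrete, here is a minimal sketch that derives them from a list of per-run timings. It is not the tool's code: the function name summarize and the sample values are made up, and it assumes std is a population standard deviation (dividing by count rather than count - 1), which you should check against stats_calculator.h if the exact value matters.

```python
import math


def summarize(timings_us):
    """Summarize per-run timings (microseconds) into the fields listed above."""
    count = len(timings_us)
    avg = sum(timings_us) / count
    # Assumption: population standard deviation (divide by count, not count - 1).
    variance = sum((t - avg) ** 2 for t in timings_us) / count
    return {
        "count": count,             # number of runs recorded
        "first": timings_us[0],     # first iteration
        "curr": timings_us[-1],     # most recent iteration
        "min": min(timings_us),
        "max": max(timings_us),
        "avg": avg,
        "std": math.sqrt(variance),
    }


timings = [7502, 7495, 7716, 7543]  # made-up values, not real measurements
summary = summarize(timings)
print(" ".join(f"{key}={value:g}" for key, value in summary.items()))
```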
The relevant files (linked to the right lines) are:
- stats_calculator.h: the code that actually tracks the runs
- benchmark_model.cc (tflite): the odd "no stats" name
- benchmark_model.cc (pb): the use of num_threads
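I'm not so sure about using or not using the GPU. If you export the .pb file with freeze_graph, it stores the device of each node in the graph. You can control this with device placement before exporting. If you need to change it afterwards, you can try setting the environment variable CUDA_VISIBLE_DEVICES="" to make sure the GPU is not used.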
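As a rough illustration of the environment-variable approach, the sketch below launches a child process with CUDA_VISIBLE_DEVICES set to an empty string. The helper name run_without_gpu is hypothetical, and the child here is only a stand-in that prints the variable it sees; for a real measurement you would pass the command line of your own benchmark binary instead.

```python
import os
import subprocess
import sys


def run_without_gpu(cmd):
    """Run cmd with CUDA_VISIBLE_DEVICES set to an empty string."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = ""  # hide all CUDA devices from the child
    return subprocess.run(cmd, env=env, capture_output=True, text=True, check=True)


# Stand-in child process that just reports the variable it sees.
probe = [sys.executable, "-c",
         "import os; print(repr(os.environ.get('CUDA_VISIBLE_DEVICES')))"]
result = run_without_gpu(probe)
print(result.stdout.strip())
```

Setting the variable only for the child process keeps the rest of your shell session untouched, so GPU and CPU-only runs can be compared back to back.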