Is TensorRT "floating-point 16" precision mode non-deterministic on Jetson TX2?

Question

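I am using TensorRT FP16 precision mode to optimize my deep learning model, and I run the optimized model on a Jetson TX2. While testing the model, I observed that the TensorRT inference engine is not deterministic. In other words, the optimized model reports FPS values anywhere between 40 and 120 for the same input image.

When I saw this comment about CUDA, I started to think that the source of the non-determinism was floating-point arithmetic: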

"If your code uses floating-point atomics, results may differ from run to run because floating-point operations are generally not associative, and the order in which data enters a computation (e.g. a sum) is non-deterministic when atomics are used."

Do precision types such as FP16, FP32, and INT8 affect the determinism of TensorRT? Or is something else the cause?

Do you have any ideas?

Best regards.
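To make the quoted remark concrete, here is a minimal sketch of why summation order matters for floating-point values. It only illustrates that numerical results can change with ordering; it says nothing about timing or FPS. The function names `sum_left` and `sum_right` are hypothetical and are not part of CUDA or TensorRT.

```cpp
// Sum three floats left to right: (a + b) + c.
float sum_left(float a, float b, float c) {
    volatile float partial = a + b;  // volatile forces the intermediate to be rounded to float
    return partial + c;
}

// Sum the same three floats right to left: a + (b + c).
float sum_right(float a, float b, float c) {
    volatile float partial = b + c;  // a small c is absorbed when b is very large
    return a + partial;
}
```

With the inputs `1e20f, -1e20f, 1.0f`, the left-to-right order cancels the large values first and keeps the `1.0f`, while the right-to-left order loses the `1.0f` when it is added to `-1e20f`, so the two orders give different sums.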

Answer 1

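I solved the problem by replacing clock(), the function I was using to measure latency. clock() measures CPU time, but what I wanted to measure was real (wall-clock) time. I now use std::chrono to measure latency, and the measured inference latency is now deterministic.

Wrong: (clock())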
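The difference between the two kinds of time can be seen in a small sketch. It assumes a POSIX-like system such as the Linux on a Jetson, where clock() reports processor time; the sleep is only a stand-in for a thread that waits without using the CPU, and `Timing` and `time_sleep` are hypothetical names.

```cpp
#include <chrono>
#include <ctime>
#include <thread>

struct Timing {
    double cpu_seconds;   // processor time reported by clock()
    double wall_seconds;  // elapsed real time reported by steady_clock
};

// Hypothetical helper: time a sleep of `ms` milliseconds with both clocks.
Timing time_sleep(int ms) {
    const std::clock_t cpu_start = std::clock();
    const auto wall_start = std::chrono::steady_clock::now();

    // The thread is idle here, so it accumulates almost no CPU time.
    std::this_thread::sleep_for(std::chrono::milliseconds(ms));

    const std::clock_t cpu_end = std::clock();
    const auto wall_end = std::chrono::steady_clock::now();

    Timing t;
    t.cpu_seconds = static_cast<double>(cpu_end - cpu_start) / CLOCKS_PER_SEC;
    t.wall_seconds = std::chrono::duration<double>(wall_end - wall_start).count();
    return t;
}
```

Calling `time_sleep(100)` on such a system should report roughly 0.1 s of wall time but close to zero CPU time, which is the kind of mismatch that can make clock()-based FPS figures misleading.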

#include <cstdio>
#include <ctime>

int main()
{
  clock_t t = clock();
  inferenceEngine(); // Do the inference
  t = clock() - t;   // CPU time used by the process, not elapsed real time
  printf("It took me %ld clicks (%f seconds).\n", (long)t, ((float)t) / CLOCKS_PER_SEC);
  return 0;
}

Use CUDA events like this: (CudaEvent)

cudaEvent_t start, stop;
cudaEventCreate(&start);
cudaEventCreate(&stop);

cudaEventRecord(start);
inferenceEngine(); // Do the inference
cudaEventRecord(stop);

cudaEventSynchronize(stop); // Wait until the stop event has been reached
float milliseconds = 0;
cudaEventElapsedTime(&milliseconds, start, stop); // Elapsed time in milliseconds

cudaEventDestroy(start);
cudaEventDestroy(stop);

Use chrono like this: (std::chrono)

#include <iostream>
#include <chrono>
#include <ctime>
int main()
{
  auto start = std::chrono::system_clock::now();
  inferenceEngine(); // Do the inference
  auto end = std::chrono::system_clock::now();

  std::chrono::duration<double> elapsed_seconds = end-start;
  std::time_t end_time = std::chrono::system_clock::to_time_t(end);

  std::cout << "finished computation at " << std::ctime(&end_time)
            << "elapsed time: " << elapsed_seconds.count() << "s\n";
}

deterministic

non-deterministic

tensorrt

nvidia-jetson

half-precision-float