Eigen3: tiny matrix difference but large overhead
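I want to compute the difference between two small 640x512 matrices using the Eigen3 library, but I end up with a high computation latency (45 ms on a 16-core Intel Xeon @ 2.4 GHz).
Could you give me some hints to improve this abnormal computation time?
Here is the relevant code snippet: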
static inline void tsnorm(stTime *ts)
{
    while (ts->tv_nsec >= NSEC_PER_SEC)
    {
        ts->tv_nsec -= NSEC_PER_SEC;
        ts->tv_sec++;
    }
}
const unsigned short usRawFrameRows = 640;
const unsigned short usRawFrameCols = 512;
using pixType = unsigned short;
using pixDynMat = Matrix<pixType, Dynamic, Dynamic, RowMajor>;
pixDynMat biasFrame = pixDynMat::Zero(usRawFrameRows, usRawFrameCols);
pixType *myRawFrame = new pixType[usRawFrameRows * usRawFrameCols];
struct timespec tBeforeProcessFrameCall, tAfterProcessFrameCall;
clock_gettime(CLOCK_MONOTONIC_RAW, &tBeforeProcessFrameCall);
tsnorm(&tBeforeProcessFrameCall);
// Subtract the bias from the current raw frame
MatrixXd calFrame = Map<pixDynMat>(myRawFrame, usRawFrameRows, usRawFrameCols).cast<double>()
                  - biasFrame.cast<double>();
clock_gettime(CLOCK_MONOTONIC_RAW, &tAfterProcessFrameCall);
tsnorm(&tAfterProcessFrameCall);
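// Include tv_sec so the result stays correct when the two timestamps fall in different seconds
cout << " PHI processFrame overhead (ms) = " << ((tAfterProcessFrameCall.tv_sec - tBeforeProcessFrameCall.tv_sec) * 1e3 + (tAfterProcessFrameCall.tv_nsec - tBeforeProcessFrameCall.tv_nsec) / 1e6) << endl;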
Cheers!
Sylvain
I compiled your code (i7-9700K):
Compiler: g++ -O3 -march=native test.cpp -o testbin
====================================================
PHI processFrame overhead (ms) = 0.952253
However, without optimization:
Compiler: g++ test.cpp -o testbin
====================================================
PHI processFrame overhead (ms) = 20.1365
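I suspect you are compiling without optimizations. Try building with optimizations enabled. According to the FAQ page, this can easily gain you a factor of ten or more (see http://eigen.tuxfamily.org/index.php?title=FAQ#Optimization).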
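One way to confirm which kind of build is actually running is to test the `__OPTIMIZE__` macro, which GCC predefines when optimization is enabled (Clang appears to follow the same convention; treat that as an assumption). The sketch below is not from the original post: the function names are hypothetical, and the plain loop is only a dependency-free stand-in for the Eigen expression. It also shows an elapsed-time helper that uses both `tv_sec` and `tv_nsec`, since a nanosecond-only difference is wrong whenever the two timestamps fall in different seconds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <time.h>
#include <vector>

// Elapsed milliseconds between two timespec values, using both fields.
inline double elapsed_ms(const timespec& start, const timespec& end)
{
    return static_cast<double>(end.tv_sec - start.tv_sec) * 1e3
         + static_cast<double>(end.tv_nsec - start.tv_nsec) / 1e6;
}

// True when the compiler predefined __OPTIMIZE__ (assumed: GCC or Clang, -O1 and above).
constexpr bool built_with_optimization()
{
#ifdef __OPTIMIZE__
    return true;
#else
    return false;
#endif
}

// Hypothetical stand-in for the Eigen expression: raw - bias, widened to double
// so that a bias larger than the raw value yields a negative result, not a wrap-around.
inline std::vector<double> subtract_bias(const std::vector<std::uint16_t>& raw,
                                         const std::vector<std::uint16_t>& bias)
{
    assert(raw.size() == bias.size());  // both frames must have the same pixel count
    std::vector<double> out(raw.size());
    for (std::size_t i = 0; i < raw.size(); ++i)
        out[i] = static_cast<double>(raw[i]) - static_cast<double>(bias[i]);
    return out;
}
```

Printing `built_with_optimization()` once at startup, and timing `subtract_bias` with `clock_gettime` plus `elapsed_ms`, should make it clear whether the flags from the build system really reached the compiler: with GCC, a plain `g++ test.cpp` build reports false and a `g++ -O3 -march=native test.cpp` build reports true.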