Using the OpenCV CUDA ORB feature detector
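I have an application that receives a stream of images, and I want to monitor the features detected within a set ROI. This is done with an ORB detector. On the first image, I use the detector to find the "reference" keypoints and descriptors for the given ROI. For subsequent images, I find the "test" keypoints and descriptors for the same ROI. I then use a knn matcher to find matches between the reference and test descriptors. Finally, I try to find the "best" matches, add the associated keypoints to a "matched keypoints" collection, and calculate a "match intensity". This match intensity is meant to indicate how well the keypoints found in the reference image match the keypoints in the test image.
I have a few questions:
1 - Is this a valid use of a feature detector? I know that simple template matching might give me similar results, but I was hoping to avoid problems with slight changes in lighting.
2 - Am I screening my matches for "good" matches correctly, and am I then getting the correct associated keypoint for each match?
3 - My code seems to work as is. However, if I try to move to the asynchronous versions of the OpenCV calls using streams, I get an exception: "invalid resource handle in function cv::cuda::GpuMat::setTo", which happens in a call to ORB_Impl::buildScalePyramids (called from ORB_Impl::detectAndComputeAsync). See the async version of my "NewFrame" function below. This makes me think I am not setting all of this up correctly.
Here is my code: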
Matcher::Matcher()
{
    // create ORB detector and descriptor matcher
    m_b = cuda::ORB::create(500, 1.2f, 8, 31, 0, 2, 0, 31, 20, true);
    m_descriptorMatcher = cv::cuda::DescriptorMatcher::createBFMatcher(cv::NORM_HAMMING);
}

void Matcher::Configure(int imageWidth, int imageHeight, int roiX, int roiY, int roiW, int roiH)
{
    // set member variables
    m_imageWidth = imageWidth;
    m_imageHeight = imageHeight;
    m_roiX = roiX;
    m_roiY = roiY;
    m_roiW = roiW;
    m_roiH = roiH;
    m_GpuRefSet = false; // set flag indicating reference not yet set

    // create mask for specified ROI
    m_mask = GpuMat(imageHeight, imageWidth, CV_8UC1, Scalar::all(0));
    cv::Rect rect = cv::Rect(m_roiX, m_roiY, m_roiW, m_roiH);
    m_mask(rect).setTo(Scalar::all(255));
}

double Matcher::NewFrame(void *pImagedata)
{
    // pImagedata = pointer to BGRA byte array
    // m_imageHeight and m_imageWidth have already been set
    // m_b is a pointer to the ORB detector
    if (!m_GpuRefSet)
    {   // 1st time through (after call to Matcher::Configure), set reference keypoints and descriptors
        cv::cuda::GpuMat mat1(m_imageHeight, m_imageWidth, CV_8UC4, pImagedata); // put image data into GpuMat
        cv::cuda::cvtColor(mat1, m_refImage, CV_BGRA2GRAY); // convert to grayscale as required by ORB
        m_keyRef.clear(); // clear the vector<KeyPoint>, keypoint vector for reference image
        m_b->detectAndCompute(m_refImage, m_mask, m_keyRef, m_descRef, false); // detect keypoints and compute descriptors
        m_GpuRefSet = true;
    }

    cv::cuda::GpuMat mat2(m_imageHeight, m_imageWidth, CV_8UC4, pImagedata); // put image data into GpuMat
    cv::cuda::cvtColor(mat2, m_testImage, CV_BGRA2GRAY, 0); // convert to grayscale as required by ORB
    m_keyTest.clear(); // clear vector<KeyPoint>, keypoint vector for test image
    m_b->detectAndCompute(m_testImage, m_mask, m_keyTest, m_descTest, false); // detect keypoints and compute descriptors

    double value = 0.0; // used to store return value ("match intensity")

    // calculate best match for each descriptor
    // (both descriptor sets must be non-empty, otherwise there is nothing to match or divide by)
    if (m_descTest.rows > 0 && m_descRef.rows > 0)
    {
        m_goodKeypoints.clear(); // clear vector of "good" KeyPoints, vector<KeyPoint>
        m_descriptorMatcher->knnMatch(m_descTest, m_descRef, m_matches, 2, noArray()); // find matches

        // examine all matches, and collect the KeyPoints whose match distance meets the given criteria
        for (size_t i = 0; i < m_matches.size(); i++) {
            // the ratio test needs two neighbors; m_nnr = nearest neighbor ratio (typically 0.6 - 0.8)
            if (m_matches[i].size() > 1 && m_matches[i][0].distance < m_matches[i][1].distance * m_nnr) {
                m_goodKeypoints.push_back(m_keyRef.at(m_matches[i][0].trainIdx)); // not sure if getting the correct keypoint here
            }
        }

        // calculate "match intensity", i.e. fraction of the keypoints found in the reference image that are also in the test image
        value = ((double)m_goodKeypoints.size()) / ((double)m_keyRef.size());
    }
    else
    {
        value = 0.0;
    }
    return value;
}
Below is the stream/async version of the NewFrame function that fails:
double Matcher::NewFrame(void *pImagedata)
{
    if (m_b.empty()) return 0.0f;

    if (!m_GpuRefSet)
    {
        try
        {
            cv::cuda::GpuMat mat1(m_imageHeight, m_imageWidth, CV_8UC4, pImagedata);
            cv::cuda::cvtColor(mat1, m_refImage, CV_BGRA2GRAY);
            m_keyRef.clear();
            m_b->detectAndComputeAsync(m_refImage, m_mask, m_keyRef, m_descRef, false, m_stream); // FAILS HERE
            m_stream.waitForCompletion();
            m_GpuRefSet = true;
        }
        catch (Exception e)
        {
            string msg = e.msg;
        }
    }

    cv::cuda::GpuMat mat2(m_imageHeight, m_imageWidth, CV_8UC4, pImagedata);
    cv::cuda::cvtColor(mat2, m_testImage, CV_BGRA2GRAY, 0, m_stream);
    m_keyTest.clear();
    m_b->detectAndComputeAsync(m_testImage, m_mask, m_keyTest, m_descTest, false, m_stream);
    m_stream.waitForCompletion();

    double value = 0.0f;

    // calculate best match for each descriptor
    if (m_descTest.rows > 0)
    {
        m_goodKeypoints.clear();
        m_descriptorMatcher->knnMatchAsync(m_descTest, m_descRef, m_matches, 2, noArray(), m_stream);
        m_stream.waitForCompletion();
        for (int i = 0; i<m_matches.size(); i++){
            if (m_matches[i][0].distance < m_matches[i][1].distance * m_nnr) // m_nnr = nearest neighbor ratio
            {
                m_goodKeypoints.push_back(m_keyRef.at(m_matches[i][0].trainIdx));
            }
        }
        value = ((double)m_goodKeypoints.size()) / ((double)m_keyRef.size());
    }
    else
    {
        value = 0.0f;
    }

    if (value > 1.0f) value = 1.0f;
    return value;
}
Any suggestions/advice would be greatly appreciated.
Thanks!!
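After some experimentation, I am convinced that this is indeed a reasonable use of the ORB detector, and that my test for "goodness" using the nearest-neighbor ratio approach also seems to work. This answers questions #1 and #2 above.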
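To make the answer to #2 concrete: in knnMatch(query, train, ...), each DMatch's queryIdx indexes the query descriptors (the test set here) and trainIdx indexes the train descriptors (the reference set), so looking up m_keyRef with trainIdx does pick the reference keypoint. The sketch below is only an illustration, not the code from the question. It uses a stand-in Match struct instead of cv::DMatch so it compiles without OpenCV, and the matchIntensity helper is a hypothetical name. It also counts distinct reference indices, which is my own variation: several test descriptors can match the same reference keypoint, and counting each one only once keeps the result at or below 1 without clamping.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Stand-in for cv::DMatch so the sketch compiles without OpenCV.
struct Match {
    int queryIdx;   // index into the query (test) descriptors
    int trainIdx;   // index into the train (reference) descriptors
    float distance; // Hamming distance for ORB descriptors
};

// Hypothetical helper: fraction of reference keypoints that received at
// least one match passing the nearest-neighbor ratio test.
double matchIntensity(const std::vector<std::vector<Match>>& knnMatches,
                      std::size_t referenceKeypointCount,
                      float ratio)
{
    if (referenceKeypointCount == 0) return 0.0; // nothing to compare against

    std::set<int> matchedReference; // distinct reference keypoint indices
    for (const std::vector<Match>& candidates : knnMatches) {
        if (candidates.size() < 2) continue; // ratio test needs two neighbors
        if (candidates[0].distance < ratio * candidates[1].distance) {
            matchedReference.insert(candidates[0].trainIdx);
        }
    }
    return static_cast<double>(matchedReference.size()) /
           static_cast<double>(referenceKeypointCount);
}
```

With real OpenCV types, the same loop would run over the vector<vector<DMatch>> produced by knnMatch or knnMatchConvert.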
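Regarding question #3, I did find a few things that cleared up a lot for me.
First, it turns out I was not being careful enough with cv::cuda::Stream and CPU threads. Though I am sure this is obvious to many and is mentioned in the OpenCV documentation, anything put on a specific cv::cuda::Stream should be done from the same CPU thread. Not doing so does not necessarily produce an exception, but it does produce indeterminate behavior, which can include exceptions.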
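One way to enforce the "same CPU thread" rule is to funnel every call that touches a given stream through a single dedicated worker thread. The sketch below shows only that pattern in standard C++ and makes no OpenCV calls. SingleThreadWorker is a hypothetical helper, not something OpenCV provides, and it is not what my application uses verbatim. In a real program the cv::cuda::Stream would be created inside run(), and the submitted jobs would be the detectAndComputeAsync / knnMatchAsync calls.

```cpp
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical helper: runs every submitted job on one dedicated thread.
class SingleThreadWorker {
public:
    SingleThreadWorker() : m_thread([this] { run(); }) {}

    ~SingleThreadWorker() {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_stop = true;
        }
        m_wake.notify_one();
        m_thread.join(); // queued jobs are drained before the thread exits
    }

    SingleThreadWorker(const SingleThreadWorker&) = delete;
    SingleThreadWorker& operator=(const SingleThreadWorker&) = delete;

    // Queue a job; it executes later on the worker thread.
    void submit(std::function<void()> job) {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_jobs.push(std::move(job));
        }
        m_wake.notify_one();
    }

private:
    void run() {
        // Assumption: the stream would be created here, so that it is
        // created and used on this thread only.
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(m_mutex);
                m_wake.wait(lock, [this] { return m_stop || !m_jobs.empty(); });
                if (m_jobs.empty()) return; // stop requested and queue drained
                job = std::move(m_jobs.front());
                m_jobs.pop();
            }
            job(); // run outside the lock so submit() is not blocked by a job
        }
    }

    std::mutex m_mutex;
    std::condition_variable m_wake;
    std::queue<std::function<void()>> m_jobs;
    bool m_stop = false;
    std::thread m_thread; // declared last so the other members exist before run() starts
};
```

Any thread can call submit(), but the jobs themselves always run on the worker's thread, which is the property the stream needs.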
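Second, for me, it turned out that using the async versions of detectAndCompute and knnMatch was more reliable under multithreading. This seems to be related to the fact that the async versions use all GPU-based parameters, whereas the non-async versions have CPU-based vector parameters. Both the async and non-async versions seem to work in the simple single-threaded test apps I wrote. However, my real application has other CUDA kernels and a CUDA video decoder running on other threads, so things are crowded on the GPU.
Anyway, here is my version of how to make the async function calls, which cleared everything up for me. It demonstrates the use of the Async/Stream versions of the ORB detector and the descriptor matcher. The cv::cuda::Stream passed to them can be either cv::cuda::Stream::Null() or a cv::cuda::Stream that you create. Just remember to create the stream on the same CPU thread where it is used.
I am still interested in suggestions for improvement, but the following seems to work.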
// stream is a cv::cuda::Stream created on this same CPU thread (or cv::cuda::Stream::Null())
cv::Ptr<cv::cuda::ORB> orb = cuda::ORB::create(500, 1.2f, 8, 31, 0, 2, 0, 31, 20, true);
cv::Ptr<cv::cuda::DescriptorMatcher> matcher = cv::cuda::DescriptorMatcher::createBFMatcher(cv::NORM_HAMMING);

// process 1st image
GpuMat imgGray1; // load this with your grayscale image
GpuMat keys1; // this holds the keys detected
GpuMat desc1; // this holds the descriptors for the detected keypoints
GpuMat mask1; // this holds any mask you may want to use, or can be replaced by noArray() in the call below if no mask is needed
vector<KeyPoint> cpuKeys1; // holds keypoints downloaded from gpu

//ADD CODE TO LOAD imgGray1

orb->detectAndComputeAsync(imgGray1, mask1, keys1, desc1, false, stream);
stream.waitForCompletion();
orb->convert(keys1, cpuKeys1); // download keys to cpu if needed for anything...like displaying or whatever

// process 2nd image
GpuMat imgGray2; // load this with your grayscale image
GpuMat keys2; // this holds the keys detected
GpuMat desc2; // this holds the descriptors for the detected keypoints
GpuMat mask2; // this holds any mask you may want to use, or can be replaced by noArray() in the call below if no mask is needed
vector<KeyPoint> cpuKeys2; // holds keypoints downloaded from gpu

//ADD CODE TO LOAD imgGray2

orb->detectAndComputeAsync(imgGray2, mask2, keys2, desc2, false, stream);
stream.waitForCompletion();
orb->convert(keys2, cpuKeys2); // download keys to cpu if needed for anything...like displaying or whatever

if (desc2.rows > 0)
{
    vector<vector<DMatch>> cpuKnnMatches;
    GpuMat gpuKnnMatches; // holds matches on gpu
    matcher->knnMatchAsync(desc2, desc1, gpuKnnMatches, 2, noArray(), stream); // find matches
    stream.waitForCompletion();
    matcher->knnMatchConvert(gpuKnnMatches, cpuKnnMatches); // download matches from gpu and put into vector<vector<DMatch>> form on cpu

    vector<DMatch> matches; // vector of good matches between tested images
    for (std::vector<std::vector<cv::DMatch> >::const_iterator it = cpuKnnMatches.begin(); it != cpuKnnMatches.end(); ++it) {
        // use Nearest-Neighbor Ratio to determine "good" matches (written as a product to avoid dividing by a zero distance)
        if (it->size() > 1 && (*it)[0].distance < m_nnr * (*it)[1].distance) {
            DMatch m = (*it)[0];
            matches.push_back(m); // save good matches here
        }
    }
}