使用 solvePnP() 和 SOLVEPNP_IPPE_SQUARE 方法估计相机姿态
camera pose estimation with solvePnP() and SOLVEPNP_IPPE_SQUARE method
我正在使用 ARKit 并尝试从已知大小 (0.16m) 的 QR 码获取相机位置。
为了检测 QR 码,我正在使用 Vision 框架,这样我就可以获取图像上的每个角点。
资料准备:
let intrinsics = arFrame.camera.intrinsics
let imageResolution = arFrame.camera.imageResolution
let imagePointsArray = [NSValue(cgPoint: visionResult.topLeft), NSValue(cgPoint: visionResult.topRight), NSValue(cgPoint: visionResult.bottomLeft), NSValue(cgPoint: visionResult.bottomRight)]
let intrinsicsArray = (0..<3).flatMap { x in (0..<3).map { y in NSNumber(value: intrinsics[x][y]) } }
let squareLength = NSNumber(value: 0.16)
let res = OpenCVWrapper.findPose(imagePointsArray, intrinsics: intrinsicsArray, size: imageResolution, squareLength: squareLength)
为了获取相机位置,我正在使用 OpenCV 解决方案 solvePnP() with flag = SOLVEPNP_IPPE_SQUARE
Objective-C++ 中的 OpenCV 基于 :
+(Pose)findPose: (NSArray<NSValue *> *) imagePoints
intrinsics: (NSArray<NSNumber *> *) intrinsics
imageResolution: (CGSize) imageResolution
squareLength: (NSNumber *) squareLength {
cv::Mat distCoeffs(4,1,cv::DataType<double>::type, 0.0);
cv::Mat rvec(3,1,cv::DataType<double>::type);
cv::Mat tvec(3,1,cv::DataType<double>::type);
cv::Mat cameraMatrix = [self intrinsicMatrixWithArray:intrinsics];
vector<Point2f> cvImagePoints = [self convertImagePoints:imagePoints toSize: imageResolution];
vector<Point3f> cvObjectPoints = [self getObjectPointsWithSquareLength:squareLength];
std::cout << "object points: \n" << cvObjectPoints << std::endl;
std::cout << "image points: \n" << cvImagePoints << std::endl;
std::cout << "cameraMatrix points: \n" << cameraMatrix << std::endl;
cv::solvePnP(cvObjectPoints, cvImagePoints, cameraMatrix, distCoeffs, rvec, tvec, false, SOLVEPNP_IPPE_SQUARE);
std::cout << "rvec: \n" << rvec << std::endl;
std::cout << "tvec: \n" << tvec << std::endl;
cv::Mat RotX(3, 3, cv::DataType<double>::type);
cv::setIdentity(RotX);
RotX.at<double>(4) = -1; //cos(180) = -1
RotX.at<double>(8) = -1;
cv::Mat R;
cv::Rodrigues(rvec, R);
R = R.t(); // rotation of inverse
Mat rvecConverted;
Rodrigues(R, rvecConverted); //
std::cout << "rvec in world coords:\n" << rvecConverted << std::endl;
rvecConverted = RotX * rvecConverted;
std::cout << "rvec scenekit :\n" << rvecConverted << std::endl;
Mat tvecConverted = -R * tvec;
std::cout << "tvec in world coords:\n" << tvecConverted << std::endl;
tvecConverted = RotX * tvecConverted;
std::cout << "tvec scenekit :\n" << tvecConverted << std::endl;
SCNVector4 rotationVector = SCNVector4Make(rvecConverted.at<double>(0), rvecConverted.at<double>(1), rvecConverted.at<double>(2), norm(rvecConverted));
SCNVector3 translationVector = SCNVector3Make(tvecConverted.at<double>(0), tvecConverted.at<double>(1), tvecConverted.at<double>(2));
return Pose{rotationVector, translationVector};
}
+ (vector<Point3f>) getObjectPointsWithSquareLength: (NSNumber*) squareLength {
vector<Point3f> points;
double squareLengthDouble = [squareLength doubleValue];
points.push_back(Point3f(-squareLengthDouble/2, squareLengthDouble/2, 0));
points.push_back(Point3f(squareLengthDouble/2, squareLengthDouble/2, 0));
points.push_back(Point3f(squareLengthDouble/2, -squareLengthDouble/2, 0));
points.push_back(Point3f(-squareLengthDouble/2, -squareLengthDouble/2, 0));
return points;
}
+ (vector<Point2f>) convertImagePoints: (NSArray<NSValue *> *) array
toSize: (CGSize) size {
vector<Point2f> points;
for (NSValue * value in array) {
CGPoint point = [value CGPointValue];
points.push_back(Point2f((point.x * size.width), (point.y * size.height)));
}
return points;
}
+ (cv::Mat) intrinsicMatrixWithArray: (NSArray<NSNumber *> *) intrinsics {
Mat result(3,3,cv::DataType<double>::type);
cv::setIdentity(result);
result.at<double>(0) = [intrinsics[0] doubleValue]; //fx
result.at<double>(4) = [intrinsics[4] doubleValue]; //fy
result.at<double>(2) = [intrinsics[6] doubleValue]; //cx
result.at<double>(5) = [intrinsics[7] doubleValue]; //cy
result.at<double>(8) = [intrinsics[8] doubleValue]; //1
return result;
}
问题是当我将相机直接指向距离为 2 米的二维码时,translationVector 的结果。z (tvec scenekit ) 应该等于 2 米,但取而代之的是一个随机的正数或负数。
输出:
Calculated distance to QR 2.0856588
object points:
[-0.079999998, 0.079999998, 0;
0.079999998, 0.079999998, 0;
0.079999998, -0.079999998, 0;
-0.079999998, -0.079999998, 0]
image points:
[795.98724, 717.27045;
684.5592, 715.80487;
793.31567, 826.06146;
684.40692, 824.39771]
cameraMatrix points:
[1454.490478515625, 0, 935.6685791015625;
0, 1454.490478515625, 717.999267578125;
0, 0, 1]
rvec:
[-0.9251278749049585;
1.185890362907954;
-0.9989977018022447]
tvec:
[0.04753833193572054;
-0.009999648596310796;
-0.3527916723601041]
rvec in world coords:
[0.9251278749049584;
-1.185890362907954;
0.9989977018022447]
rvec scenekit :
[0.9251278749049584;
1.185890362907954;
-0.9989977018022447]
tvec in world coords:
[-0.1159248829391864;
-0.3366933247327607;
0.004569098144615695]
tvec scenekit :
[-0.1159248829391864;
0.3366933247327607;
-0.004569098144615695]
感谢您的帮助
相机和标签之间的估计翻译不正确。 tz
是负数,这在物理上是不可能的。有关相机坐标系的详细信息,请参阅 here。
您必须确保每个 3D 对象点与相应的 2D 图像点匹配。
如果我绘制 2D 坐标,我会得到以下图像:
用RGBM点的顺序。
如果你交换最后两个图像点,你应该得到:
rvec: [0.1217246105180353, 0.1224686744740433, -3.116495036698598]
tvec: [-0.2866576939480562, 0.07760414675470864, 2.127895748451679]
我正在使用 ARKit 并尝试从已知大小 (0.16m) 的 QR 码获取相机位置。 为了检测 QR 码,我正在使用 Vision 框架,这样我就可以获取图像上的每个角点。
资料准备:
let intrinsics = arFrame.camera.intrinsics
let imageResolution = arFrame.camera.imageResolution
let imagePointsArray = [NSValue(cgPoint: visionResult.topLeft), NSValue(cgPoint: visionResult.topRight), NSValue(cgPoint: visionResult.bottomLeft), NSValue(cgPoint: visionResult.bottomRight)]
let intrinsicsArray = (0..<3).flatMap { x in (0..<3).map { y in NSNumber(value: intrinsics[x][y]) } }
let squareLength = NSNumber(value: 0.16)
let res = OpenCVWrapper.findPose(imagePointsArray, intrinsics: intrinsicsArray, size: imageResolution, squareLength: squareLength)
为了获取相机位置,我正在使用 OpenCV 解决方案 solvePnP() with flag = SOLVEPNP_IPPE_SQUARE
Objective-C++ 中的 OpenCV 基于
+(Pose)findPose: (NSArray<NSValue *> *) imagePoints
intrinsics: (NSArray<NSNumber *> *) intrinsics
imageResolution: (CGSize) imageResolution
squareLength: (NSNumber *) squareLength {
cv::Mat distCoeffs(4,1,cv::DataType<double>::type, 0.0);
cv::Mat rvec(3,1,cv::DataType<double>::type);
cv::Mat tvec(3,1,cv::DataType<double>::type);
cv::Mat cameraMatrix = [self intrinsicMatrixWithArray:intrinsics];
vector<Point2f> cvImagePoints = [self convertImagePoints:imagePoints toSize: imageResolution];
vector<Point3f> cvObjectPoints = [self getObjectPointsWithSquareLength:squareLength];
std::cout << "object points: \n" << cvObjectPoints << std::endl;
std::cout << "image points: \n" << cvImagePoints << std::endl;
std::cout << "cameraMatrix points: \n" << cameraMatrix << std::endl;
cv::solvePnP(cvObjectPoints, cvImagePoints, cameraMatrix, distCoeffs, rvec, tvec, false, SOLVEPNP_IPPE_SQUARE);
std::cout << "rvec: \n" << rvec << std::endl;
std::cout << "tvec: \n" << tvec << std::endl;
cv::Mat RotX(3, 3, cv::DataType<double>::type);
cv::setIdentity(RotX);
RotX.at<double>(4) = -1; //cos(180) = -1
RotX.at<double>(8) = -1;
cv::Mat R;
cv::Rodrigues(rvec, R);
R = R.t(); // rotation of inverse
Mat rvecConverted;
Rodrigues(R, rvecConverted); //
std::cout << "rvec in world coords:\n" << rvecConverted << std::endl;
rvecConverted = RotX * rvecConverted;
std::cout << "rvec scenekit :\n" << rvecConverted << std::endl;
Mat tvecConverted = -R * tvec;
std::cout << "tvec in world coords:\n" << tvecConverted << std::endl;
tvecConverted = RotX * tvecConverted;
std::cout << "tvec scenekit :\n" << tvecConverted << std::endl;
SCNVector4 rotationVector = SCNVector4Make(rvecConverted.at<double>(0), rvecConverted.at<double>(1), rvecConverted.at<double>(2), norm(rvecConverted));
SCNVector3 translationVector = SCNVector3Make(tvecConverted.at<double>(0), tvecConverted.at<double>(1), tvecConverted.at<double>(2));
return Pose{rotationVector, translationVector};
}
+ (vector<Point3f>) getObjectPointsWithSquareLength: (NSNumber*) squareLength {
vector<Point3f> points;
double squareLengthDouble = [squareLength doubleValue];
points.push_back(Point3f(-squareLengthDouble/2, squareLengthDouble/2, 0));
points.push_back(Point3f(squareLengthDouble/2, squareLengthDouble/2, 0));
points.push_back(Point3f(squareLengthDouble/2, -squareLengthDouble/2, 0));
points.push_back(Point3f(-squareLengthDouble/2, -squareLengthDouble/2, 0));
return points;
}
+ (vector<Point2f>) convertImagePoints: (NSArray<NSValue *> *) array
toSize: (CGSize) size {
vector<Point2f> points;
for (NSValue * value in array) {
CGPoint point = [value CGPointValue];
points.push_back(Point2f((point.x * size.width), (point.y * size.height)));
}
return points;
}
+ (cv::Mat) intrinsicMatrixWithArray: (NSArray<NSNumber *> *) intrinsics {
Mat result(3,3,cv::DataType<double>::type);
cv::setIdentity(result);
result.at<double>(0) = [intrinsics[0] doubleValue]; //fx
result.at<double>(4) = [intrinsics[4] doubleValue]; //fy
result.at<double>(2) = [intrinsics[6] doubleValue]; //cx
result.at<double>(5) = [intrinsics[7] doubleValue]; //cy
result.at<double>(8) = [intrinsics[8] doubleValue]; //1
return result;
}
问题是当我将相机直接指向距离为 2 米的二维码时,translationVector 的结果。z (tvec scenekit ) 应该等于 2 米,但取而代之的是一个随机的正数或负数。
输出:
Calculated distance to QR 2.0856588
object points:
[-0.079999998, 0.079999998, 0;
0.079999998, 0.079999998, 0;
0.079999998, -0.079999998, 0;
-0.079999998, -0.079999998, 0]
image points:
[795.98724, 717.27045;
684.5592, 715.80487;
793.31567, 826.06146;
684.40692, 824.39771]
cameraMatrix points:
[1454.490478515625, 0, 935.6685791015625;
0, 1454.490478515625, 717.999267578125;
0, 0, 1]
rvec:
[-0.9251278749049585;
1.185890362907954;
-0.9989977018022447]
tvec:
[0.04753833193572054;
-0.009999648596310796;
-0.3527916723601041]
rvec in world coords:
[0.9251278749049584;
-1.185890362907954;
0.9989977018022447]
rvec scenekit :
[0.9251278749049584;
1.185890362907954;
-0.9989977018022447]
tvec in world coords:
[-0.1159248829391864;
-0.3366933247327607;
0.004569098144615695]
tvec scenekit :
[-0.1159248829391864;
0.3366933247327607;
-0.004569098144615695]
感谢您的帮助
相机和标签之间的估计翻译不正确。 tz
是负数,这在物理上是不可能的。有关相机坐标系的详细信息,请参阅 here。
您必须确保每个 3D 对象点与相应的 2D 图像点匹配。
如果我绘制 2D 坐标,我会得到以下图像:
用RGBM点的顺序。
如果你交换最后两个图像点,你应该得到:
rvec: [0.1217246105180353, 0.1224686744740433, -3.116495036698598]
tvec: [-0.2866576939480562, 0.07760414675470864, 2.127895748451679]