Camera pose estimation with solvePnP() and SOLVEPNP_IPPE_SQUARE method

I am working with ARKit and trying to get the camera position from a QR code of known size (0.16 m). To detect the QR code I am using the Vision framework, which gives me each corner point in the image.
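
For context, the corner detection looks roughly like the sketch below (a minimal, illustrative version assuming a recent Vision SDK; detectQRCode is a hypothetical helper, and image-orientation handling is omitted for brevity):

import ARKit
import Vision

// Run a barcode request on the current ARFrame's pixel buffer and return the
// first QR observation; its topLeft/topRight/bottomLeft/bottomRight corners
// are in normalized image coordinates (0...1).
func detectQRCode(in frame: ARFrame) -> VNBarcodeObservation? {
    let request = VNDetectBarcodesRequest()
    request.symbologies = [.qr]
    let handler = VNImageRequestHandler(cvPixelBuffer: frame.capturedImage)
    try? handler.perform([request])
    return request.results?.first
}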

Data preparation:

let intrinsics = arFrame.camera.intrinsics
let imageResolution = arFrame.camera.imageResolution

let imagePointsArray = [NSValue(cgPoint: visionResult.topLeft),
                        NSValue(cgPoint: visionResult.topRight),
                        NSValue(cgPoint: visionResult.bottomLeft),
                        NSValue(cgPoint: visionResult.bottomRight)]
let intrinsicsArray = (0..<3).flatMap { x in (0..<3).map { y in NSNumber(value: intrinsics[x][y]) } }
let squareLength = NSNumber(value: 0.16)
let res = OpenCVWrapper.findPose(imagePointsArray, intrinsics: intrinsicsArray, imageResolution: imageResolution, squareLength: squareLength)

To get the camera position I am using OpenCV's solvePnP() with flag = SOLVEPNP_IPPE_SQUARE.

The OpenCV part, in Objective-C++:

+(Pose)findPose: (NSArray<NSValue *> *) imagePoints
     intrinsics: (NSArray<NSNumber *> *) intrinsics
imageResolution: (CGSize) imageResolution
   squareLength: (NSNumber *) squareLength {

    cv::Mat distCoeffs(4,1,cv::DataType<double>::type, 0.0);
    cv::Mat rvec(3,1,cv::DataType<double>::type);
    cv::Mat tvec(3,1,cv::DataType<double>::type);

    cv::Mat cameraMatrix = [self intrinsicMatrixWithArray:intrinsics];
    vector<Point2f> cvImagePoints = [self convertImagePoints:imagePoints toSize: imageResolution];
    vector<Point3f> cvObjectPoints = [self getObjectPointsWithSquareLength:squareLength];

    std::cout << "object points: \n" << cvObjectPoints << std::endl;
    std::cout << "image points: \n" << cvImagePoints << std::endl;
    std::cout << "cameraMatrix points: \n" << cameraMatrix << std::endl;

    cv::solvePnP(cvObjectPoints, cvImagePoints, cameraMatrix, distCoeffs, rvec, tvec, false, SOLVEPNP_IPPE_SQUARE);

    std::cout << "rvec: \n" << rvec << std::endl;
    std::cout << "tvec: \n" << tvec << std::endl;

    // 180° rotation about the x axis: converts from OpenCV's camera
    // convention (y down, z forward) to SceneKit's (y up, z toward the viewer)
    cv::Mat RotX(3, 3, cv::DataType<double>::type);
    cv::setIdentity(RotX);
    RotX.at<double>(4) = -1; // cos(180°) = -1
    RotX.at<double>(8) = -1;

    cv::Mat R;
    cv::Rodrigues(rvec, R);

    R = R.t();  // inverse rotation: the transpose of a rotation matrix is its inverse
    Mat rvecConverted;
    Rodrigues(R, rvecConverted); // back to an axis-angle rotation vector
    std::cout << "rvec in world coords:\n" << rvecConverted << std::endl;
    rvecConverted = RotX * rvecConverted;
    std::cout << "rvec scenekit :\n" << rvecConverted << std::endl;

    Mat tvecConverted = -R * tvec; // camera position in tag coordinates: -R^T * t
    std::cout << "tvec in world coords:\n" << tvecConverted << std::endl;
    tvecConverted = RotX * tvecConverted;
    std::cout << "tvec scenekit :\n" << tvecConverted << std::endl;

    SCNVector4 rotationVector = SCNVector4Make(rvecConverted.at<double>(0), rvecConverted.at<double>(1), rvecConverted.at<double>(2), norm(rvecConverted));
    SCNVector3 translationVector = SCNVector3Make(tvecConverted.at<double>(0), tvecConverted.at<double>(1), tvecConverted.at<double>(2));

    return Pose{rotationVector, translationVector};
}

+ (vector<Point3f>) getObjectPointsWithSquareLength: (NSNumber*) squareLength {
    // Corner order required by SOLVEPNP_IPPE_SQUARE:
    // top-left, top-right, bottom-right, bottom-left.
    vector<Point3f> points;
    double squareLengthDouble = [squareLength doubleValue];
    points.push_back(Point3f(-squareLengthDouble/2, squareLengthDouble/2, 0));  // top-left
    points.push_back(Point3f(squareLengthDouble/2, squareLengthDouble/2, 0));   // top-right
    points.push_back(Point3f(squareLengthDouble/2, -squareLengthDouble/2, 0));  // bottom-right
    points.push_back(Point3f(-squareLengthDouble/2, -squareLengthDouble/2, 0)); // bottom-left
    return points;
}

+ (vector<Point2f>) convertImagePoints: (NSArray<NSValue *> *) array
                                toSize: (CGSize) size {
    // Scale Vision's normalized (0...1) corner coordinates to pixel coordinates.
    vector<Point2f> points;
    for (NSValue * value in array) {
        CGPoint point = [value CGPointValue];
        points.push_back(Point2f((point.x * size.width), (point.y * size.height)));
    }
    return points;
}

+ (cv::Mat) intrinsicMatrixWithArray: (NSArray<NSNumber *> *) intrinsics {
    // The flat array holds ARKit's column-major simd_float3x3, while
    // cv::Mat::at<double>(i) indexes the 3x3 matrix row-major.
    Mat result(3,3,cv::DataType<double>::type);
    cv::setIdentity(result);
    result.at<double>(0) = [intrinsics[0] doubleValue]; //fx
    result.at<double>(4) = [intrinsics[4] doubleValue]; //fy
    result.at<double>(2) = [intrinsics[6] doubleValue]; //cx
    result.at<double>(5) = [intrinsics[7] doubleValue]; //cy
    result.at<double>(8) = [intrinsics[8] doubleValue]; //1
    return result;
}
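
As an aside, the conversion block in findPose is the standard inversion of the solvePnP result plus an axis flip: solvePnP returns the transform from tag coordinates to camera coordinates, inverting it gives the camera pose in tag coordinates, and RotX = diag(1, -1, -1) maps OpenCV's camera axes onto SceneKit's. In formulas:

\[ R_{wc} = R^{\top}, \qquad t_{wc} = -R^{\top} t, \qquad t_{\mathrm{SceneKit}} = \operatorname{diag}(1,-1,-1)\, t_{wc} \]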

The problem is the resulting translationVector: when I point the camera straight at the QR code from a distance of 2 meters, translationVector.z (tvec scenekit) should be about 2 meters, but instead I get a seemingly random positive or negative value.

Output:

Calculated distance to QR 2.0856588
object points: 
[-0.079999998, 0.079999998, 0;
 0.079999998, 0.079999998, 0;
 0.079999998, -0.079999998, 0;
 -0.079999998, -0.079999998, 0]
image points: 
[795.98724, 717.27045;
 684.5592, 715.80487;
 793.31567, 826.06146;
 684.40692, 824.39771]
cameraMatrix points: 
[1454.490478515625, 0, 935.6685791015625;
 0, 1454.490478515625, 717.999267578125;
 0, 0, 1]
rvec: 
[-0.9251278749049585;
 1.185890362907954;
 -0.9989977018022447]
tvec: 
[0.04753833193572054;
 -0.009999648596310796;
 -0.3527916723601041]
rvec in world coords:
[0.9251278749049584;
 -1.185890362907954;
 0.9989977018022447]
rvec scenekit :
[0.9251278749049584;
 1.185890362907954;
 -0.9989977018022447]
tvec in world coords:
[-0.1159248829391864;
 -0.3366933247327607;
 0.004569098144615695]
tvec scenekit :
[-0.1159248829391864;
 0.3366933247327607;
 -0.004569098144615695]

Thanks for your help.

The estimated translation between the camera and the tag is not correct. tz is negative, which is physically impossible: in the OpenCV camera frame the z axis points out of the camera toward the scene, so a tag that is visible in front of the camera must have a positive tz. See here for details about the camera coordinate frame.

You have to make sure that each 3D object point matches the corresponding 2D image point.

If I plot the 2D image points, I get the following image:

with the points drawn in R, G, B, M order (red = first point, magenta = last).

If you swap the last two image points, you should get:

rvec: [0.1217246105180353, 0.1224686744740433, -3.116495036698598]
tvec: [-0.2866576939480562, 0.07760414675470864, 2.127895748451679]
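
Translated back to the Swift side of the question, the fix is just to reorder the corners so that they line up with the object points from getObjectPointsWithSquareLength (top-left, top-right, bottom-right, bottom-left). A minimal sketch of the corrected array:

// Corner order must match the object points: TL, TR, BR, BL.
// Note that bottomRight now comes before bottomLeft.
let imagePointsArray = [NSValue(cgPoint: visionResult.topLeft),
                        NSValue(cgPoint: visionResult.topRight),
                        NSValue(cgPoint: visionResult.bottomRight),
                        NSValue(cgPoint: visionResult.bottomLeft)]

With this ordering, tz comes out at about 2.13 m, consistent with the measured distance of roughly 2.09 m.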