ArUco Markers, pose estimation- 究竟是哪个点给出了平移和旋转？

Question

我检测到 ArUco 标记并估计了姿势。请参见下图。但是，我得到的Xt（X翻译）是一个正值。根据drawAxis函数，正方向远离图像中心。所以我认为它应该是负值。为什么我反而变得积极了。

我的相机距离成像面约 120 毫米。但我得到的 Zt（Z 平移）在 650 毫米范围内。姿态估计是否给出了标记相对于物理相机或图像平面中心的姿态？我不明白为什么 Zt 这么高。

我边改变边测量PoseZ，得到roll,pitch,yaw。我注意到滚动（旋转 w.r.t.cam X 轴）正在来回改变其符号幅度变化 166-178，但 Xt 的符号没有随着滚动中的符号变化而改变。关于它为什么会这样的任何想法？对获得更一致的数据有什么建议吗？

image=cv.imread(fname)
arucoDict = cv.aruco.Dictionary_get(cv.aruco.DICT_4X4_1000)
arucoParams = cv.aruco.DetectorParameters_create()
(corners, ids, rejected) = cv.aruco.detectMarkers(image, arucoDict,
    parameters=arucoParams)

    print(corners, ids, rejected)
    
    if len(corners) > 0:
        # flatten the ArUco IDs list
        ids = ids.flatten()
        # loop over the detected ArUCo corners
        #for (markerCorner, markerID) in zip(corners, ids):
        #(markerCorner, markerID)=(corners, ids)
            # extract the marker corners (which are always returned in
            # top-left, top-right, bottom-right, and bottom-left order)
        #corners = corners.reshape((4, 2))
        (topLeft, topRight, bottomRight, bottomLeft) = corners[0][0][0],corners[0][0][1],corners[0][0][2],corners[0][0][3]
            # convert each of the (x, y)-coordinate pairs to integers
        topRight = (int(topRight[0]), int(topRight[1]))
        bottomRight = (int(bottomRight[0]), int(bottomRight[1]))
        bottomLeft = (int(bottomLeft[0]), int(bottomLeft[1]))
        topLeft = (int(topLeft[0]), int(topLeft[1]))
    
            # draw the bounding box of the ArUCo detection
        cv.line(image, topLeft, topRight, (0, 255, 0), 2)
        cv.line(image, topRight, bottomRight, (0, 255, 0), 2)
        cv.line(image, bottomRight, bottomLeft, (0, 255, 0), 2)
        cv.line(image, bottomLeft, topLeft, (0, 255, 0), 2)
            # compute and draw the center (x, y)-coordinates of the ArUco
            # marker
        cX = int((topLeft[0] + bottomRight[0]) / 2.0)
        cY = int((topLeft[1] + bottomRight[1]) / 2.0)
        cv.circle(image, (cX, cY), 4, (0, 0, 255), -1)
        
        if topLeft[1]!=topRight[1] or topLeft[0]!=bottomLeft[0]:
            rot1=np.degrees(np.arctan((topLeft[0]-bottomLeft[0])/(bottomLeft[1]-topLeft[1])))
            rot2=np.degrees(np.arctan((topRight[1]-topLeft[1])/(topRight[0]-topLeft[0])))
            rot=(np.round(rot1,3)+np.round(rot2,3))/2
            print(rot1,rot2,rot)
        else:
            rot=0
    
        # draw the ArUco marker ID on the image
        rotS=",rotation:"+str(np.round(rot,3))
        cv.putText(image, ("position: "+str(cX) +","+str(cY)),
        (100, topLeft[1] - 15), cv.FONT_HERSHEY_SIMPLEX,0.5, (255, 0, 80), 2)
        cv.putText(image, rotS,
        (400, topLeft[1] -15), cv.FONT_HERSHEY_SIMPLEX,0.5, (255, 0, 80), 2)
        print("[INFO] ArUco marker ID: {}".format(ids))
        
        
        d=np.round((math.dist(topLeft,bottomRight)+math.dist(topRight,bottomLeft))/2,3)
        # Get the rotation and translation vectors
        rvecs, tvecs, obj_points = cv.aruco.estimatePoseSingleMarkers(corners,aruco_marker_side_length,mtx,dst)
        
         
        # Print the pose for the ArUco marker
        # The pose of the marker is with respect to the camera lens frame.
        # Imagine you are looking through the camera viewfinder, 
        # the camera lens frame's:
        # x-axis points to the right
        # y-axis points straight down towards your toes
        # z-axis points straight ahead away from your eye, out of the camera
        #for i, marker_id in enumerate(marker_ids):
         
            # Store the translation (i.e. position) information
        transform_translation_x = tvecs[0][0][0]
        transform_translation_y = tvecs[0][0][1]
        transform_translation_z = tvecs[0][0][2]
               
            # Store the rotation information
        rotation_matrix = np.eye(4)
        rotation_matrix[0:3, 0:3] = cv.Rodrigues(np.array(rvecs[0]))[0]
        r = R.from_matrix(rotation_matrix[0:3, 0:3])
        quat = r.as_quat()   
             
            # Quaternion format     
        transform_rotation_x = quat[0] 
        transform_rotation_y = quat[1] 
        transform_rotation_z = quat[2] 
        transform_rotation_w = quat[3] 
             
            # Euler angle format in radians
        roll_x, pitch_y, yaw_z = euler_from_quaternion(transform_rotation_x,transform_rotation_y,transform_rotation_z,transform_rotation_w)
             
        roll_x = math.degrees(roll_x)
        pitch_y = math.degrees(pitch_y)
        yaw_z = math.degrees(yaw_z)

Answer 1

不检查所有代码（看起来大致没问题），关于 OpenCV 和 aruco 的一些基础知识：

两者都使用 right-handed 坐标系。拇指 X，食指 Y，中间 Z。

OpenCV 使用 X 向右，Y 向下，Z 远，screen/camera 帧。屏幕和图片的原点是左上角。对于相机，原点是针孔模型的中心，也就是孔径的中心。我不能评论镜头或镜头系统。假设镜头中心是原点。这可能足够接近了。

如果标记平放在 table 上，Aruco 使用 X 向右，Y 远，Z 向上。原点位于标记的中心。标记的左上角被认为是“第一个”角。

marker可以认为是有自己的坐标system/frame.

rvec和tvec给定的pose就是marker在camera frame中的pose。这意味着 np.linalg.norm(tvec) 为您提供从相机到标记中心的直接距离。 tvec的Z只是平行于光轴的分量。

如果标记位于图片的右半部分（“一半”由相机矩阵的 cx,cy 定义），您会期望 tvec 的 X 增长。下半部分，Y positive/growing.

相反，该转换将 marker-local 坐标转换为 camera-local。尝试变换一些 marker-local 点，例如坐标轴上的原点或点。我相信 cv::transform 可以帮助解决这个问题。使用 OpenCV 的 projectPoints 将 3D space 点映射到 2D 图像点，然后您可以绘制标记的轴，或在其上绘制立方体，或任何您喜欢的东西。

假设标记笔直并面向相机 dead-on。当你考虑space（“世界”space）中标记和相机的三元组时，两者都是X“正确”，但一个的Y和Z与另一个的Y和Z相反，所以您希望看到围绕 X 轴旋转半圈（旋转 Z 和 Y）。

你可以想象这样的转变：

最初相机透过标记，从标记的背面看向世界。相机将“倒置”。相机看到marker-space.
姿势的旋转组件围绕相机原点旋转整个 marker-local 世界。从世界坐标系（参考点）看，相机旋转，变成您认为自然的姿态。
姿势的平移将标记的世界移出在相机前面（Z 为正），或者等效地，相机从标记后退。

如果您得到难以置信的值，请检查 aruco_marker_side_length 和相机矩阵。对于典型分辨率 (VGA-4k) 和视野（60-80 度），f 约为 500-3000。

ArUco Markers, pose estimation- 究竟是哪个点给出了平移和旋转？

ArUco Markers, pose estimatimation- Exactly for which point the traslation and rotation is given?

opencv

image-processing

computer-vision

aruco

pose