Project 3D mesh on 2D image using camera intrinsic matrix

I have been trying to use the HOnnotate dataset to extract perspective-correct hand and object masks, as shown in the images of Task 3 of the Hands-2019 challenge.

The dataset comes with the following annotations:

annotations:
    The annotations are provided in pickled files under meta folder for each sequence. The pickle files in the training data contain a dictionary with the following keys:
    objTrans: A 3x1 vector representing object translation
    objRot: A 3x1 vector representing object rotation in axis-angle representation
    handPose: A 48x1 vector representing the 3D rotation of the 16 hand joints including the root joint in axis-angle representation. The ordering of the joints follows the MANO model convention (see joint_order.png) and can be directly fed to MANO model.
    handTrans: A 3x1 vector representing the hand translation
    handBeta: A 10x1 vector representing the MANO hand shape parameters
    handJoints3D: A 21x3 matrix representing the 21 3D hand joint locations
    objCorners3D: A 8x3 matrix representing the 3D bounding box corners of the object
    objCorners3DRest: A 8x3 matrix representing the 3D bounding box corners of the object before applying the transformation
    objName: Name of the object as given in YCB dataset
    objLabel: Object label as given in YCB dataset
    camMat: Intrinsic camera parameters
    handVertContact: A 778D boolean vector in which each element indicates whether the corresponding MANO vertex is in contact with the object. A MANO vertex is in contact if its distance to the object surface is < 4 mm
    handVertDist: A 778D float vector representing the distance of MANO vertices to the object surface.
    handVertIntersec: A 778D boolean vector specifying if the MANO vertices are inside the object surface.
    handVertObjSurfProj: A 778x3 matrix representing the projection of MANO vertices on the object surface.
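
As a minimal sketch of reading these annotations, assuming each meta file is an ordinary pickle holding the dictionary described above, and assuming `camMat` is the usual 3x3 pinhole matrix `[[fx, 0, cx], [0, fy, cy], [0, 0, 1]]` (the listing does not state its layout). The helper names `load_annotation` and `split_intrinsics` are hypothetical and not part of the dataset tooling.

```python
import pickle

import numpy as np


def load_annotation(path):
    """Load one annotation dictionary from a pickled meta file."""
    with open(path, "rb") as f:
        # encoding="latin1" is a precaution in case the file was written
        # by Python 2; it is harmless for Python 3 pickles.
        return pickle.load(f, encoding="latin1")


def split_intrinsics(cam_mat):
    """Return (fx, fy, cx, cy), ASSUMING a 3x3 pinhole intrinsic matrix."""
    cam_mat = np.asarray(cam_mat, dtype=np.float64)
    if cam_mat.shape != (3, 3):
        raise ValueError(f"expected a 3x3 matrix, got shape {cam_mat.shape}")
    fx, fy = float(cam_mat[0, 0]), float(cam_mat[1, 1])
    cx, cy = float(cam_mat[0, 2]), float(cam_mat[1, 2])
    return fx, fy, cx, cy
```

The four values returned by `split_intrinsics` are the ones a pinhole camera model typically asks for, alongside the image width and height.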

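It also comes with a visualization script (https://github.com/shreyashampali/ho3d) that can render the annotations either as a 3D mesh (using Open3D) or as 2D projections of the object corners and hand points (using Matplotlib):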

What I want to do is project the visualization created by Open3D back onto the original image.

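So far I have not been able to do this. What I have managed is to get the point cloud from the 3D mesh and apply the camera intrinsics to it so that it is perspective correct. The question now is how to create a mask from the point cloud for both the hands and the object, like the one rendered by Open3D.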

# The code looks as follows
import cv2
import numpy as np
import open3d

# "mesh" is an Open3D triangle mesh, i.e. an "open3d.geometry.TriangleMesh()"
pcd = open3d.geometry.PointCloud()
pcd.points = mesh.vertices
pcd.colors = mesh.vertex_colors
pcd.normals = mesh.vertex_normals

pts3D = np.asarray(pcd.points)
# The hand/object lies along the negative z-axis, so flip the y- and z-axes to match the OpenCV camera convention before projecting
cord_change_mat = np.array([[1., 0., 0.], [0, -1., 0.], [0., 0., -1.]], dtype=np.float32)
pts3D = pts3D.dot(cord_change_mat.T)

# "anno['camMat']" is camera intrinsic matrix 
img_points, _ = cv2.projectPoints(pts3D, np.zeros(3), np.zeros(3), anno['camMat'], np.zeros(4, dtype='float32'))

# Draw the perspective-correct point cloud back on the image, skipping points that fall outside it
for point in img_points:
    p1, p2 = int(point[0][0]), int(point[0][1])
    if 0 <= p1 < img.shape[1] and 0 <= p2 < img.shape[0]:
        img[p2, p1] = (255, 255, 255)
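
For reference, with zero rotation, zero translation and zero distortion, the `cv2.projectPoints` call above should reduce to the plain pinhole equations `u = fx * x / z + cx` and `v = fy * y / z + cy`. Here is a rough NumPy-only sketch of that step, assuming a 3x3 intrinsic matrix and points that are already in the OpenCV camera frame with `z > 0`. The function names `project_pinhole` and `draw_points` are hypothetical.

```python
import numpy as np


def project_pinhole(pts3d, cam_mat):
    """Project Nx3 camera-frame points to Nx2 pixel coordinates (no distortion)."""
    pts3d = np.asarray(pts3d, dtype=np.float64)
    cam_mat = np.asarray(cam_mat, dtype=np.float64)
    # Each row becomes K @ [x, y, z]; the third component is the depth z.
    homog = pts3d @ cam_mat.T
    # Perspective divide by depth (assumes every z is non-zero).
    return homog[:, :2] / homog[:, 2:3]


def draw_points(img, pixels, color=(255, 255, 255)):
    """Paint the given (u, v) pixels into img in place, ignoring out-of-image points."""
    h, w = img.shape[:2]
    uv = np.round(pixels).astype(int)
    inside = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    # NumPy images are indexed as [row, column], i.e. [v, u].
    img[uv[inside, 1], uv[inside, 0]] = color
    return img
```

Note that this only colours individual vertices, which is why the result looks like a sparse point cloud rather than a filled mask.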

Basically, I am trying to obtain this segmentation mask:

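P.S. Sorry if this doesn't make much sense; I am very new to 3D meshes, point clouds and their projections, and I don't yet know all the correct technical terms. Leave a comment if you have questions and I will try to explain as clearly as I can.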

It turns out there is an easy way to do this using Open3D and the camera intrinsic values. Basically, we instruct Open3D to render the image from the camera's POV.


import cv2
import numpy as np
import open3d
import open3d.visualization.rendering as rendering

# Create a renderer with a set image width and height
render = rendering.OffscreenRenderer(img_width, img_height)

# setup camera intrinsic values
pinhole = open3d.camera.PinholeCameraIntrinsic(img_width, img_height, fx, fy, cx, cy)
    
# Pick a background colour of the rendered image, I set it as black (default is light gray)
render.scene.set_background([0.0, 0.0, 0.0, 1.0])  # RGBA

# now create your mesh
mesh = open3d.geometry.TriangleMesh()
# define further mesh properties, shape, vertices etc  (omitted here)
# paint after the vertices exist, since the colour is stored per vertex
mesh.paint_uniform_color([1.0, 0.0, 0.0]) # set Red color for mesh

# Define a simple unlit Material.
# (The base color does not replace the mesh's own colors.)
# (Newer Open3D releases appear to name this class MaterialRecord.)
mtl = rendering.Material()
mtl.base_color = [1.0, 1.0, 1.0, 1.0]  # RGBA
mtl.shader = "defaultUnlit"

# add mesh to the scene
render.scene.add_geometry("MyMeshModel", mesh, mtl)

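# render the scene with respect to the camera
# (camMat is the 3x3 intrinsic matrix; 0.1 and 1.0 are the near and far clipping planes)
render.scene.camera.set_projection(camMat, 0.1, 1.0, img_width, img_height)
img_o3d = render.render_to_image()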

# we can now save the rendered image right at this point 
open3d.io.write_image("output.png", img_o3d, 9)


# Optionally, we can convert the image to OpenCV format and play around.
# For my use case I mapped it onto the original image to check quality of 
# segmentations and to create masks.
# (Note: OpenCV expects the color in BGR format, so swap red and blue.)
img_cv2 = cv2.cvtColor(np.array(img_o3d), cv2.COLOR_RGBA2BGR)
cv2.imwrite("cv_output.png", img_cv2)
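
Since the background above is set to black and the mesh is drawn in a flat unlit colour, one way to turn the render into a segmentation mask is to mark every non-black pixel. The following is only a sketch under those assumptions, using plain NumPy on the render after it has been converted to an H x W x C `uint8` array of the same size as the original image. The function names `mask_from_render` and `overlay_mask` are hypothetical.

```python
import numpy as np


def mask_from_render(rendered, threshold=0):
    """Boolean HxW mask that is True wherever any colour channel exceeds threshold."""
    rendered = np.asarray(rendered)
    rgb = rendered[..., :3]  # ignore an alpha channel if one is present
    return (rgb > threshold).any(axis=-1)


def overlay_mask(image, mask, color=(0, 255, 0), alpha=0.5):
    """Return a copy of image with the masked pixels blended towards color."""
    out = image.astype(np.float32)  # astype makes a copy, the input is untouched
    tint = np.array(color, dtype=np.float32)
    out[mask] = (1.0 - alpha) * out[mask] + alpha * tint
    return out.astype(np.uint8)
```

To get separate hand and object masks, one option would be to paint each mesh a different colour and compare the render against that colour instead of thresholding against black.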

This answer borrows a lot from this answer