Recreating Global 3D Points, from Local 3D Points and Global 2D Points; SolvePnP

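Hello, I have a list of 2D keypoints in a global scope/frame (image points) and a corresponding list of 3D keypoints in a local scope (usually called texture or object points). The image points range over x [0, 1920], y [0, 1080], and the object points range over x [-1, 1], y [-1, 1]. I followed the method described in this paper on page 6 with the tutorial from here, but my output 3D points are not correct at all; the points jump all over the place. Below is how I am using SolvePnP. Am I on the wrong track here, given that SolvePnP is typically used to detect camera movement (other suggestions are welcome!), or is my approach itself wrong?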

import numpy as np
import cv2
array = np.array # convenience

frame1_2d = \
array([[1033.9708251953125 ,  344.23065185546875],
       [1077.796630859375  ,  617.1146240234375 ],
       [ 958.2716674804688 ,  609.1179809570312 ],
       [1074.8084716796875 ,  782.0444946289062 ],
       [ 975.2044067382812 ,  418.1991882324219 ],
       [1024.0103759765625 ,  931.980712890625  ],
       [1122.6185302734375 ,  605.1196899414062 ],
       [1096.721435546875  ,  418.1991882324219 ],
       [ 999.109375        ,  617.1146240234375 ],
       [ 962.255859375     ,  518.1566772460938 ],
       [1111.662109375     ,  517.1571044921875 ],
       [1014.0499877929688 ,  782.0444946289062 ],
       [1061.8599853515625 ,  930.9811401367188 ]])
frame1_3d = \
array([[-0.01265097688883543   , -0.4992150068283081    , -0.11455678939819336   ],
       [ 0.10584918409585953   , -0.0018199272453784943 ,  0.0023642126470804214 ],
       [-0.14271944761276245   ,  0.06332945823669434   ,  0.1438678503036499    ],
       [ 0.09254898130893707   ,  0.3176574409008026    , -0.17930322885513306   ],
       [-0.1155640035867691    , -0.4058316648006439    ,  0.00021289288997650146],
       [-0.03301446512341499   ,  0.6519031524658203    , -0.3515356183052063    ],
       [ 0.14540529251098633   ,  0.05645819008350372   ,  0.10776595026254654   ],
       [ 0.10836226493120193   , -0.4078497290611267    ,  0.000870194286108017  ],
       [-0.10584865510463715   ,  0.001818838994950056  , -0.0023612845689058304 ],
       [-0.1546039581298828    , -0.17418316006660461   ,  0.10266228020191193   ],
       [ 0.1590884029865265    , -0.17913128435611725   ,  0.09423552453517914   ],
       [-0.0736076831817627    ,  0.3179360628128052    , -0.17892584204673767   ],
       [ 0.05236409604549408   ,  0.6490492820739746    , -0.33908188343048096   ]])

frame2_2d = \
array([[1028.110107421875  ,  327.7352600097656 ],
       [1068.0904541015625 ,  606.7128295898438 ],
       [ 982.1328125       ,  229.74314880371094],
       [1071.0889892578125 ,  778.698974609375  ],
       [ 979.13427734375   ,  403.7291564941406 ],
       [1013.1174926757812 ,  933.6865234375    ],
       [1069.0899658203125 ,  243.7420196533203 ],
       [1080.08447265625   ,  403.7291564941406 ],
       [ 997.1254272460938 ,  616.7119750976562 ],
       [ 983.13232421875   ,  312.7364501953125 ],
       [1071.0889892578125 ,  317.7360534667969 ],
       [1005.1214599609375 ,  778.698974609375  ],
       [1061.0938720703125 ,  936.686279296875  ]])

frame2_3d = \
array([[-0.0004756036214530468, -0.5245562791824341   , -0.010652128607034683 ],
       [ 0.10553547739982605  , -0.00272204983048141  ,  0.0024587283842265606],
       [-0.1196068525314331   , -0.6828885078430176   , -0.14210689067840576  ],
       [ 0.0845363438129425   ,  0.38039350509643555  , -0.028144780546426773 ],
       [-0.11286421865224838  , -0.4302292466163635   ,  0.06919233500957489  ],
       [-0.030065223574638367 ,  0.754790186882019    ,  0.012936152517795563 ],
       [ 0.1010960042476654   , -0.6289429664611816   , -0.11814753711223602  ],
       [ 0.1058841198682785   , -0.4253752827644348   ,  0.08086629956960678  ],
       [-0.10553570091724396  ,  0.002716599963605404 , -0.0024500866420567036],
       [-0.127223938703537    , -0.5319695472717285   , -0.09722068160772324  ],
       [ 0.11508879065513611  , -0.49151480197906494  , -0.07002018392086029  ],
       [-0.06679684668779373  ,  0.38714516162872314  , -0.023669833317399025 ],
       [ 0.05081187188625336  ,  0.7544023990631104   , -0.011078894138336182 ]])

frame3_2d = \
array([[1027.91845703125   ,  338.2441711425781 ],
       [1067.8787841796875 ,  612.0115356445312 ],
       [ 803.141357421875  ,  500.10662841796875],
       [1070.8758544921875 ,  776.8713989257812 ],
       [ 968.9768676757812 ,  413.18048095703125],
       [1012.9332885742188 ,  925.7449340820312 ],
       [1248.699462890625  ,  491.1142578125    ],
       [1089.8570556640625 ,  412.18133544921875],
       [ 995.9501342773438 ,  611.0123901367188 ],
       [ 871.073974609375  ,  461.1397399902344 ],
       [1181.765869140625  ,  454.14569091796875],
       [1003.9421997070312 ,  775.8722534179688 ],
       [1061.884765625     ,  933.7380981445312 ]])

frame3_3d = \
array([[-0.003511453978717327  , -0.5015891194343567    , -0.10520103573799133   ],
       [ 0.10480749607086182   , -0.00019206921570003033, -0.0004397481679916382 ],
       [-0.47764456272125244   , -0.1816674768924713    ,  0.04093759506940842   ],
       [ 0.0936243087053299    ,  0.3628539443016052    , -0.09391097724437714   ],
       [-0.11445926129817963   , -0.41107428073883057   ,  0.01644478738307953   ],
       [-0.03567686676979065   ,  0.720417320728302     , -0.10493464022874832   ],
       [ 0.4529808759689331    , -0.18383921682834625   , -0.02210136130452156   ],
       [ 0.1092790886759758    , -0.41095152497291565   ,  0.011709243059158325  ],
       [-0.10480757057666779   ,  0.00018716813065111637,  0.0004445519298315048 ],
       [-0.3031604290008545    , -0.2810187041759491    ,  0.07747684419155121   ],
       [ 0.3006024956703186    , -0.28319910168647766   ,  0.043038371950387955  ],
       [-0.07087739557027817   ,  0.35837966203689575   , -0.08430898934602737   ],
       [ 0.062416717410087585  ,  0.7248380780220032    , -0.13536334037780762   ]])

#frame1_2d = np.asarray(frame1_2d, dtype=float)
#frame1_3d = np.asarray(frame1_3d, dtype=float)
#frame2_2d = np.asarray(frame2_2d, dtype=float)
#frame2_3d = np.asarray(frame2_3d, dtype=float)
#frame3_2d = np.asarray(frame3_2d, dtype=float)
#frame3_3d = np.asarray(frame3_3d, dtype=float)

# Globalize 3D Points
dist_coeffs = (0.11480806073904032, -0.21946985653851792, 0.0012002116999769957, 0.008564577708855225, 0.11274677130853494)
camera_matrix = np.asarray([
    [1394.6027293299926, 0.0, 995.588675691456],
    [0.0, 1394.6027293299926, 599.3212928484164],
    [0.0, 0.0, 1]
])


# Estimate the pose (object frame -> camera frame) for frame 3
(success, rotation_vector, translation_vector) = cv2.solvePnP(frame3_3d, frame3_2d, camera_matrix, dist_coeffs, flags=0)

# Build the 4x4 homogeneous transform [R | t; 0 0 0 1]
transform = np.eye(4)
transform[:3, :3], _ = cv2.Rodrigues(rotation_vector)
transform[:3, 3] = translation_vector.ravel()

# Apply the transform to the object points of the same frame
homogeneous_3d = np.c_[frame3_3d, np.ones((len(frame3_3d), 1))]  # rows are (x, y, z, 1)
globalized_3d = homogeneous_3d @ transform.T  # same as transform @ p for every row p
print(globalized_3d)

Thanks in advance, any help is appreciated!

Edit: included some example data in my code, after improving it based on what the top answer suggested.

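Edit2: Using flags=1 (cv2.SOLVEPNP_EPNP instead of the default iterative solver) significantly improved the results and reduced a lot of the jitter!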

  1. Yes, solvePnP can be used for this.
  2. Yes, your math is wrong.

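I assume you are getting the points from a facial landmark detector, so they have a fixed order. I also assume that your 3D model points are given in the same order, and that their values are consistent and somewhat resemble the face you are seeing. You should exclude points that represent flesh and the mandible (as opposed to the cranium). What you actually want to track is the skull, not the positions of the lips and jaw, which move all over the place.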
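One way to exclude the non-rigid points is to index both arrays with the same set of landmark indices before calling solvePnP, so the 2D/3D correspondence is preserved. The sketch below assumes this; `RIGID_IDX` and `select_rigid` are hypothetical names, and the index values are placeholders, since which landmarks count as rigid depends on the detector's landmark layout.

```python
import numpy as np

# Hypothetical indices of landmarks assumed to sit on rigid structure.
# The actual values depend on the landmark layout of your detector.
RIGID_IDX = np.array([0, 1, 4, 7, 8])

def select_rigid(points_3d, points_2d, idx=RIGID_IDX):
    """Keep only the chosen landmarks, in the same order in both arrays."""
    points_3d = np.asarray(points_3d, dtype=np.float64)
    points_2d = np.asarray(points_2d, dtype=np.float64)
    if len(points_3d) != len(points_2d):
        raise ValueError("3D and 2D point lists must have the same length")
    return points_3d[idx], points_2d[idx]

# Dummy data: 13 landmarks, point i has the value i in every coordinate.
dummy_3d = np.repeat(np.arange(13.0)[:, None], 3, axis=1)
dummy_2d = np.repeat(np.arange(13.0)[:, None], 2, axis=1)
rigid_3d, rigid_2d = select_rigid(dummy_3d, dummy_2d)
```

The filtered pair would then be passed to solvePnP in place of the full arrays.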

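rvec is an axis-angle encoding. Its length is the amount of rotation in radians (expected to be between 0 and 3.14 = pi), and its direction is the axis of rotation.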

Use cv.Rodrigues to convert rvec into a 3x3 rotation matrix.
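To make the axis-angle encoding concrete, here is a NumPy-only sketch of Rodrigues' rotation formula. The helper name `rvec_to_matrix` is hypothetical and only for illustration; in practice you would call cv.Rodrigues, whose result should agree with this up to floating-point error.

```python
import numpy as np

def rvec_to_matrix(rvec):
    """Convert an axis-angle rotation vector to a 3x3 rotation matrix."""
    rvec = np.asarray(rvec, dtype=float).reshape(3)
    angle = np.linalg.norm(rvec)          # rotation amount in radians
    if angle < 1e-12:
        return np.eye(3)                  # (almost) no rotation
    kx, ky, kz = rvec / angle             # unit rotation axis
    # Cross-product (skew-symmetric) matrix of the axis
    K = np.array([[0.0, -kz,  ky],
                  [ kz, 0.0, -kx],
                  [-ky,  kx, 0.0]])
    # Rodrigues' formula: R = I + sin(a) K + (1 - cos(a)) K^2
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

# A quarter turn about the z axis maps the x axis onto the y axis.
R = rvec_to_matrix([0.0, 0.0, np.pi / 2])
x_rotated = R @ np.array([1.0, 0.0, 0.0])
```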

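Really, just build yourself some functions that take rvec and tvec and build a 4x4 matrix. Extending all your points to (x, y, z, 1) is a hassle, but you only have to do it once.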
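A minimal sketch of such helpers follows. To keep it NumPy-only, it assumes the 3x3 rotation has already been obtained, for example as `cv2.Rodrigues(rvec)[0]`; the names `make_transform` and `apply_transform` are hypothetical.

```python
import numpy as np

def make_transform(rotation, tvec):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a translation."""
    T = np.eye(4)
    T[:3, :3] = np.asarray(rotation, dtype=float)
    # reshape(3) accepts tvec given as (3,), (3, 1) or (1, 3)
    T[:3, 3] = np.asarray(tvec, dtype=float).reshape(3)
    return T

def apply_transform(T, points):
    """Apply a 4x4 transform to an (N, 3) array of points."""
    points = np.asarray(points, dtype=float)
    homogeneous = np.c_[points, np.ones(len(points))]  # (N, 4): rows are (x, y, z, 1)
    return (homogeneous @ T.T)[:, :3]                  # same as T @ p for every row p

# Example: rotate 90 degrees about z, then shift 5 units along z.
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
T = make_transform(R, [[0.0], [0.0], [5.0]])
moved = apply_transform(T, [[1.0, 0.0, 0.0],
                            [0.0, 1.0, 0.0]])
```

With the rotation and tvec returned by solvePnP, the transformed points end up in the camera's coordinate frame, because the pose it estimates maps object coordinates to camera coordinates.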

Make sure to use @ for matrix multiplication (or np.dot, np.matmul, ...), because * is element-wise multiplication.
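A small illustration of the difference, using an arbitrary rotation and point chosen for this example:

```python
import numpy as np

# 90-degree rotation about the z axis
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
p = np.array([1.0, 2.0, 3.0])

rotated = R @ p        # matrix-vector product: the rotated point
same = np.dot(R, p)    # equivalent to R @ p for these shapes
broadcast = R * p      # element-wise with broadcasting: a 3x3 array, not a point
```

Here `rotated` is the 3-vector (-2, 1, 3), while `R * p` silently broadcasts `p` across the rows of `R` and returns a 3x3 array, which is an easy mistake to miss because it raises no error.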