Improve homography estimation between two frames of a video for AR application with Python
I am supposed to improve an AR application in Python with the OpenCV library, using a frame-to-frame comparison. We have to project an image onto a book cover that has to be detected in an existing video.
The idea is to use the homography between two consecutive frames to keep the homography between the first frame and the current frame up to date, so that the AR layer can be projected.
I am having problems with the correct estimation of the homography. It seems to accumulate errors at every update, probably because of the matrix multiplication that is repeated for every frame comparison. In the output video the AR layer is positioned worse and worse as the video goes on.
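To make the accumulation concrete, here is a small standalone sketch of what I suspect is happening (illustrative only, not part of my pipeline): the true motion is zero, every frame-to-frame homography is off by only a fraction of a pixel, and yet the composed homography drifts further and further.

import numpy as np

rng = np.random.default_rng(0)
H_acc = np.eye(3)  # accumulated first-to-current homography; with zero noise it would stay the identity
corners = np.array([[0, 0, 1], [640, 0, 1], [640, 480, 1], [0, 480, 1]], dtype=float).T

for t in range(1, 301):
    # each frame-to-frame estimate is almost perfect: identity plus a tiny random error
    H_f2f = np.eye(3)
    H_f2f[:2, :2] += rng.normal(scale=1e-3, size=(2, 2))  # ~0.1% error on the linear part
    H_f2f[:2, 2] += rng.normal(scale=0.1, size=2)         # ~0.1 px error on the translation
    H_acc = np.dot(H_f2f, H_acc)
    if t % 100 == 0:
        proj = np.dot(H_acc, corners)
        proj = proj / proj[2]
        drift = np.linalg.norm(proj[:2] - corners[:2], axis=0).mean()
        print(f"frame {t}: mean corner drift = {drift:.2f} px")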
How can I fix the problem while keeping the frame2frame approach?
Here is the relevant part of the code:
[...]
#################################
img_array = []
success = True
success, img_trainCOLOUR = vid.read()
kp_query = kp_ref
des_query = des_ref
# get shapes of images
h, w = img_ref.shape[:2]
h_t, w_t = img_trainCOLOUR.shape[:2]
M_mask = np.identity(3, dtype='float64')
M_prev = M_ar
# performing iterations until the last frame
while success:
    # obtain grayscale image of the current RGB frame
    img_train = cv2.cvtColor(img_trainCOLOUR, cv2.COLOR_BGR2GRAY)
    # Implementing the object detection pipeline
    # F2F method: correspondences between the previous video frame and the current frame
    kp_train = sift.detect(img_train)
    kp_train, des_train = sift.compute(img_train, kp_train)
    # find matches
    flann = cv2.FlannBasedMatcher(index_params, search_params)
    matches = flann.knnMatch(des_query, des_train, k=2)
    # validating matches with the ratio test
    good = []
    for m, n in matches:
        if m.distance < 0.7 * n.distance:
            good.append(m)
    # checking if we found the object
    MIN_MATCH_COUNT = 10
    if len(good) > MIN_MATCH_COUNT:
        # differentiate between source points and destination points
        src_pts = np.float32([kp_query[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
        dst_pts = np.float32([kp_train[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
        # find homography between current and previous video frames
        M1, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)
        # matchesMask = mask.ravel().tolist()
        # updated homography: M_mask holds the homography from the first frame to the
        # previous frame, composed with the current frame2frame homography
        M_mask = np.dot(M_mask, M1)
        # updated homography: M_prev holds the homography from the img_ar layer through the
        # first frame up to the previous frame, composed with the current frame2frame homography
        M = np.dot(M1, M_prev)
        # warping the img_ar (transformed as the first frame)
        warped = cv2.warpPerspective(img_arCOLOUR, M, (w_t, h_t), flags=cv2.INTER_LINEAR)
        warp_mask = cv2.warpPerspective(img_armask, M, (w_t, h_t), flags=cv2.INTER_LINEAR)
        # restore previous values of the train image where the mask is black
        warp_mask = np.equal(warp_mask, 0)
        warped[warp_mask] = img_trainCOLOUR[warp_mask]
        # inserting the frame into the frame array in order to reconstruct the video sequence
        img_array.append(warped)
        # save the current homography for the next iteration
        M_prev = M
        # save the current frame for the next iteration
        img_query = img_train
        # warping the mask of the book cover into the current frame
        img_maskTrans = cv2.warpPerspective(img_mask, M_mask, (w_t, h_t), flags=cv2.INTER_NEAREST)
        # new SIFT detection on the current frame, restricted to the warped mask,
        # so that only the book cover is searched in the next frame
        kp_query = sift.detect(img_query, img_maskTrans)
        kp_query, des_query = sift.compute(img_query, kp_query)
    # reading the next frame for the next iteration
    success, img_trainCOLOUR = vid.read()
[...]
The input data, the full code and the output are available here:
https://drive.google.com/drive/folders/1EAI7wYVFy7SbNZs8Cet7fWEfK2usw-y1?usp=sharing
Thanks for your support.
Your solution drifts because you always match against the previous image instead of a fixed reference image. Keep one of the images fixed. Besides, SIFT or any other descriptor-based matching is overkill for short-baseline tracking: you can simply detect interest points (Shi-Tomasi goodFeaturesToTrack or Harris corners) and track them with Lucas-Kanade.
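A rough sketch of what that could look like (untested against your data; vid and img_mask are the variables from your question, the other names are illustrative, and img_mask is assumed to be the book-cover mask in the coordinates of the first frame):

import cv2
import numpy as np

ret, frame0 = vid.read()                                  # first frame of the video
gray0 = cv2.cvtColor(frame0, cv2.COLOR_BGR2GRAY)
# detect Shi-Tomasi corners once, restricted to the book cover in the first frame
p0 = cv2.goodFeaturesToTrack(gray0, maxCorners=500, qualityLevel=0.01,
                             minDistance=7, mask=img_mask)

ref_pts = p0.copy()                 # fixed reference: corner positions in the first frame
prev_gray, prev_pts = gray0, p0

ret, frame = vid.read()
while ret:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # short-baseline tracking from the previous frame into the current one
    cur_pts, status, err = cv2.calcOpticalFlowPyrLK(prev_gray, gray, prev_pts, None,
                                                    winSize=(21, 21), maxLevel=3)
    ok = status.ravel() == 1
    ref_pts, cur_pts = ref_pts[ok], cur_pts[ok]
    # the homography is always fitted against the FIRST frame, so the per-frame
    # tracking error is not multiplied into the result
    H, inliers = cv2.findHomography(ref_pts, cur_pts, cv2.RANSAC, 5.0)
    # ... warp the AR layer with np.dot(H, M_ar) exactly as in your code ...
    prev_gray, prev_pts = gray, cur_pts
    ret, frame = vid.read()

If too many corners are lost along the way, re-detect them inside the warped mask, the same way you re-run SIFT in your loop, but keep fitting the homography against the first frame.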