OpenCV: depth map from an uncalibrated stereo system

Question

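I'm trying to get a depth map with an uncalibrated method. I can obtain the fundamental matrix by finding corresponding points with SIFT and then using cv2.findFundamentalMat. I then use cv2.stereoRectifyUncalibrated to get the homography matrices for each image. Finally I use cv2.warpPerspective to rectify the images and compute the disparity, but this doesn't produce a good depth map. The values are very high, so I'm wondering whether I have to use warpPerspective or whether I have to compute a rotation matrix from the homography matrices obtained with stereoRectifyUncalibrated.

I'm not sure which projection matrix to use for rectification when the homography matrices come from stereoRectifyUncalibrated.

Part of the code: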

#Obtainment of the correspondent point with SIFT
sift = cv2.SIFT()

###find the keypoints and descriptors with SIFT
kp1, des1 = sift.detectAndCompute(dst1,None)
kp2, des2 = sift.detectAndCompute(dst2,None)

###FLANN parameters
FLANN_INDEX_KDTREE = 0
index_params = dict(algorithm = FLANN_INDEX_KDTREE, trees = 5)
search_params = dict(checks=50)

flann = cv2.FlannBasedMatcher(index_params,search_params)
matches = flann.knnMatch(des1,des2,k=2)

good = []
pts1 = []
pts2 = []

###ratio test as per Lowe's paper
for i,(m,n) in enumerate(matches):
    if m.distance < 0.8*n.distance:
        good.append(m)
        pts2.append(kp2[m.trainIdx].pt)
        pts1.append(kp1[m.queryIdx].pt)
    
    
pts1 = np.array(pts1)
pts2 = np.array(pts2)

#Computation of the fundamental matrix
F,mask= cv2.findFundamentalMat(pts1,pts2,cv2.FM_LMEDS)


# Obtainment of the rectification matrix and use of the warpPerspective to transform them...
pts1 = pts1[:,:][mask.ravel()==1]
pts2 = pts2[:,:][mask.ravel()==1]

pts1 = np.int32(pts1)
pts2 = np.int32(pts2)

p1fNew = pts1.reshape((pts1.shape[0] * 2, 1))
p2fNew = pts2.reshape((pts2.shape[0] * 2, 1))
    
retBool ,rectmat1, rectmat2 = cv2.stereoRectifyUncalibrated(p1fNew,p2fNew,F,(2048,2048))

dst11 = cv2.warpPerspective(dst1,rectmat1,(2048,2048))
dst22 = cv2.warpPerspective(dst2,rectmat2,(2048,2048))

#calculation of the disparity
stereo = cv2.StereoBM(cv2.STEREO_BM_BASIC_PRESET,ndisparities=16*10, SADWindowSize=9)
disp = stereo.compute(dst22.astype(uint8), dst11.astype(uint8)).astype(np.float32)
plt.imshow(disp);plt.colorbar();plt.clim(0,400)#;plt.show()
plt.savefig("0gauche.png")

#plot depth by using disparity focal length `C1[0,0]` from stereo calibration and `T[0]` the distance between cameras

plt.imshow(C1[0,0]*T[0]/(disp),cmap='hot');plt.clim(-0,500);plt.colorbar();plt.show()

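Here are the images rectified with the uncalibrated method (and warpPerspective):

Here are the images rectified with the calibrated method:

I don't know why the difference between the two kinds of images is so large. With the calibrated method, the images don't seem to be aligned.

Disparity map using the uncalibrated method:

The depths are computed as C1[0,0]*T[0]/(disp), with T from stereoCalibrate. The values are very high.

------------ EDIT LATER ------------

I tried to build the reconstruction matrix from the homography matrices obtained with stereoRectifyUncalibrated, but the result is still not good. Am I doing this correctly?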

Y=np.arange(0,2048)
X=np.arange(0,2048)
(XX_field,YY_field)=np.meshgrid(X,Y)

#I mount the X, Y and disparity in a same 3D array 
stock = np.concatenate((np.expand_dims(XX_field,2),np.expand_dims(YY_field,2)),axis=2)
XY_disp = np.concatenate((stock,np.expand_dims(disp,2)),axis=2)

XY_disp_reshape = XY_disp.reshape(XY_disp.shape[0]*XY_disp.shape[1],3)

Ts = np.hstack((np.zeros((3,3)),T_0)) #i use only the translations obtained with the rectified calibration...Is it correct?


# I establish the projective matrix with the homography matrix
P11 = np.dot(rectmat1,C1)
P1 = np.vstack((np.hstack((P11,np.zeros((3,1)))),np.zeros((1,4))))
P1[3,3] = 1

# P1 = np.dot(C1,np.hstack((np.identity(3),np.zeros((3,1)))))

P22 = np.dot(np.dot(rectmat2,C2),Ts)
P2 = np.vstack((P22,np.zeros((1,4))))
P2[3,3] = 1

lambda_t = cv2.norm(P1[0,:].T)/cv2.norm(P2[0,:].T)


#I define the reconstruction matrix
Q = np.zeros((4,4))

Q[0,:] = P1[0,:].T
Q[1,:] = P1[1,:].T
Q[2,:] = lambda_t*P2[1,:].T - P1[1,:].T
Q[3,:] = P1[2,:].T

#I do the calculation to get my 3D coordinates
test = []
for i in range(0,XY_disp_reshape.shape[0]):
    a = np.dot(inv(Q),np.expand_dims(np.concatenate((XY_disp_reshape[i,:],np.ones((1))),axis=0),axis=1))
    test.append(a)

test = np.asarray(test)

XYZ = test[:,:,0].reshape(XY_disp.shape[0],XY_disp.shape[1],4)

Answer 1

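Several issues could lead to a low-quality depth channel and disparity channel, and therefore to a low-quality stereo sequence. Here are six of them:

Possible issue I

Incomplete formula

As the word uncalibrated implies, stereoRectifyUncalibrated computes a rectification transform for you when you don't know, or can't obtain, the intrinsic parameters of your stereo pair and its relative position in the environment.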

cv.StereoRectifyUncalibrated(pts1, pts2, fm, imgSize, rhm1, rhm2, thres)

where:

# pts1    -> an array of feature points in the first camera
# pts2    -> an array of feature points in the second camera
# fm      -> input fundamental matrix
# imgSize -> size of the image
# rhm1    -> output rectification homography matrix for the first image
# rhm2    -> output rectification homography matrix for the second image
# thres   -> optional threshold used to filter out outliers

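Your call looks like this:

cv2.stereoRectifyUncalibrated(p1fNew, p2fNew, F, (2048, 2048))

So you are not taking three parameters into account: rhm1, rhm2 and thres. (In the cv2 Python binding, rhm1 and rhm2 are returned rather than passed in, so in practice only thres is left at its default.) If the threshold is greater than 0, all point pairs that do not comply with the epipolar geometry are rejected before the homographies are computed. Otherwise, all points are considered inliers. The rejection criterion looks like this: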

|pts2[i]^T * fm * pts1[i]| > thres

# ^T  -> transpose of the homogeneous point vector

Therefore, I think the visual inaccuracies may come from this incomplete use of the formula.

You can read Camera Calibration and 3D Reconstruction in the official documentation.
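
To make the rejection criterion concrete, here is a minimal NumPy sketch of it. The helper names epipolar_residuals and filter_by_threshold are made up for this illustration and are not part of OpenCV, and the toy fundamental matrix assumes an ideal, already rectified pair, where the residual reduces to |y1 - y2|.

```python
import numpy as np


def epipolar_residuals(pts1, pts2, F):
    """Return |x2^T F x1| for each correspondence (N x 2 pixel arrays)."""
    pts1 = np.asarray(pts1, dtype=np.float64)
    pts2 = np.asarray(pts2, dtype=np.float64)
    ones = np.ones((pts1.shape[0], 1))
    x1 = np.hstack([pts1, ones])  # homogeneous points, image 1
    x2 = np.hstack([pts2, ones])  # homogeneous points, image 2
    # Row i of (x1 @ F.T) is (F x1_i)^T, so the row-wise dot product
    # with x2 gives x2_i^T F x1_i.
    return np.abs(np.sum(x2 * (x1 @ F.T), axis=1))


def filter_by_threshold(pts1, pts2, F, thres):
    """Keep the pairs whose residual does not exceed thres."""
    keep = epipolar_residuals(pts1, pts2, F) <= thres
    return np.asarray(pts1)[keep], np.asarray(pts2)[keep]


# Toy fundamental matrix of an ideal rectified pair (pure horizontal shift).
F_toy = np.array([[0.0, 0.0, 0.0],
                  [0.0, 0.0, -1.0],
                  [0.0, 1.0, 0.0]])
p1 = np.array([[10.0, 20.0], [30.0, 40.0], [50.0, 60.0]])
p2 = np.array([[5.0, 20.0], [22.0, 40.5], [41.0, 75.0]])
residuals = epipolar_residuals(p1, p2, F_toy)
kept1, kept2 = filter_by_threshold(p1, p2, F_toy, thres=1.0)
print(residuals, len(kept1))
```

Printing the residuals for your own matches and F is a quick way to see whether a given threshold would discard most of them.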

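Possible issue II

Interaxial distance

A robust interaxial distance between the left and right camera lenses should be no greater than 200 mm. When the interaxial distance is larger than the interocular distance, the effect is called hyperstereoscopy or hyperdivergence, and it leads not only to exaggerated depth in the scene but also to physical discomfort for the viewer. Read Autodesk's Stereoscopic Filmmaking Whitepaper to find out more on this topic.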
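
As a rough illustration of why the baseline matters for matching as well, the sketch below uses the pinhole relation for a parallel rig, disparity = focal length * baseline / depth (the same relation the question uses to turn disparity into depth). The focal length and subject distance are assumed values, not taken from the question.

```python
def expected_disparity_px(focal_px, baseline_mm, depth_mm):
    """Disparity in pixels of a point at depth_mm for a parallel stereo rig."""
    return focal_px * baseline_mm / depth_mm


focal_px = 1000.0   # assumed focal length in pixels
depth_mm = 2000.0   # assumed subject distance (2 m)
for baseline_mm in (65.0, 200.0, 400.0):
    d = expected_disparity_px(focal_px, baseline_mm, depth_mm)
    print(f"baseline {baseline_mm:5.0f} mm -> disparity {d:6.1f} px")
```

The wider the baseline, the larger the disparity range the matcher has to search, so a numDisparities value that works for one rig can be too small for another.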

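Possible issue III

Parallel vs toed-in camera mode

Visual inaccuracies in the resulting disparity map may be caused by an incorrectly computed camera mode. Many stereographers prefer toed-in camera mode, but Pixar, for example, prefers parallel camera mode.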

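Possible issue IV

Vertical alignment

In stereoscopy, a vertical shift (even if one of the views is shifted up by just 1 mm) ruins a robust stereo experience. So, before generating a disparity map, you must make sure that the left and right views of your stereo pair are aligned accordingly. See the Technicolor Stereoscopic Whitepaper on 15 common problems in stereo.

Stereo rectification matrix: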

   ┌                  ┐
   |  f   0   cx  tx  |
   |  0   f   cy  ty  |   # use "ty" value to fix vertical shift in one image
   |  0   0   1   0   |
   └                  ┘

Here is the StereoRectify method:

cv.StereoRectify(cameraMatrix1, cameraMatrix2, distCoeffs1, distCoeffs2, imageSize, R, T, R1, R2, P1, P2, Q=None, flags=CV_CALIB_ZERO_DISPARITY, alpha=-1, newImageSize=(0, 0)) -> (roi1, roi2)
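
One way to check vertical alignment numerically, sketched here with NumPy only, is to push the matched points through the two rectifying homographies and compare their row coordinates. The helper names and the toy data are assumptions for illustration; in practice H1 and H2 would be whatever the rectification step returned.

```python
import numpy as np


def apply_homography(H, pts):
    """Map N x 2 pixel points through a 3 x 3 homography."""
    pts = np.asarray(pts, dtype=np.float64)
    homog = np.hstack([pts, np.ones((pts.shape[0], 1))])
    mapped = homog @ np.asarray(H, dtype=np.float64).T
    return mapped[:, :2] / mapped[:, 2:3]


def vertical_misalignment(pts1, pts2, H1, H2):
    """Per-match |y_left - y_right| after rectification, in pixels."""
    y1 = apply_homography(H1, pts1)[:, 1]
    y2 = apply_homography(H2, pts2)[:, 1]
    return np.abs(y1 - y2)


# Toy data: the right image is shifted down by 3 px; H2 undoes the shift.
pts_left = np.array([[100.0, 50.0], [200.0, 120.0], [320.0, 240.0]])
pts_right = pts_left + np.array([-12.0, 3.0])
H1 = np.eye(3)
H2 = np.array([[1.0, 0.0, 0.0],
               [0.0, 1.0, -3.0],
               [0.0, 0.0, 1.0]])
before = vertical_misalignment(pts_left, pts_right, np.eye(3), np.eye(3))
after = vertical_misalignment(pts_left, pts_right, H1, H2)
print(before.mean(), after.mean())
```

If the mean misalignment after rectification is more than a pixel or so, a block matcher that searches along rows is unlikely to find the right correspondences.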

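Possible issue V

Lens distortion

Lens distortion is a very important topic in stereo compositing. Before generating a disparity map, you need to undistort the left and right views, then generate the disparity channel, and then redistort both views again.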
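
The undistort and redistort round trip can be sketched at the level of individual points. This is a simplified illustration that keeps only two radial terms of the usual polynomial distortion model and works on normalized image coordinates; the coefficients k1 and k2 are assumed values, and a real pipeline would use coefficients obtained from calibration.

```python
import numpy as np


def distort(pts, k1, k2):
    """Apply radial distortion to N x 2 normalized image points."""
    pts = np.asarray(pts, dtype=np.float64)
    r2 = np.sum(pts ** 2, axis=1, keepdims=True)
    return pts * (1.0 + k1 * r2 + k2 * r2 ** 2)


def undistort(pts, k1, k2, iterations=20):
    """Invert distort() by fixed-point iteration."""
    pts = np.asarray(pts, dtype=np.float64)
    est = pts.copy()
    for _ in range(iterations):
        r2 = np.sum(est ** 2, axis=1, keepdims=True)
        est = pts / (1.0 + k1 * r2 + k2 * r2 ** 2)
    return est


k1, k2 = -0.2, 0.05  # assumed coefficients, for illustration only
ideal = np.array([[0.0, 0.0], [0.3, -0.2], [-0.4, 0.4]])
observed = distort(ideal, k1, k2)        # what the lens records
recovered = undistort(observed, k1, k2)  # undistorted, ready for matching
round_trip = distort(recovered, k1, k2)  # redistorted back to the lens view
print(np.abs(recovered - ideal).max())
```

The fixed-point iteration converges quickly for mild distortion; strongly distorted lenses may need more iterations or a different solver.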

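Possible issue VI

Low-quality depth channel without anti-aliasing

To create a high-quality disparity map you need left and right depth channels, which must be pre-generated. When you work in a 3D package, you can render a high-quality depth channel (with crisp edges) with a single click. But generating a high-quality depth channel from a video sequence is not easy, because the stereo pair has to move through your environment to produce the initial data for a subsequent depth-from-motion algorithm. If there is no motion in the frame, the depth channel will be very poor.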

Also, Depth channel itself has one more drawback – its edges do not match the edges of the RGB because it has no anti-aliasing.

Disparity channel code snippet:

Here I'd like to present a quick approach to generating a disparity map:

import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt

imageLeft = cv.imread('paris_left.png', 0)
imageRight = cv.imread('paris_right.png', 0)
stereo = cv.StereoBM_create(numDisparities=16, blockSize=15)
disparity = stereo.compute(imageLeft, imageRight)
plt.imshow(disparity, 'gray')
plt.show()

Answer 2

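TLDR; use StereoSGBM (semi-global block matching) for images with smoother edges, and use some post-filtering if you want the result smoother still.

OP didn't provide the original images, so I'm using Tsukuba from the Middlebury data set.

Result with regular StereoBM

Result with StereoSGBM (tuned)

Best result I could find in the literature

See the publication here for details.

Example of post-filtering (see link below)

Theory/Other considerations from OP's question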

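The large black areas in your calibrated rectified images lead me to believe that the calibration was not done very well for those images. A variety of reasons could be in play, maybe the physical setup, maybe the lighting when you calibrated, and so on, but there are plenty of camera calibration tutorials out there for that. My understanding is that you are asking for a way to get a better depth map from an uncalibrated setup (this isn't 100% clear, but the title seems to support it, and I think that's what people will come here looking for).

Your basic approach is correct, but the results can definitely be improved. This form of depth mapping is not among those that produce the highest-quality maps (especially when uncalibrated). The biggest improvement will likely come from using a different stereo matching algorithm. The lighting may also be having a significant effect. The right image (at least to my naked eye) appears to be less well lit, which could interfere with the reconstruction. You could first try brightening it to the same level as the other, or gather new images if that is possible. From here on, I'll assume you have no access to the original cameras, so I'll consider gathering new images, altering the setup, or performing calibration to be out of scope. (If you do have access to the setup and cameras, then I would suggest checking the calibration and using a calibrated method, as this will work better.)

Using StereoBM to compute the disparity (depth map) does work, but StereoSGBM is much better suited to this application (it handles smoother edges better). You can see the difference below.

This article explains the differences in more depth: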

Block matching focuses on high texture images (think a picture of a tree) and semi-global block matching will focus on sub pixel level matching and pictures with more smooth textures (think a picture of a hallway).

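Without any explicit intrinsic camera parameters, specifics about the camera setup (such as focal length, distance between the cameras, distance to the subject, etc.), a known dimension in the image, or motion (to use structure from motion), you can only obtain a 3D reconstruction up to a projective transform; you won't have a sense of scale or necessarily rotation either, but you can still generate a relative depth map. You will likely suffer from some barrel and other distortions which could be removed with proper camera calibration, but you can get reasonable results without it as long as the cameras aren't terrible (the lens system isn't too distorted) and are set up pretty close to canonical configuration (which basically means they are oriented so that their optical axes are as close to parallel as possible and their fields of view overlap sufficiently). This doesn't appear to be the OP's issue, however, as he did manage to rectify the images with the uncalibrated method.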

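Basic procedure

Find at least 8 well-matched points in both images to compute the fundamental matrix (the eight-point algorithm referenced in the code needs that many; you can use any detector and matcher you like, I kept FLANN but used ORB for detection, since SIFT isn't in the main version of OpenCV 4.2.0)
Compute the fundamental matrix, F, with findFundamentalMat
Rectify your images with stereoRectifyUncalibrated and warpPerspective
Compute the disparity (depth map) with StereoSGBM

The results are much better:

Matches with ORB and FLANN

Undistorted images (left, then right)

Disparity

StereoBM

This result looks similar to the OP's problems (speckling, gaps, wrong depths in some areas).

StereoSGBM (tuned)

This result looks much better, and it uses roughly the same method as the OP, minus the final disparity calculation, which makes me think the OP would see similar improvements on his images, had they been provided.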
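
Most of the tuning comes down to a handful of StereoSGBM parameters. The sketch below is a hypothetical helper (sgbm_params is not an OpenCV function) that builds the keyword arguments, rounds the disparity range up to a multiple of 16 and derives P1 and P2 from the block size using the rule of thumb given in the OpenCV documentation. The default values are assumptions to tune per image pair.

```python
def sgbm_params(min_disp, max_disp, block_size=5, channels=1):
    """Build keyword arguments for cv2.StereoSGBM_create (hypothetical helper)."""
    span = max_disp - min_disp
    # Round the disparity range up to the next positive multiple of 16.
    num_disp = max(16, ((span + 15) // 16) * 16)
    return {
        "minDisparity": min_disp,
        "numDisparities": num_disp,
        "blockSize": block_size,
        # Smoothness penalties; P2 must be larger than P1.
        "P1": 8 * channels * block_size ** 2,
        "P2": 32 * channels * block_size ** 2,
    }


params = sgbm_params(min_disp=-4, max_disp=9)
print(params)
# Usage, assuming OpenCV is installed:
#   stereo = cv2.StereoSGBM_create(**params)
```

Parameters such as uniquenessRatio, speckleWindowSize and speckleRange can be passed alongside these; they are left out here because good values depend heavily on the images.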

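Post-filtering

There's a good article about this in the OpenCV documentation. I'd recommend looking at it if you need really smooth maps.

The example photos above are frame 1 from the scene ambush_2 in the MPI Sintel Dataset.

Full code (tested on OpenCV 4.2.0):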

import cv2
import numpy as np
import matplotlib.pyplot as plt

imgL = cv2.imread("tsukuba_l.png", cv2.IMREAD_GRAYSCALE)  # left image
imgR = cv2.imread("tsukuba_r.png", cv2.IMREAD_GRAYSCALE)  # right image


def get_keypoints_and_descriptors(imgL, imgR):
    """Use ORB detector and FLANN matcher to get keypoints, descriptors,
    and corresponding matches that will be good for computing
    homography.
    """
    orb = cv2.ORB_create()
    kp1, des1 = orb.detectAndCompute(imgL, None)
    kp2, des2 = orb.detectAndCompute(imgR, None)

    ############## Using FLANN matcher ##############
    # Each keypoint of the first image is matched with a number of
    # keypoints from the second image. k=2 means keep the 2 best matches
    # for each keypoint (best matches = the ones with the smallest
    # distance measurement).
    FLANN_INDEX_LSH = 6
    index_params = dict(
        algorithm=FLANN_INDEX_LSH,
        table_number=6,  # 12
        key_size=12,  # 20
        multi_probe_level=1,
    )  # 2
    search_params = dict(checks=50)  # or pass empty dictionary
    flann = cv2.FlannBasedMatcher(index_params, search_params)
    flann_match_pairs = flann.knnMatch(des1, des2, k=2)
    return kp1, des1, kp2, des2, flann_match_pairs


def lowes_ratio_test(matches, ratio_threshold=0.6):
    """Filter matches using the Lowe's ratio test.

    The ratio test checks if matches are ambiguous and should be
    removed by checking that the two distances are sufficiently
    different. If they are not, then the match at that keypoint is
    ignored.

    https://whosebug.com/questions/51197091/how-does-the-lowes-ratio-test-work
    """
    filtered_matches = []
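    for pair in matches:
        # With the LSH index, knnMatch can return fewer than 2 neighbours.
        if len(pair) < 2:
            continue
        m, n = pair
        if m.distance < ratio_threshold * n.distance:
            filtered_matches.append(m)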
    return filtered_matches


def draw_matches(imgL, imgR, kp1, des1, kp2, des2, flann_match_pairs):
    """Draw the first 8 matches between the left and right images."""
    # https://docs.opencv.org/4.2.0/d4/d5d/group__features2d__draw.html
    # https://docs.opencv.org/2.4/modules/features2d/doc/common_interfaces_of_descriptor_matchers.html
    img = cv2.drawMatches(
        imgL,
        kp1,
        imgR,
        kp2,
        flann_match_pairs[:8],
        None,
        flags=cv2.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS,
    )
    cv2.imshow("Matches", img)
    cv2.imwrite("ORB_FLANN_Matches.png", img)
    cv2.waitKey(0)


def compute_fundamental_matrix(matches, kp1, kp2, method=cv2.FM_RANSAC):
    """Use the set of good matches to estimate the Fundamental Matrix.

    See  https://en.wikipedia.org/wiki/Eight-point_algorithm#The_normalized_eight-point_algorithm
    for more info.
    """
    pts1, pts2 = [], []
    fundamental_matrix, inliers = None, None
    for m in matches[:8]:
        pts1.append(kp1[m.queryIdx].pt)
        pts2.append(kp2[m.trainIdx].pt)
    if pts1 and pts2:
        # You can play with the Threshold and confidence values here
        # until you get something that gives you reasonable results. I
        # used the defaults
        fundamental_matrix, inliers = cv2.findFundamentalMat(
            np.float32(pts1),
            np.float32(pts2),
            method=method,
            # ransacReprojThreshold=3,
            # confidence=0.99,
        )
    return fundamental_matrix, inliers, pts1, pts2


############## Find good keypoints to use ##############
kp1, des1, kp2, des2, flann_match_pairs = get_keypoints_and_descriptors(imgL, imgR)
good_matches = lowes_ratio_test(flann_match_pairs, 0.2)
draw_matches(imgL, imgR, kp1, des1, kp2, des2, good_matches)


############## Compute Fundamental Matrix ##############
F, I, points1, points2 = compute_fundamental_matrix(good_matches, kp1, kp2)


############## Stereo rectify uncalibrated ##############
h1, w1 = imgL.shape
h2, w2 = imgR.shape
thresh = 0
_, H1, H2 = cv2.stereoRectifyUncalibrated(
    np.float32(points1), np.float32(points2), F, imgSize=(w1, h1), threshold=thresh,
)

############## Undistort (Rectify) ##############
imgL_undistorted = cv2.warpPerspective(imgL, H1, (w1, h1))
imgR_undistorted = cv2.warpPerspective(imgR, H2, (w2, h2))
cv2.imwrite("undistorted_L.png", imgL_undistorted)
cv2.imwrite("undistorted_R.png", imgR_undistorted)

############## Calculate Disparity (Depth Map) ##############

# Using StereoBM
stereo = cv2.StereoBM_create(numDisparities=16, blockSize=15)
disparity_BM = stereo.compute(imgL_undistorted, imgR_undistorted)
plt.imshow(disparity_BM, "gray")
plt.colorbar()
plt.show()

# Using StereoSGBM
# Set disparity parameters. Note: disparity range is tuned according to
#  specific parameters obtained through trial and error.
win_size = 2
min_disp = -4
max_disp = 9
num_disp = max_disp - min_disp  # 13 here; OpenCV documents that this should be divisible by 16
stereo = cv2.StereoSGBM_create(
    minDisparity=min_disp,
    numDisparities=num_disp,
    blockSize=5,
    uniquenessRatio=5,
    speckleWindowSize=5,
    speckleRange=5,
    disp12MaxDiff=2,
    P1=8 * 3 * win_size ** 2,
    P2=32 * 3 * win_size ** 2,
)
disparity_SGBM = stereo.compute(imgL_undistorted, imgR_undistorted)
plt.imshow(disparity_SGBM, "gray")
plt.colorbar()
plt.show()
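
One more thing to watch when turning disparity into depth: compute() returns a 16-bit fixed-point map in which every value is the true disparity multiplied by 16. Below is a minimal NumPy sketch of the conversion. The helper name is made up, focal_px and baseline are assumed to be known (without calibration they are not, so the result is only a relative depth), and the validity test assumes a non-negative disparity range, so it would need adjusting for a negative minDisparity like the one used above.

```python
import numpy as np


def disparity_to_depth(raw_disp, focal_px, baseline):
    """Convert raw StereoBM/StereoSGBM output to depth; invalid pixels -> NaN."""
    # compute() returns 16-bit fixed-point values: true disparity * 16.
    disp = np.asarray(raw_disp, dtype=np.float32) / 16.0
    depth = np.full(disp.shape, np.nan, dtype=np.float32)
    valid = disp > 0  # zero or negative disparity carries no depth here
    depth[valid] = focal_px * baseline / disp[valid]
    return depth


raw = np.array([[160, 320], [-16, 0]], dtype=np.int16)
depth = disparity_to_depth(raw, focal_px=1000.0, baseline=0.1)
print(depth)
```

Masking invalid pixels before dividing avoids the infinite or negative depths that otherwise dominate the colour scale of the plot.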