Identify borders and column contours of a table that has no visible outline within an image

Question

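I have a set of images, each containing a table. In some images the table is already aligned and drawn with borders, and it is not hard to identify the main table in those images using Canny edge detection. However, some images contain tables without any borders, so I am trying to identify the table in the image and draw the outline of its border and its columns.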

I am using OpenCV version 3.4, and the approach I generally take is as follows:

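1. Dilate the grayscale image to merge the text into blobs.
2. Apply the cv2.findContours function to get the bounding boxes of the text.
3. Cluster the bounding boxes, so that a smaller table is not picked up instead of the main one.
4. Try to draw the contours, in the hope of recognizing the table borders.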

This approach seems to work to some extent, but the contours it draws are not accurate.

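    import cv2

    # gray_matrix: the dilated single-channel image; output_image: the image to draw on.
    # [-2:] keeps the call working on OpenCV 3.x (three return values) and 4.x (two).
    contours, hierarchy = cv2.findContours(gray_matrix, cv2.RETR_LIST,
                                           cv2.CHAIN_APPROX_SIMPLE)[-2:]

    # get bounding boxes (x, y, w, h) around any text
    boxes = []
    for contour in contours:
        boxes.append(cv2.boundingRect(contour))

    rows = {}
    cols = {}
    cell_threshold = 10  # bucket size in pixels (assumed meaning of "cell threshold")

    # Clustering the bounding boxes by their positions
    for box in boxes:
        (x, y, w, h) = box
        col_key = x // cell_threshold
        row_key = y // cell_threshold
        cols.setdefault(col_key, []).append(box)
        rows.setdefault(row_key, []).append(box)

    # Filtering out the clusters having less than 4 cols
    table_cells = list(filter(lambda r: len(r) >= 4, rows.values()))
    # Sorting the row cells by x coord
    table_cells = [list(sorted(tb)) for tb in table_cells]
    # Sorting the rows by the y coord of their first cell
    table_cells = list(sorted(table_cells, key=lambda r: r[0][1]))

    # attempt to identify columns
    # right edge: x + w of the widest last cell among all rows
    max_last_col_width_row = max(table_cells, key=lambda b: b[-1][2])
    max_x = max_last_col_width_row[-1][0] + max_last_col_width_row[-1][2]
    # bottom edge: y + h of the tallest cell in the last row
    max_last_row_height_box = max(table_cells[-1], key=lambda b: b[3])
    max_y = max_last_row_height_box[1] + max_last_row_height_box[3]

    hor_lines = []
    ver_lines = []

    # one horizontal line along the top of each row
    for row in table_cells:
        x = row[0][0]
        y = row[0][1]
        hor_lines.append((x, y, max_x, y))

    # one vertical line at the left edge of each cell in the first row
    for box in table_cells[0]:
        x = box[0]
        y = box[1]
        ver_lines.append((x, y, x, max_y))

    # closing lines on the right and at the bottom
    (x, y, w, h) = table_cells[0][-1]
    ver_lines.append((max_x, y, max_x, max_y))
    (x, y, w, h) = table_cells[0][0]
    hor_lines.append((x, max_y, max_x, max_y))

    for line in hor_lines + ver_lines:
        [x1, y1, x2, y2] = line
        cv2.line(output_image, (x1, y1), (x2, y2), (0, 0, 255), 1)

    cv2.imshow('Proper Table Borders', output_image)
    cv2.waitKey(0)  # keep the window open until a key is pressed
    cv2.destroyAllWindows()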

I am trying to achieve something like the image below.

In short, how can I find the invisible borders of a table structure in an image, and identify the x coordinates of the columns of the detected table?

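I know the code above does not produce the desired result at all, but I am still learning OpenCV, so I have tried various approaches without getting the result I want.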

Answer 1

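Try a vertical projection profile: for each X coordinate, count the text (black) pixels that fall within a given (Y0, Y1) range (the vertical span of the table). Regions where the count is zero or close to zero indicate the column boundaries. Here is a hand-drawn profile for your example: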
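
The profile idea can also be sketched in code. This is a minimal sketch with NumPy, not a tested solution for your images. It assumes the page has already been binarized so that text pixels are non-zero. The function names (`vertical_profile`, `find_gaps`, `column_separators`) and the `max_count` and `min_width` parameters are hypothetical and would need tuning for real scans.

```python
import numpy as np


def vertical_profile(binary, y0, y1):
    """Count text pixels in each column, restricted to rows y0..y1 (y1 exclusive).

    `binary` is a 2-D array in which text pixels are non-zero (an assumption).
    """
    return np.count_nonzero(binary[y0:y1, :], axis=0)


def find_gaps(profile, max_count=0, min_width=5):
    """Return (start, end) x-ranges, end exclusive, where the profile stays
    at or below `max_count` for at least `min_width` consecutive columns."""
    empty = (profile <= max_count).astype(np.int8)
    # Pad with zeros so that runs touching the image edges are closed too.
    diff = np.diff(np.concatenate(([0], empty, [0])))
    starts = np.flatnonzero(diff == 1)
    ends = np.flatnonzero(diff == -1)
    return [(int(s), int(e)) for s, e in zip(starts, ends) if e - s >= min_width]


def column_separators(gaps, width):
    """Take the midpoint of every interior gap as a column separator x."""
    return [(s + e) // 2 for s, e in gaps if s > 0 and e < width]


# Synthetic stand-in for a borderless table: three text columns on a 20 x 60 page.
page = np.zeros((20, 60), dtype=np.uint8)
page[2:18, 5:15] = 1
page[2:18, 25:35] = 1
page[2:18, 45:55] = 1

profile = vertical_profile(page, 0, 20)
gaps = find_gaps(profile)
separators = column_separators(gaps, page.shape[1])
print(gaps)        # [(0, 5), (15, 25), (35, 45), (55, 60)]
print(separators)  # [20, 40]
```

On a real image, one way to get `binary` is `cv2.threshold` with `cv2.THRESH_BINARY_INV | cv2.THRESH_OTSU` on the grayscale image, so that dark text becomes non-zero. The gaps that touch the left and right edges give the outer table border, and the interior midpoints give the column lines. Raising `max_count` slightly makes the sketch more tolerant of noise and stray marks.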

python

opencv

computer-vision

opencv-contour