How to slice an image by table border

Question

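I have many PNG files like this one:

I want to split each image into 48 (= 6 x 8) smaller image files, one for each of the 48 cells separated by the table borders. That is, I want files img11.png, ..., img68.png, where img11.png contains the (1,1) cell "1.4x4x8", img12.png the (1,2) cell "M/T", img13.png the "550,000" cell, ..., and img68.png the bottom-right "641,500" cell.

I want to do this because I think it will improve the accuracy of tesseract, which is currently unsatisfactory because many of my image files are of much poorer quality than the one shown above. In addition, the margins and sizes vary, and some images contain non-English characters and pictures.

Is there a package that can detect table borders and slice an image into m x n images? I'm new to this field. I have read some material on it, but it is beyond my current ability. Still, I'm willing to learn.

Thanks for your help.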

Answer 1

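Following R. Bilal's suggestion (thanks), I arrived at the following.

Step 1: Convert the image to grayscale.

library(magick)
x <- image_read('https://i.stack.imgur.com/plBvs.png')
y <- image_convert(x, colorspace='Gray')
a <- as.integer(y[[1]])[,,1]   # height x width matrix of gray levels (0-255)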
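To experiment with steps 2 to 4 without downloading the image or installing magick, a synthetic stand-in for `a` can be built in base R. This is a hypothetical helper (`make_grid_image`, with made-up border positions), not part of the answer; it only mimics the layout step 1 produces: a height x width integer matrix with white as 255 and dark borders as 0.

```r
# Hypothetical synthetic "table": a white grayscale matrix (rows = y, cols = x,
# like `a` above) with dark borders `thickness` pixels thick.
make_grid_image <- function(height = 120, width = 200,
                            row_lines = c(1, 40, 80, 118),
                            col_lines = c(1, 50, 100, 150, 198),
                            thickness = 3) {
  img <- matrix(255L, nrow = height, ncol = width)         # white background
  for (y0 in row_lines) img[y0:(y0 + thickness - 1), ] <- 0L  # horizontal borders
  for (x0 in col_lines) img[, x0:(x0 + thickness - 1)] <- 0L  # vertical borders
  img
}

a_demo <- make_grid_image()
dim(a_demo)
```

Running steps 2 and 3 on `a_demo` flags rows 1-3, 40-42, 80-82 and 118-120 as horizontal borders, since those rows are entirely dark while every other row is only 7.5% dark.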

Step 2: Convert "dark" pixels to 1 and "light" pixels to 0.

w <- ifelse(a>190, 0, 1)         # 1 = dark, 0 = light; adjust the 190 threshold per image
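The cutoff 190 has to be tuned by hand, which is awkward when image quality varies as much as the question describes. One common alternative, swapped in here and not part of the answer, is Otsu's method: it picks the gray level that best separates the histogram into a dark class and a light class. A base-R sketch, with `otsu_threshold` as a hypothetical helper name, assuming `a` holds integer gray levels 0-255 as in step 1:

```r
# Otsu's method: choose the gray level t in 0..255 that maximizes the
# between-class variance of "dark" (<= t) versus "light" (> t) pixels.
otsu_threshold <- function(a) {
  counts <- tabulate(as.integer(a) + 1L, nbins = 256L)  # histogram of levels 0..255
  p <- counts / sum(counts)                # probability of each gray level
  gray <- 0:255
  omega <- cumsum(p)                       # weight of the dark class for each t
  mu <- cumsum(p * gray)                   # cumulative mean up to t
  mu_total <- mu[256L]                     # mean of the whole image
  sigma_b <- (mu_total * omega - mu)^2 / (omega * (1 - omega))
  sigma_b[!is.finite(sigma_b)] <- 0        # t where one class is empty
  gray[which.max(sigma_b)]
}
```

It would replace the fixed cutoff as `w <- ifelse(a > otsu_threshold(a), 0, 1)`. Otsu assumes a roughly two-peaked histogram (paper versus ink), so on very uneven scans a hand-tuned or locally adaptive threshold may still do better.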

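Step 3: Detect horizontal and vertical lines.

ypos <- which(rowMeans(w) > .95)  # rows that are >95% dark: horizontal borders (adjust .95)
xpos <- which(colMeans(w) > .95)  # columns that are >95% dark: vertical borders (adjust .95)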
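Because the borders are several pixels thick, `ypos` and `xpos` come out as runs of consecutive indices (for example 40, 41, 42 for a single line); the loop in step 4 copes with that by skipping gaps narrower than 16 pixels. An alternative, sketched here with hypothetical helpers `merge_runs` and `cell_spans`, is to collapse each run into one border and take the interior between consecutive borders:

```r
# Collapse runs of consecutive indices (one thick border) into start/end pairs.
merge_runs <- function(pos) {
  pos <- sort(unique(pos))
  if (length(pos) == 0) return(data.frame(start = integer(0), end = integer(0)))
  run_id <- cumsum(c(1L, diff(pos) != 1))   # increments at every gap between runs
  data.frame(start = as.integer(tapply(pos, run_id, min)),
             end   = as.integer(tapply(pos, run_id, max)))
}

# Interior pixel span of each cell: just after one border to just before the next.
cell_spans <- function(runs) {
  if (nrow(runs) < 2) return(data.frame(from = integer(0), to = integer(0)))
  k <- seq_len(nrow(runs) - 1L)
  data.frame(from = runs$end[k] + 1L, to = runs$start[k + 1L] - 1L)
}
```

This assumes the table has an outer border on every side; if it does not, the image edges have to be added as extra borders, as step 4 does with `0` and `ncol(a)`/`nrow(a)`.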

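Step 4: Crop the original image (x) into cells.

xpos <- c(0, xpos, ncol(a))   # add the left and right image edges
ypos <- c(0, ypos, nrow(a))   # add the top and bottom image edges

outdir <- "cropped"
dir.create(outdir, showWarnings = FALSE)   # no warning if the folder already exists
m <- 0                                    # output row counter
for (i in 1:(length(ypos)-1)) {
  dy <- ypos[i+1]-ypos[i]
  n <- 0                                  # output column counter
  if (dy < 16) next  # skip if too short (gap inside a thick border line)
  m <- m+1
  for (j in 1:(length(xpos)-1)) {
    dx <- xpos[j+1]-xpos[j]
    if (dx < 16) next  # skip if too narrow (gap inside a thick border line)
    n <- n+1
    geom <- sprintf("%dx%d+%d+%d", dx, dy, xpos[j], ypos[i])   # WxH+X+Y
    # cat(sprintf('%2d %2d: %s\n', m, n, geom))
    cropped <- image_crop(x, geom)
    outfile <- file.path(outdir, sprintf('%02d_%02d.png', m, n))
    image_write(cropped, outfile, format="png")
  }
}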
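Separating the geometry calculation from the file I/O makes the loop easy to check without magick or a real image. The sketch below (a hypothetical `cell_geometries` function) reproduces the loop's skip rule and its `WxH+X+Y` geometry strings, and names the files in the `img11.png` style the question asked for instead of `01_01.png`:

```r
# Compute crop geometries ("WxH+X+Y") for every cell, mirroring the loop above:
# gaps shorter/narrower than min_size pixels (inside border lines) are skipped.
cell_geometries <- function(xpos, ypos, width, height, min_size = 16) {
  xpos <- c(0, xpos, width)    # add left/right image edges
  ypos <- c(0, ypos, height)   # add top/bottom image edges
  out <- list()
  m <- 0
  for (i in seq_len(length(ypos) - 1)) {
    dy <- ypos[i + 1] - ypos[i]
    if (dy < min_size) next    # too short: part of a horizontal border
    m <- m + 1
    n <- 0
    for (j in seq_len(length(xpos) - 1)) {
      dx <- xpos[j + 1] - xpos[j]
      if (dx < min_size) next  # too narrow: part of a vertical border
      n <- n + 1
      out[[length(out) + 1]] <- data.frame(
        row = m, col = n,
        geometry = sprintf("%dx%d+%d+%d", dx, dy, xpos[j], ypos[i]),
        file = sprintf("img%d%d.png", m, n),   # naming requested in the question
        stringsAsFactors = FALSE)
    }
  }
  do.call(rbind, out)
}
```

Each row can then be passed to magick, e.g. `image_write(image_crop(x, g$geometry[k]), file.path(outdir, g$file[k]))` for `g <- cell_geometries(xpos, ypos, ncol(a), nrow(a))` with the step 3 positions. The `img%d%d` names stay unambiguous only while the table has at most 9 rows and 9 columns, which holds for the 6 x 8 table in the question; the answer's `%02d_%02d` format has no such limit.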

The cropped (1,1) image is:
