Reduce tracking window using Google ML Kit vision samples

Question

I want to reduce the barcode tracking window when using the Google Vision API. There are some existing answers, but they feel somewhat outdated.

I am using Google's sample: https://github.com/googlesamples/mlkit/tree/master/android/vision-quickstart

Currently, I am trying to determine in the BarcodeScannerProcessor onSuccess callback whether a barcode lies inside my overlay box:

override fun onSuccess(barcodes: List<Barcode>, graphicOverlay: GraphicOverlay) {
    if (barcodes.isEmpty())
        return

    // Target box: centered in the image and sized relative to the image (image coordinates).
    // It does not depend on the barcode, so it is computed once, outside the loop.
    val center = Point(graphicOverlay.imageWidth / 2, graphicOverlay.imageHeight / 2)
    val rectWidth = graphicOverlay.imageWidth * Settings.OverlayWidthFactor
    val rectHeight = graphicOverlay.imageHeight * Settings.OverlayHeightFactor

    val rect = Rect(
        (center.x - rectWidth / 2).toInt(),
        (center.y - rectHeight / 2).toInt(),
        (center.x + rectWidth / 2).toInt(),
        (center.y + rectHeight / 2).toInt()
    )

    for (barcode in barcodes) {
        // Skip barcodes without a bounding box instead of crashing on !!.
        val box = barcode.boundingBox ?: continue
        // Green if the barcode lies completely inside the target box, red otherwise.
        val color = if (rect.contains(box)) Color.GREEN else Color.RED
        graphicOverlay.add(BarcodeGraphic(graphicOverlay, barcode, "left: ${box.left}", color))
    }
}

Along the Y axis it works perfectly, but the X values from barcode.boundingBox, e.g. barcode.boundingBox.left, seem to be offset. Is this caused by the calculations in GraphicOverlay?

I would expect the value below to be close to 0, but here the offset is about 90:

Or would it be more efficient to crop the image to the box?

Answer 1

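Actually, the bounding box is correct. The catch is that the image aspect ratio does not match the viewport aspect ratio, so the image is cropped horizontally. Try opening the settings (the gear icon in the top-right corner) and choosing a suitable resolution.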

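For example, look at these two screenshots. In the first one, the selected resolution (1080x1920) matches my phone's resolution, so the padding looks right (17px). In the second one, the aspect ratio is different (720x720, i.e. 1.0), so the image is cropped and the padding looks wrong.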
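To make the cropping concrete, here is a plain-Kotlin sketch of the center-crop math that, as far as I can tell from the vision-quickstart source, GraphicOverlay performs before building its matrix (non-mirrored case only). `OverlayTransform` and `computeOverlayTransform` are hypothetical names, and the 1080x1920 view size used in the note below is an assumption, not a measured value.

```kotlin
// Hypothetical helper mirroring the center-crop logic of the sample's GraphicOverlay.
// Aspect ratios here are width / height, as in the sample source.
data class OverlayTransform(
    val scaleFactor: Float,           // overlay pixels per image pixel
    val postScaleWidthOffset: Float,  // overlay pixels cropped on each side horizontally
    val postScaleHeightOffset: Float  // overlay pixels cropped on each side vertically
)

fun computeOverlayTransform(
    viewWidth: Int,
    viewHeight: Int,
    imageWidth: Int,
    imageHeight: Int
): OverlayTransform {
    require(viewWidth > 0 && viewHeight > 0 && imageWidth > 0 && imageHeight > 0)
    val viewAspectRatio = viewWidth.toFloat() / viewHeight
    val imageAspectRatio = imageWidth.toFloat() / imageHeight
    return if (viewAspectRatio > imageAspectRatio) {
        // View is relatively wider: fit the image to the view width, crop top and bottom.
        OverlayTransform(
            scaleFactor = viewWidth.toFloat() / imageWidth,
            postScaleWidthOffset = 0f,
            postScaleHeightOffset = (viewWidth / imageAspectRatio - viewHeight) / 2
        )
    } else {
        // View is relatively taller (or equal): fit the image to the view height, crop left and right.
        OverlayTransform(
            scaleFactor = viewHeight.toFloat() / imageHeight,
            postScaleWidthOffset = (viewHeight * imageAspectRatio - viewWidth) / 2,
            postScaleHeightOffset = 0f
        )
    }
}
```

Assuming a 1080x1920 overlay view, a 1080x1920 image gives a scale of 1 and no crop, while a 720x720 image is scaled by 1920/720 ≈ 2.667 and 420 view pixels are cut from each side horizontally, which produces the kind of horizontal shift described above.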

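So the offset has to be converted from image coordinates to screen coordinates. Under the hood, GraphicOverlay uses a matrix for this transformation, and you can use the same matrix: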

    for(barcode in barcodes) {
      barcode.boundingBox?.let { bbox ->
        val offset = floatArrayOf(bbox.left.toFloat(), bbox.top.toFloat())
        graphicOverlay.transformationMatrix.mapPoints(offset)

        val leftOffset = offset[0]
        val topOffset = offset[1]

        ...
      }
    }

The only catch is that transformationMatrix is private, so you need to add a getter to access it.
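As far as I can tell from the sample source, that matrix is a uniform scale followed by a translation by the negative crop offsets (plus a horizontal flip for a mirrored front camera, which this sketch ignores). The plain-Kotlin sketch below writes that mapping out by hand, applies it to all four edges of a bounding box, and then performs the question's containment test in view coordinates, where the on-screen box actually lives. `Box`, `imageToView` and `centeredViewBox` are hypothetical stand-ins; on a device you would more likely use `RectF` with `Matrix.mapRect` on the exposed matrix, and the scale and offsets are assumed to come from GraphicOverlay.

```kotlin
// Hypothetical stand-in for android.graphics.RectF so the sketch runs on a plain JVM.
data class Box(val left: Float, val top: Float, val right: Float, val bottom: Float) {
    // True if `other` lies completely inside this box (edges may touch).
    fun contains(other: Box): Boolean =
        left <= other.left && top <= other.top && right >= other.right && bottom >= other.bottom
}

// Same effect as setScale(scale, scale) followed by postTranslate(-offsetX, -offsetY):
// every image coordinate is scaled first, then shifted by the crop offset.
fun imageToView(box: Box, scale: Float, offsetX: Float, offsetY: Float): Box =
    Box(
        left = box.left * scale - offsetX,
        top = box.top * scale - offsetY,
        right = box.right * scale - offsetX,
        bottom = box.bottom * scale - offsetY
    )

// Target box centered in the view, sized as fractions of the view
// (playing the role of the question's Settings.OverlayWidthFactor / OverlayHeightFactor).
fun centeredViewBox(viewWidth: Int, viewHeight: Int, widthFactor: Float, heightFactor: Float): Box {
    val centerX = viewWidth / 2f
    val centerY = viewHeight / 2f
    val halfWidth = viewWidth * widthFactor / 2
    val halfHeight = viewHeight * heightFactor / 2
    return Box(centerX - halfWidth, centerY - halfHeight, centerX + halfWidth, centerY + halfHeight)
}
```

Defining the target box in view coordinates keeps it aligned with what the user sees, no matter how much of the camera image is cropped away.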

Answer 2

As you know, the camera preview size can be configured in the settings menu. This configurable size determines the graphicOverlay dimensions.

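On the other hand, the aspect ratio of the CameraSourcePreview shown on screen (i.e. preview_view in activity_vision_live_preview.xml) is not necessarily equal to that of graphicOverlay, because it depends on the phone's screen size and on how much height the parent ConstraintLayout allows it to take.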

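Therefore, depending on the difference between the aspect ratios of graphicOverlay and preview_view, part of the graphicOverlay may be hidden horizontally or vertically in the preview.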

GraphicOverlay has a few fields that help us shift the top-left corner of the barcode boundingBox so that the visible area starts at 0.

First, they need to be accessible from outside the GraphicOverlay class, so it is enough to write a getter for each of them:

GraphicOverlay.java

public class GraphicOverlay extends View {
    
    ...

    /**
     * The factor of overlay View size to image size. Anything in the image coordinates need to be
     * scaled by this amount to fit with the area of overlay View.
     */
    public float getScaleFactor() {
        return scaleFactor;
    }

    /**
     * The number of vertical pixels needed to be cropped on each side to fit the image with the
     * area of overlay View after scaling.
     */
    public float getPostScaleHeightOffset() {
        return postScaleHeightOffset;
    }

    /**
     * The number of horizontal pixels needed to be cropped on each side to fit the image with the
     * area of overlay View after scaling.
     */
    public float getPostScaleWidthOffset() {
        return postScaleWidthOffset;
    }
}

Now the left and top gaps can be calculated from these values as follows:

BarcodeScannerProcessor.kt

class BarcodeScannerProcessor(
    context: Context
) : VisionProcessorBase<List<Barcode>>(context) {

    ...

    override fun onSuccess(barcodes: List<Barcode>, graphicOverlay: GraphicOverlay) {
        if (barcodes.isEmpty()) {
            Log.v(MANUAL_TESTING_LOG, "No barcode has been detected")
        }

        val leftDiff = graphicOverlay.run { postScaleWidthOffset / scaleFactor }.toInt()
        val topDiff = graphicOverlay.run { postScaleHeightOffset / scaleFactor }.toInt()

        for (i in barcodes.indices) {
            val barcode = barcodes[i]
            val color = Color.RED
            val text = "left: ${barcode.boundingBox!!.left - leftDiff}   top: ${barcode.boundingBox!!.top - topDiff}"
            graphicOverlay.add(MyBarcodeGraphic(graphicOverlay, barcode, text, color))
            logExtrasForTesting(barcode)
        }
    }

    ...
}

Visual result:

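This is the visualized output. As the figure shows, the distance between the barcode's left/top edges and the visible area's left/top edges is now measured from 0. In the left image, graphicOverlay is set to 480x640 (height/width ≈ 1.3333), and in the right one to 360x640 (height/width ≈ 1.7778). In both cases, on my phone the CameraSourcePreview size stays at 1440x2056 pixels (height/width ≈ 1.4278), which means the calculation really reflects the barcode's position within the visible area.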

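(Note that in one experiment the visible area's aspect ratio is larger than graphicOverlay's and in the other it is smaller: 1.3333 < 1.4278 < 1.7778. So in the first case the left value is adjusted, and in the second the top value.)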
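The arithmetic behind `leftDiff` and `topDiff` can be checked without a device. This plain-Kotlin sketch recomputes them from the view and image sizes, using the same center-crop rules as GraphicOverlay as I understand them. `CropInImagePixels` and `cropInImagePixels` are hypothetical names, and the sketch assumes the overlay view has the same 1440x2056 size as CameraSourcePreview and that the image size is already in the rotated (portrait) orientation.

```kotlin
// Hypothetical result type: image pixels hidden on the left and at the top.
data class CropInImagePixels(val leftDiff: Int, val topDiff: Int)

// Recomputes postScaleWidthOffset / scaleFactor and postScaleHeightOffset / scaleFactor.
// Note: the sample compares width / height ratios, while the answer quotes height / width.
fun cropInImagePixels(viewWidth: Int, viewHeight: Int, imageWidth: Int, imageHeight: Int): CropInImagePixels {
    val viewAspectRatio = viewWidth.toFloat() / viewHeight
    val imageAspectRatio = imageWidth.toFloat() / imageHeight
    return if (viewAspectRatio > imageAspectRatio) {
        // Image is fitted to the view width; top and bottom are cropped.
        val scaleFactor = viewWidth.toFloat() / imageWidth
        val postScaleHeightOffset = (viewWidth / imageAspectRatio - viewHeight) / 2
        CropInImagePixels(leftDiff = 0, topDiff = (postScaleHeightOffset / scaleFactor).toInt())
    } else {
        // Image is fitted to the view height; left and right are cropped.
        val scaleFactor = viewHeight.toFloat() / imageHeight
        val postScaleWidthOffset = (viewHeight * imageAspectRatio - viewWidth) / 2
        CropInImagePixels(leftDiff = (postScaleWidthOffset / scaleFactor).toInt(), topDiff = 0)
    }
}
```

With those sizes it yields leftDiff = 15, topDiff = 0 for the 480x640 overlay and leftDiff = 0, topDiff = 63 for the 360x640 overlay, consistent with only one of the two values being adjusted in each experiment.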
