In video processing, is the z-coordinate the frame number?

Question

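I am trying to perform some basic action recognition on the KTH dataset.

I am using the 3DSIFT feature extractor from UCF (link). It extracts SIFT descriptors at given x, y, and z coordinates.

For feature detection I am using selective STIPs (link), which have been shown to work very well for action recognition. According to the source code provided by the authors, the output is as follows: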

    % @output : corner_points, P X 4 matrix, where P is the number of interest
    %           point found in the image_stack and each interest point contains
    %           4 values :: [X,Y] coordinate of the interest point, frame
    %           number, scale at which it is detected.

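Can I assume that the frame number given here is also the z-coordinate that 3DSIFT requires?

I extracted STIPs from a video clip and got the output described above, but I get multiple X and Y values on every frame: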

[71,24,1]
[54,26,1]
[86,29,1]
...

Is this the expected output, and is it valid input for SIFT3D?

Answer 1

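Yes. As far as I know, for 3DSIFT the z-coordinate corresponds to the frame number when processing video. So the x, y, frame output from STIPs should be passed as the x, y, z input to 3DSIFT.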
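As a rough illustration of that mapping, here is a minimal MATLAB sketch. It assumes `corner_points` has the P x 4 layout described in the question; the sample rows (in particular the scale values and the frame-2 point) are made up for the example. The call into 3DSIFT itself is left out because its exact signature is not shown in this thread, and it is worth checking whether that code expects (x, y) or (row, column) ordering. Several interest points sharing the same frame number simply appear as several rows with the same z value.

```matlab
% Hypothetical STIP output: one row per interest point, [x, y, frame, scale].
% The values below are illustrative only, not real detector output.
corner_points = [71 24 1 2;
                 54 26 1 2;
                 86 29 1 4;
                 60 40 2 2];

% Keep the first three columns as [x, y, z], taking z = frame number.
xyz = corner_points(:, 1:3);

% Count how many interest points fall on each frame.
frames = unique(xyz(:, 3));
counts = arrayfun(@(f) sum(xyz(:, 3) == f), frames);

% Visit each point; each row is one candidate location for a descriptor.
for i = 1:size(xyz, 1)
    loc = xyz(i, :);
    % The 3DSIFT descriptor call would go here, using loc(1), loc(2), loc(3).
    % It is omitted because its signature is an unknown in this sketch.
    fprintf('point %d: x=%d, y=%d, z(frame)=%d\n', i, loc(1), loc(2), loc(3));
end
```

With the sample rows above, `counts` shows three points on frame 1 and one on frame 2, which matches the kind of output described in the question.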

matlab

video-processing

feature-extraction

feature-detection