Interpreting the Visual Studio profiler: is this subtraction slow? Can I make all this any faster?

Question

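I'm using the Visual Studio profiler for the first time and I'm trying to interpret the results. Looking at the percentages on the left, I found the time cost of this subtraction a bit strange: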

Other parts of the code contain more complicated expressions, such as:

Even a simple multiplication seems much faster than the subtraction:

Other multiplications take much longer and I really don't understand why, like this one:

So I guess my question is whether something weird is happening here.

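Complicated expressions take less time than that subtraction, and some expressions take far longer than other, similar ones. I have run the profiler several times and the distribution of the percentages is always like this. Am I just interpreting the results wrong?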

Update:

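I was asked to provide the profile for the whole function, so here it is, even though it is a bit big. I ran the function in a for loop for 1 minute and got 50k samples. The function contains a double loop. For convenience I include the text first, followed by the profiling pictures. Note that the code in the text is slightly more up to date than the pictures.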

 for (int i = 0; i < NUMBER_OF_CONTOUR_POINTS; i++) {

    vec4 contourPointV(contour3DPoints[i], 1);
    float phi = angles[i];

    float xW = pose[0][0] * contourPointV.x + pose[1][0] * contourPointV.y + contourPointV.z * pose[2][0] + pose[3][0];
    float yW = pose[0][1] * contourPointV.x + pose[1][1] * contourPointV.y + contourPointV.z * pose[2][1] + pose[3][1];
    float zW = pose[0][2] * contourPointV.x + pose[1][2] * contourPointV.y + contourPointV.z * pose[2][2] + pose[3][2];

    float x = -G_FU_STRICT * xW / zW;
    float y = -G_FV_STRICT * yW / zW;
    x = (x + 1) * G_WIDTHo2;
    y = (y + 1) * G_HEIGHTo2;
    y = G_HEIGHT - y;



    phi -= extraTheta;
    if (phi < 0)phi += CV_PI2;
    int indexForTable = phi * oneKoverPI;
    //vec2 ray(cos(phi), sin(phi));
    vec2 ray(cos_pre[indexForTable], sin_pre[indexForTable]);
    vec2 ray2(-ray.x, -ray.y);
    float outerStepX = ray.x * step;
    float outerStepY = ray.y * step;
    cv::Point2f outerPoint(x + outerStepX, y + outerStepY);
    cv::Point2f innerPoint(x - outerStepX, y - outerStepY);
    cv::Point2f contourPointCV(x, y);
    cv::Point2f contourPointCVcopy(x, y);

    bool cut = false;
    if (!isInView(outerPoint.x, outerPoint.y) || !isInView(innerPoint.x, innerPoint.y)) {
        cut = true;
    }
    bool outside2 = true; bool outside1 = true;

    if (cut) {
        outside2 = myClipLine(contourPointCV.x, contourPointCV.y, outerPoint.x, outerPoint.y, G_WIDTH - 1, G_HEIGHT - 1);
        outside1 = myClipLine(contourPointCVcopy.x, contourPointCVcopy.y, innerPoint.x, innerPoint.y, G_WIDTH - 1, G_HEIGHT - 1);
    }


    myIterator innerRayMine(contourPointCVcopy, innerPoint);
    myIterator outerRayMine(contourPointCV, outerPoint);

    if (!outside1) {
        innerRayMine.end = true;
        innerRayMine.prob = true;
    }
    if (!outside2) {
        outerRayMine.end = true;
        innerRayMine.prob = true;
    }



    vec2 normal = -ray;
    float dfdxTerm = -normal.x;
    float dfdyTerm = normal.y;
    vec3 point3D = vec3(xW, yW, zW);
    cv::Point contourPoint((int)x, (int)y);



    float Xc = point3D.x; float Xc2 = Xc * Xc; float Yc = point3D.y; float Yc2 = Yc * Yc; float Zc = point3D.z; float Zc2 = Zc * Zc;
    float XcYc = Xc * Yc; float dfdxFu = dfdxTerm * G_FU; float dfdyFv = dfdyTerm * G_FU; float overZc2 = 1 / Zc2; float overZc = 1 / Zc;
    pixelJacobi[0] = (dfdyFv * (Yc2 + Zc2) + dfdxFu * XcYc) * overZc2;
    pixelJacobi[1] = (-dfdxFu * (Xc2 + Zc2) - dfdyFv * XcYc) * overZc2;
    pixelJacobi[2] = (-dfdyFv * Xc + dfdxFu * Yc) * overZc;
    pixelJacobi[3] = -dfdxFu * overZc;
    pixelJacobi[4] = -dfdyFv * overZc;
    pixelJacobi[5] = (dfdyFv * Yc + dfdxFu * Xc) * overZc2;


    float commonFirstTermsSum = 0;
    float commonFirstTermsSquaredSum = 0;

    int test = 0;
    while (!innerRayMine.end) {

        test++;
        cv::Point xy = innerRayMine.pos(); innerRayMine++;
        int x = xy.x;
        int y = xy.y;
        float dx = x - contourPoint.x;
        float dy = y - contourPoint.y;
        vec2 dxdy(dx, dy);

        float raw = -glm::dot(dxdy, normal);
        float heavisideTerm = heaviside_pre[(int)raw * 100 + 1000];
        float deltaTerm = delta_pre[(int)raw * 100 + 1000];


        const Vec3b rgb = ante[y * 640 + x];
        int red = rgb[0]; int green = rgb[1]; int blue = rgb[2];
        red = red >> 3; red = red << 10; green = green >> 3; green = green << 5; blue = blue >> 3;
        int colorIndex = red + green + blue;

        pF = pFPointer[colorIndex];
        pB = pBPointer[colorIndex];
        float denAsMul = 1 / (pF + pB + 0.000001);
        pF = pF * denAsMul;

        float pfMinusPb = 2 * pF - 1;
        float denominator = heavisideTerm * (pfMinusPb)+pB + 0.000001;
        float commonFirstTerm = -pfMinusPb / denominator * deltaTerm;

        commonFirstTermsSum += commonFirstTerm;
        commonFirstTermsSquaredSum += commonFirstTerm * commonFirstTerm;

    }
}

Answer 1

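Visual Studio profiles by sampling: it interrupts execution frequently and records the value of the instruction pointer. It then maps that address back to the source and counts how often each line was hit.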

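There are a few problems with that: in optimized code it is not always possible to figure out which source line produced a specific assembly instruction.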
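To make the attribution step concrete, here is a toy sketch of how sampled addresses could be turned into per-line percentages. It is not how Visual Studio is implemented: `LineEntry`, `lineForAddress`, `percentPerLine` and the idea of a flat, sorted address table are all assumptions made for illustration (a real profiler reads this mapping from debug information).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical line table entry: addresses from `start` up to the next
// entry's `start` are attributed to source line `line`.
struct LineEntry {
    std::uintptr_t start;
    int line;
};

// Map one sampled instruction pointer to a source line.
// Assumes `table` is sorted by `start`; returns -1 if ip precedes the table.
int lineForAddress(const std::vector<LineEntry>& table, std::uintptr_t ip) {
    int line = -1;
    for (const LineEntry& e : table) {
        if (e.start > ip) break;
        line = e.line;
    }
    return line;
}

// Turn raw samples into "percent of samples per source line".
std::map<int, double> percentPerLine(const std::vector<LineEntry>& table,
                                     const std::vector<std::uintptr_t>& samples) {
    assert(!table.empty());
    std::map<int, int> hits;
    for (std::uintptr_t ip : samples) {
        ++hits[lineForAddress(table, ip)];
    }
    std::map<int, double> percent;
    for (const auto& kv : hits) {
        percent[kv.first] = 100.0 * kv.second / static_cast<double>(samples.size());
    }
    return percent;
}
```

The percentages are only as good as the table. If the optimizer schedules the instructions of one statement among those of another, samples taken there end up attributed to whichever line the table names for that address.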

One trick I use is to move the code of interest into a separate function and declare it with __declspec(noinline).
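A minimal sketch of that trick, applied to the angle adjustment from the question. The helper name `adjustAngle`, the caller `sumAdjustedAngles`, the `NOINLINE` macro and the `twoPi` parameter (standing in for the question's `CV_PI2`, assumed here to be 2π) are illustrative and not part of the original code. The macro falls back to the GCC/Clang attribute so the snippet also builds outside MSVC.

```cpp
#include <cassert>
#include <cstddef>

// __declspec(noinline) is MSVC-specific; GCC and Clang spell it differently.
#if defined(_MSC_VER)
#  define NOINLINE __declspec(noinline)
#else
#  define NOINLINE __attribute__((noinline))
#endif

// Hypothetical helper: the subtraction and wrap-around live in their own
// function, so samples taken here show up under adjustAngle in the profile
// instead of being spread over neighbouring lines of the caller.
NOINLINE float adjustAngle(float phi, float extraTheta, float twoPi) {
    phi -= extraTheta;
    if (phi < 0) phi += twoPi;
    return phi;
}

// Hypothetical caller: the load of angles[i] stays here, so its cost is
// attributed to this function rather than to the subtraction.
float sumAdjustedAngles(const float* angles, std::size_t n, float extraTheta) {
    assert(angles != nullptr || n == 0);
    const float twoPi = 6.2831853f;
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        sum += adjustAngle(angles[i], extraTheta, twoPi);
    }
    return sum;
}
```

Keep in mind that forcing a call changes the code being measured (call overhead, fewer optimizations across the boundary), so this is a diagnostic aid and should probably be removed once the hot spot is understood.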

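In your example, are you sure the subtraction is executed as many times as the multiplication? I would be more puzzled by the difference between the subsequent multiplications (0.39% and 0.53%).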
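One way to check this is to count how often each loop level runs and divide the sample counts by those numbers. The sketch below assumes a loop nest shaped like the one in the question (an outer loop over contour points, an inner loop along each ray); `LoopCounts`, `countIterations`, `samplesPerExecution` and the `stepsPerRay` parameter are hypothetical names, and a fixed number of inner steps is a simplification.

```cpp
#include <cassert>

// How often the outer-loop body and the inner-loop body were executed.
struct LoopCounts {
    long long outer = 0;
    long long inner = 0;
};

// Count iterations for a nest shaped like the question's: one outer
// iteration per contour point, stepsPerRay inner iterations per point.
LoopCounts countIterations(int contourPoints, int stepsPerRay) {
    LoopCounts counts;
    for (int i = 0; i < contourPoints; ++i) {
        ++counts.outer;
        for (int s = 0; s < stepsPerRay; ++s) {
            ++counts.inner;
        }
    }
    return counts;
}

// Normalize a line's sample count by how many times the line actually ran.
double samplesPerExecution(long long samples, long long executions) {
    assert(executions > 0);
    return static_cast<double>(samples) / static_cast<double>(executions);
}
```

With these helpers, two lines that collect the same number of samples can have very different per-execution costs: a line in the outer loop that ran 4 times is ten times more expensive per execution than a line in the inner loop that ran 40 times.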

Update:

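I believe that the following lines:

float phi = angles[i];

and

phi -= extraTheta;

are merged together in the generated assembly, so the time it takes to load angles[i] is attributed to the subtraction line.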

c++

performance

profiling