Optimization for specific argument without template

Question

I ran into an optimization that makes my code fast, but it also makes the code ugly.

A minimal example is as follows:


enum class Foo : char {
    A = 'A',
    B = 'B'
};

struct A_t {
    constexpr operator Foo() const { return Foo::A; }
};

void function_v1(Foo s){
   if(s == Foo::A){
      //Run special version of the code
   } else {
      //Run other version of the code
   }
}

template<class foo_t>
void function_v2(foo_t s){
   if(s == Foo::A){
      //Run special version of the code
   } else {
      //Run other version of the code
   }
}

int main(){

   // Version 1 of the function, simple call, no template
   function_v1(Foo::A);

   // Version 2 of the function, templated, but call is still simple
   function_v2(Foo::A);

   // Version 2 of the function, the argument is now not of type Foo, but of type A_t
   const A_t a; 
   function_v2(a);

}

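The last call instantiates function_v2 specifically for A_t. This may be bad for executable size, but in my experiments I noticed that the compiler is able to recognize that s == Foo::A always evaluates to true, and the check is optimized away. With gcc, this check was not optimized away in the other versions, even with -O3.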

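I am working on an extremely performance-intensive application, so optimizations like this matter. Still, I don't like the style of function_v2. To prevent the function from being called with the wrong type, I would have to add something like enable_if. It complicates auto-completion, because the parameter type is now a template parameter. And users now have to remember to call the function with a variable of the specific type instead of the enum value.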
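For illustration, the kind of constraint meant here could look like the sketch below. Using std::is_convertible as the condition is an assumption, and the integer return values are only stand-ins for the two code paths so that the selected branch can be observed.

```cpp
#include <type_traits>

enum class Foo : char { A = 'A', B = 'B' };

struct A_t {
    constexpr operator Foo() const { return Foo::A; }
};

// Accept only Foo itself or types implicitly convertible to Foo.
// Any other argument type removes this overload from consideration,
// so the call fails to compile instead of silently instantiating.
template <class foo_t,
          typename std::enable_if<std::is_convertible<foo_t, Foo>::value, int>::type = 0>
constexpr int function_v2(foo_t s) {
    // Stand-in return values: 1 = special version, 2 = other version.
    return (s == Foo::A) ? 1 : 2;
}
```

With this in place, function_v2(Foo::A) and function_v2(A_t{}) compile, while a call such as function_v2(42) is rejected at compile time.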

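Is there a way to write a function in the style of function_v1, but still have the compiler make different instantiations? Perhaps a slightly different coding style? Or a compiler hint in the code? Or some compiler flag that makes the compiler more likely to create multiple instantiations?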

Answer 1

Is there a way to write a function in the style of function_v1, but still have the compiler make different instantiations?

If we expand your example a little to better expose the compiler's behavior:

enum class Foo : char {
    A = 'A',
    B = 'B'
};

struct A_t {
    constexpr operator Foo() const { return Foo::A; }
};

void foo();
void bar();

void function_v1(Foo s){
   if(s == Foo::A){
      foo();
   } else {
      bar();
   }
}

template<class foo_t>
void function_v2(foo_t s){
   if(s == Foo::A){
      foo();
   } else {
      bar();
   }
}

void test1(){
   function_v1(Foo::A);
}

void test2(){
   function_v2(Foo::A);
}

void test3(){
   const A_t a; 
   function_v2(a);
}

and compile it with -O3, we get:

test1(): # @test1()
  jmp foo() # TAILCALL
test2(): # @test2()
  jmp foo() # TAILCALL
test3(): # @test3()
  jmp foo() # TAILCALL

See on godbolt.org: https://gcc.godbolt.org/z/443TqcczW

The final assembly for test1(), test2(), and test3() is exactly the same! What is going on?

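The if being optimized away in function_v2() has nothing to do with it being a template. It happens because the function is defined in a header (which is necessary for templates), so the complete implementation is visible at the call site.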

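To get the same benefit for function_v1(), all you have to do is define the function in a header and mark it inline to avoid ODR violations. You will effectively get exactly the same optimization that happens in function_v2().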
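A minimal sketch of that layout follows. The file names in the comments are hypothetical, and foo() and bar() are given stand-in bodies that return a tag so the chosen branch can be observed.

```cpp
#include <cassert>

// foo_dispatch.hpp (hypothetical header name)
enum class Foo : char { A = 'A', B = 'B' };

// Stand-ins for foo() and bar(). They return a tag only so that the
// selected branch can be observed; real code would do the actual work.
inline int foo() { return 1; }
inline int bar() { return 2; }

// Defined in the header and marked inline, so every translation unit that
// includes it sees the body. The optimizer may then inline the call and
// fold the branch when the argument is a compile-time constant.
inline int function_v1(Foo s) {
    if (s == Foo::A) {
        return foo();
    } else {
        return bar();
    }
}

// caller.cpp (hypothetical): the argument is a constant here, so an
// optimizing compiler will typically emit a direct call to foo().
inline int test1() { return function_v1(Foo::A); }

// Small runtime check of the behaviour (not of the generated code).
inline void check_dispatch() {
    assert(function_v1(Foo::A) == 1);
    assert(function_v1(Foo::B) == 2);
}
```

Whether the branch actually disappears still depends on the compiler and the optimization level, so it is worth confirming in the generated assembly.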

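That being said, all of this only gives you equivalence; the optimization itself is still left to the compiler. If you want a guarantee, you should force the value to be provided at compile time, as a template parameter: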

template<Foo s>
void function_v3() {
    if constexpr (s == Foo::A) {
        foo();
    }
    else {
        bar();
    }
}

// usage:

function_v3<Foo::A>();

If you still need a runtime-evaluated version of the function, you could do something along these lines:

decltype(auto) function_v3(Foo s) {
    switch(s) {
        case Foo::A: 
            return function_v3<Foo::A>();
        case Foo::B: 
            return function_v3<Foo::B>();
    }
}

// Forced compile-time switch
function_v3<Foo::A>();

// At the mercy of the optimizer.
function_v3(some_val);
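Putting the two pieces together, a self-contained sketch could look as follows (C++17 for if constexpr). The int return values and the constexpr qualifiers are assumptions added only to make the result checkable, and the fallback return after the switch is likewise an assumption about how out-of-range values should be handled.

```cpp
enum class Foo : char { A = 'A', B = 'B' };

// Stand-ins for the two code paths; the tags identify which one ran.
constexpr int foo() { return 1; }
constexpr int bar() { return 2; }

// Compile-time version: the discarded branch is never instantiated.
template <Foo s>
constexpr int function_v3() {
    if constexpr (s == Foo::A) {
        return foo();
    } else {
        return bar();
    }
}

// Runtime version: maps a runtime value onto the compile-time versions.
constexpr int function_v3(Foo s) {
    switch (s) {
        case Foo::A:
            return function_v3<Foo::A>();
        case Foo::B:
            return function_v3<Foo::B>();
    }
    // Only reached for a value that is not a named enumerator
    // (assumption: treat it like the generic path).
    return bar();
}
```

Callers that know the value at compile time use function_v3<Foo::A>(), and everyone else calls function_v3(value) with the familiar function_v1 syntax.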

Answer 2

How about using template specialization:

template<class T>
void function_v2_other(T s){
    //Run other version of the code
}

template<class T>
void function_v2(T s){
   function_v2_other(s);
}

template<>
void function_v2(Foo s){
   if(s == Foo::A){
      //Run special version of the code
   } else {
      function_v2_other(s);
   }
}
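A compilable sketch of the same idea, with int return values as stand-ins for the two code paths (an assumption made so the selected version can be observed). Note that the specialization is chosen only when the deduced type is exactly Foo; an argument of type A_t deduces T = A_t and therefore goes through the primary template.

```cpp
enum class Foo : char { A = 'A', B = 'B' };

struct A_t {
    constexpr operator Foo() const { return Foo::A; }
};

// Generic path; returns 2 as a stand-in for "other version of the code".
template <class T>
constexpr int function_v2_other(T) { return 2; }

// Primary template: every type goes to the generic path.
template <class T>
constexpr int function_v2(T s) { return function_v2_other(s); }

// Explicit specialization for Foo: only here is the value inspected.
// Returns 1 as a stand-in for "special version of the code".
template <>
constexpr int function_v2<Foo>(Foo s) {
    return (s == Foo::A) ? 1 : function_v2_other(s);
}
```

So function_v2(Foo::A) takes the special path, while function_v2(A_t{}) does not, which may or may not be what the caller expects.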

c++

compiler-optimization