Can I persuade GCC to inline a deferred call through a stored function pointer?

Question

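Naturally, C++ compilers can inline function calls made from within a function template when the inner function call is directly known in that scope (ref).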

#include <iostream>

void holyheck()
{
   std::cout << "!\n";
}

template <typename F>
void bar(F foo)
{
   foo();
}

int main()
{
   bar(holyheck);
}

Now, what if I pass holyheck to a class, which stores the function pointer (or equivalent) and invokes it later? Do I have any hope of getting this inlined? How?

template <typename F>
struct Foo
{
   Foo(F f) : f(f) {};
   void calledLater() { f(); }

private:
   F f;
};

void sendMonkeys();
void sendTissues();

int main()
{
   Foo<void(*)()> f(sendMonkeys);
   Foo<void(*)()> g(sendTissues);
   // lots of interaction with f and g, not shown here
   f.calledLater();
   g.calledLater();
}

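My type Foo is intended to isolate a ton of logic; it will be instantiated a few times. The specific function invoked from calledLater is the only thing that differs between instantiations (though it never changes during the lifetime of a Foo), so half of the purpose of Foo is to abide by DRY. (The rest of its purpose is to keep this mechanism isolated from other code.)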

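But I don't want to introduce the overhead of an actual additional function call in doing so, because this is all taking place in a program bottleneck.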

I don't speak ASM, so analysing the compiled code isn't much use to me.
My instinct is that I have no chance of inlining here.

Answer 1

If you don't really need to use a function pointer, then a functor should make the optimisation trivial:

struct CallSendMonkeys {
  void operator()() {
    sendMonkeys();
  }
};
struct CallSendTissues {
  void operator()() {
    sendTissues();
  }
};

(Of course, C++11 has lambdas, but you tagged your question C++03.)

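With a different instantiation of Foo for each of these classes, and no internal state in the classes, f() does not depend on how f was constructed, so it is not a problem if the compiler cannot tell that f remains unmodified.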

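To make that concrete, here is a minimal C++03 sketch of the question's Foo instantiated with the two functors. The counter-bumping bodies of sendMonkeys and sendTissues, and the runFunctorDemo driver, are assumptions added purely so the example is self-contained; they are not from the question. Whether a given compiler actually inlines the calls still depends on optimisation settings.

```cpp
#include <cassert>

// Hypothetical stand-ins for the question's functions: they only bump
// counters so that the calls have an observable effect.
int monkeysSent = 0;
int tissuesSent = 0;
void sendMonkeys() { ++monkeysSent; }
void sendTissues() { ++tissuesSent; }

// Stateless functors: the callee is fixed by the type, not by a stored value.
struct CallSendMonkeys {
  void operator()() const { sendMonkeys(); }
};
struct CallSendTissues {
  void operator()() const { sendTissues(); }
};

// Foo as in the question.
template <typename F>
struct Foo
{
   Foo(F f) : f(f) {}
   void calledLater() { f(); }

private:
   F f;
};

// Hypothetical driver standing in for the question's main().
void runFunctorDemo()
{
   CallSendMonkeys monkeys;
   CallSendTissues tissues;
   Foo<CallSendMonkeys> f(monkeys);
   Foo<CallSendTissues> g(tissues);
   f.calledLater();   // resolves statically to CallSendMonkeys::operator()
   g.calledLater();   // resolves statically to CallSendTissues::operator()
   g.calledLater();
}
```

Because Foo<CallSendMonkeys> and Foo<CallSendTissues> are distinct types, each calledLater names its target directly and no pointer has to be tracked through the object.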
Answer 2

Taking your example, which after some fiddling to make it compile looks like this:

template <typename F>
struct Foo
{
   Foo(F f) : f(f) {};
   void calledLater() { f(); }

private:
   F f;
};

void sendMonkeys();
void sendTissues();

int main()
{
   Foo<__typeof__(&sendMonkeys)> f(sendMonkeys);
   Foo<__typeof__(&sendTissues)> g(sendTissues);
   // lots of interaction with f and g, not shown here
   f.calledLater();
   g.calledLater();
}

clang++ (3.7 as of a few weeks back, which means I'd expect clang++ 3.6 to do this too, as it is only a few weeks older in the source base) generates this code:

    .text
    .file   "calls.cpp"
    .globl  main
    .align  16, 0x90
    .type   main,@function
main:                                   # @main
    .cfi_startproc
# BB#0:                                 # %entry
    pushq   %rax
.Ltmp0:
    .cfi_def_cfa_offset 16
    callq   _Z11sendMonkeysv
    callq   _Z11sendTissuesv
    xorl    %eax, %eax
    popq    %rdx
    retq
.Ltmp1:
    .size   main, .Ltmp1-main
    .cfi_endproc

Of course, without definitions of sendMonkeys and sendTissues, we can't really inline any further.

If we implement them like this:

void request(const char *);
void sendMonkeys() { request("monkeys"); }
void sendTissues() { request("tissues"); }

the assembler code becomes:

main:                                   # @main
    .cfi_startproc
# BB#0:                                 # %entry
    pushq   %rax
.Ltmp2:
    .cfi_def_cfa_offset 16
    movl    $.L.str, %edi
    callq   _Z7requestPKc
    movl    $.L.str1, %edi
    callq   _Z7requestPKc
    xorl    %eax, %eax
    popq    %rdx
    retq

.L.str:
    .asciz  "monkeys"
    .size   .L.str, 8

    .type   .L.str1,@object         # @.str1
.L.str1:
    .asciz  "tissues"
    .size   .L.str1, 8

Which, if you can't read assembler code, means that request("tissues") and request("monkeys") have been inlined into main as expected.

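I'm simply amazed that g++ 4.9.2 doesn't do the same thing (I got this far and expected to continue with "and g++ does the same, I'm not going to post the code for it"). [It does inline sendTissues and sendMonkeys, but doesn't go the next step to inline request as well.]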

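Of course, it is entirely possible to make tiny changes to this and NOT get the code inlined, such as adding conditions that depend on variables the compiler can't determine at compile time.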
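As one illustration of that kind of change (a sketch of my own, not the code I actually tested), the stored pointer below is picked from a value only known at run time. The counter-based function bodies and the runRuntimeChoice helper are hypothetical. With this shape the compiler can no longer know from the Foo object alone which function calledLater will reach, so it may have to emit a branch or an indirect call.

```cpp
#include <cassert>

// Hypothetical stand-ins for the question's functions.
int monkeysSent = 0;
int tissuesSent = 0;
void sendMonkeys() { ++monkeysSent; }
void sendTissues() { ++tissuesSent; }

// Foo as in the question.
template <typename F>
struct Foo
{
   Foo(F f) : f(f) {}
   void calledLater() { f(); }

private:
   F f;
};

// Hypothetical helper: which function is stored depends on a run-time flag,
// so the target of calledLater is not a compile-time constant here.
void runRuntimeChoice(bool wantMonkeys)
{
   Foo<void(*)()> f(wantMonkeys ? &sendMonkeys : &sendTissues);
   f.calledLater();
}
```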

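Edit: I did add a string and an integer to Foo and updated these with an external function, at which point the inlining went away for both clang and gcc. Using JUST an integer and calling an external function, it does inline the code.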

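In other words, it really depends on what the code is in the section // lots of interaction with f and g, not shown here. And I think you (Lightness) have been around here long enough to know that for 80%+ of the questions, it is the code that isn't posted in the question that is the most important part for the actual answer ;)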

Answer 3

To make your original approach work, use

template< void(&Func)() >
struct Foo
{
    void calledLater() { Func(); }
};

In general, I've had better luck getting gcc to inline things by using function references rather than function pointers.
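For completeness, a minimal sketch of how that template might be instantiated. The counter-bumping function bodies and the runReferenceDemo driver are assumptions added so the example stands alone. Note that the function is now a template argument rather than a constructor argument, so Foo stores nothing and each instantiation is a distinct type.

```cpp
#include <cassert>

// Hypothetical stand-ins for the question's functions. They need external
// linkage to be usable as C++03 non-type template arguments.
int monkeysSent = 0;
int tissuesSent = 0;
void sendMonkeys() { ++monkeysSent; }
void sendTissues() { ++tissuesSent; }

template< void(&Func)() >
struct Foo
{
    void calledLater() { Func(); }
};

// Hypothetical driver standing in for the question's main().
void runReferenceDemo()
{
    Foo<sendMonkeys> f;   // callee is baked into the type
    Foo<sendTissues> g;
    f.calledLater();
    g.calledLater();
}
```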

c++

c++03