Does every C++ member function take `this` as an input implicitly?

Question

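When we create a member function for a class in C++, it has an implicit extra argument that is a pointer to the calling object, referred to as this.

Is this true for any member function, even one that doesn't use the this pointer? For example, given the class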

class foo
{
private:
    int bar;
public:
    int get_one()
    {
      return 1;  // Not using `this`
    }
    int get_bar()
    {
        return this->bar;  // Using `this`
    }
};

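Would both of the functions (get_one and get_bar) take this as an implicit parameter even though only one of them actually uses it?
It seems like a bit of a waste to do so.

Note: I understand that the correct thing to do would be to make get_one() static, and that the answer may be implementation-dependent, but I'm just curious.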

Answer 1

Would both of the functions (get_one and get_bar) take this as an implicit parameter even though only get_bar uses it?

Yes (unless the compiler optimizes it away, which still doesn't mean you can call the function without a valid object).
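The requirement for an object shows up in the type system even when the body ignores this. The following is a minimal sketch, assuming a static counterpart named get_one_static (a hypothetical name, not from the question) is added for comparison:

```cpp
#include <type_traits>

struct foo {
    int get_one() { return 1; }                // non-static: needs an object to be called on
    static int get_one_static() { return 1; }  // static: there is no this at all
};

// &foo::get_one is a pointer to member function; it cannot be called without a foo.
static_assert(std::is_member_function_pointer<decltype(&foo::get_one)>::value,
              "non-static member functions are tied to an object");

// &foo::get_one_static is an ordinary function pointer taking no arguments.
static_assert(std::is_same<decltype(&foo::get_one_static), int (*)()>::value,
              "static member functions have no implicit object parameter");
```

Both assertions are checked at compile time, so the snippet only needs to compile to make the point.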

It seems like a bit of a waste to do so

Then why is it a member if it uses no member data? Sometimes the correct thing to do is to make it a free function in the same namespace.
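A minimal sketch of that suggestion, assuming a hypothetical namespace called foo_lib (the functions are marked constexpr only so the results can be checked with static_assert):

```cpp
namespace foo_lib {  // hypothetical namespace name, for illustration only

class foo {
public:
    constexpr int get_bar() const { return bar; }  // reads member data, so it stays a member
private:
    int bar = 0;
};

// Free function: it touches no foo state, so no object (and no this) is involved.
constexpr int get_one() { return 1; }

}  // namespace foo_lib

static_assert(foo_lib::get_one() == 1, "callable without any foo object");
static_assert(foo_lib::foo{}.get_bar() == 0, "the member call still needs an object");
```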

Answer 2

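If you don't use this, then you can't tell whether it's available. So there is literally no distinction. This is like asking whether a tree falling in an unpopulated forest makes a sound. It's literally a meaningless question.

I can tell you this: if you want to use this in a member function, you can. That option is always available to you.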

Answer 3

...class in c++, as I understand it, it has an implicit extra argument that is a pointer to the calling object

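It is important to note that C++ started out as C with objects.

To that end, the this pointer is not something that is implicitly present inside a member function; rather, the member function, once compiled, needs a way to know what this refers to. Hence the notion of an implicit this pointer to the calling object being passed in.

To put it another way, let's take your C++ class and turn it into a C version: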

C++

#include <cstdio>

class foo
{
    private:
        int bar = 0;  // initialized so get_bar() never reads an indeterminate value
    public:
        int get_one()
        {
            return 1;
        }
        
        int get_bar()
        {
            return this->bar;
        }
    
        int get_foo(int i)
        {
            return this->bar + i;
        }
};

int main(int argc, char** argv)
{
    foo f;
    printf("%d\n", f.get_one());
    printf("%d\n", f.get_bar());
    printf("%d\n", f.get_foo(10));
    return 0;
}

C

#include <stdio.h>

typedef struct foo
{
    int bar;
} foo;

int foo_get_one(foo *this)
{
    return 1;
}

int foo_get_bar(foo *this)
{
    return this->bar;
}

int foo_get_foo(int i, foo *this)
{
    return this->bar + i;
}

int main(int argc, char** argv)
{
    foo f = {0};  /* zero-initialize bar before it is read */
    printf("%d\n", foo_get_one(&f));
    printf("%d\n", foo_get_bar(&f));
    printf("%d\n", foo_get_foo(10, &f));
    return 0;
}

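When the C++ program is compiled and assembled, the this pointer is "added" to the mangled function so that it "knows" which object is calling the member function.

So foo::get_one might be "mangled" to the C equivalent of foo_get_one(foo *this), foo::get_bar could be mangled to foo_get_bar(foo *this), foo::get_foo(int) could become foo_get_foo(int, foo *this), and so on.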
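The hidden parameter can also be made visible from within C++ itself, without relying on any particular mangling scheme, by calling through a pointer to member function: the call then has to name the object explicitly. The following is a minimal sketch; the helper call_get_foo and the initial value 5 are hypothetical, and constexpr is used only so the result can be checked at compile time.

```cpp
struct foo {
    int bar = 5;  // illustrative value, not from the original example
    constexpr int get_foo(int i) const { return this->bar + i; }
};

// Hypothetical helper: the object is supplied alongside the ordinary argument,
// much like the extra foo* parameter in the C translation.
constexpr int call_get_foo(const foo& object, int i) {
    return (object.*(&foo::get_foo))(i);
}

static_assert(call_get_foo(foo{}, 10) == 15, "bar (5) + i (10)");
```

Where the object pointer actually sits in the argument list is up to the ABI; the C translation above is one way to picture it, not a statement about any specific compiler.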

Would both of the functions (get_one and get_bar) take this as an implicit parameter even though only get_bar uses it? It seems like a bit of a waste to do so.

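This is up to the compiler. Even with no optimizations requested, heuristics might still eliminate the this pointer from a mangled function that does not need the calling object (to save stack), but that depends heavily on the code, on how it is compiled, and on the target system.

More specifically, if the function is as simple as foo::get_one (merely returning 1), the compiler may well just put the constant 1 in place of the call to object->get_one(), eliminating the need for any references or pointers.

Hope that helps.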

Answer 4

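Semantically, the this pointer is always available in a member function - as another answer pointed out. That is, you could later change the function to use it without issue (and, in particular, without the need to recompile calling code in other translation units) or, in the case of a virtual function, an overriding version in a subclass could use this even if the base implementation didn't.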
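The virtual case can be sketched as follows. All names here (base, derived, value, read_value, check_dispatch) are hypothetical and chosen only for illustration; the caller is compiled against base and cannot know whether the function that ends up running needs this, so it always has to pass the object.

```cpp
#include <cassert>

struct base {
    virtual ~base() = default;
    virtual int value() const { return 1; }  // the base implementation ignores this
};

struct derived : base {
    int bar = 42;
    int value() const override { return this->bar; }  // the override does use this
};

// The caller only sees a base reference; which body runs is decided at run time.
int read_value(const base& object) { return object.value(); }

// Small self-check that can be called from main().
void check_dispatch() {
    base b;
    derived d;
    assert(read_value(b) == 1);
    assert(read_value(d) == 42);
}
```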

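So the remaining interesting question is what performance impact this has, if any. There may be a cost to the caller and/or the callee, and the cost may differ between the inlined and non-inlined cases. We examine all the permutations below:

Inlined

In the inlined case, the compiler can see both the call site and the function implementation¹, so it presumably doesn't need to follow any particular calling convention, and the cost of the hidden this pointer should go away. Note also that in this case there is no real distinction between the "caller" code and the "callee" code, since they are combined and optimized together at the call site.

Let's use the following test code: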

#include <stdio.h>

class foo
{
private:
    int bar;
public:
    int get_one_member()
    {
      return 1;  // Not using `this`
    }
};

int get_one_global() {
  return 2;
}

int main(int argc, char **) {
  foo f = foo();
  if(argc) {
    puts("a");
    return f.get_one_member();
  } else {
    puts("b");
    return get_one_global();
  }
}

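Note that the two puts calls are only there to make the branches a bit more different - otherwise the compilers are smart enough to just use a conditional set/move, and you can't even really separate the inlined bodies of the two functions.

All of gcc, icc and clang inline the two calls and generate equivalent code for the member and non-member functions, without any trace of the this pointer in the member case. Let's look at the clang code since it's the cleanest: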

main:
 push   rax
 test   edi,edi
 je     400556 <main+0x16>
 # this is the member case
 mov    edi,0x4005f4
 call   400400 <puts@plt>
 mov    eax,0x1
 pop    rcx
 ret
 # this is the non-member case    
 mov    edi,0x4005f6
 call   400400 <puts@plt>
 mov    eax,0x2
 pop    rcx
 ret

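Both paths generate exactly the same series of 4 instructions leading up to the final ret - two instructions for the puts call, a single instruction to mov the return value of 1 or 2 into eax, and a pop rcx to clean up the stack². So the actual call took exactly one instruction in either case, and there was no this pointer manipulation or passing at all.

Out of line

In the out-of-line case, supporting the this pointer does have some real-but-generally-small costs, at least on the caller side.

We use a similar test program, but with the member function defined out-of-line and with inlining of those functions disabled³: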

class foo
{
private:
    int bar;
public:
    int __attribute__ ((noinline)) get_one_member();
};

int foo::get_one_member() 
{
   return 1;  // Not using `this`
}

int __attribute__ ((noinline)) get_one_global() {
  return 2;
}

int main(int argc, char **) {
  foo f = foo();
  return argc ? f.get_one_member() : get_one_global();
}

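This test code is somewhat simpler than the last one because it doesn't need the puts calls to distinguish the two branches.

Call Site

Let's look at the assembly that gcc⁴ generates for main (i.e., at the call site for the functions):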

main:
 test   edi,edi
 jne    400409 <main+0x9>
 # the global branch
 jmp    400530 <get_one_global()>
 # the member branch
 lea    rdi,[rsp-0x18]
 jmp    400520 <foo::get_one_member()>
 nop    WORD PTR cs:[rax+rax*1+0x0]
 nop    DWORD PTR [rax]

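Here, both function calls are actually realized using jmp - this is a kind of tail-call optimization: since they are the last functions called in main, the ret of the called function returns directly to the caller of main. But the caller of the member function pays an extra price: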

lea    rdi,[rsp-0x18]

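This computes the address of f, which lives on the stack, and places it in rdi, the register that receives the first argument, which for C++ member functions is this. So there is a (small) extra cost.

Function Body

Now, while the call site pays some cost to pass an (unused) this pointer, in this case at least the actual function bodies are still equally efficient: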

foo::get_one_member():
 mov    eax,0x1
 ret    

get_one_global():
 mov    eax,0x2
 ret

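Both consist of a single mov and a ret. So the function itself can simply ignore the this value, since it isn't used.

This raises the question of whether that holds in general - will the body of a member function that doesn't use this always be compiled as efficiently as an equivalent non-member function?

The short answer is no - at least for most modern ABIs that pass arguments in registers. The this pointer takes up a parameter register in the calling convention, so when compiling a member function you hit the maximum number of register-passed arguments one parameter sooner.

Take for example this function that simply adds its six int parameters together: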

int add6(int a, int b, int c, int d, int e, int f) {
  return a + b + c + d + e + f;
}

When this is compiled as a member function on an x86-64 platform using the SysV ABI, one of the arguments has to be passed on the stack, resulting in code like this:

foo::add6_member(int, int, int, int, int, int):
 add    esi,edx
 mov    eax,DWORD PTR [rsp+0x8]
 add    ecx,esi
 add    ecx,r8d
 add    ecx,r9d
 add    eax,ecx
 ret

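Note the read from the stack, mov eax,DWORD PTR [rsp+0x8], which will generally add a few cycles of latency⁵ and, on gcc, one instruction⁶ compared with the non-member version, which has no memory reads: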

add6_nonmember(int, int, int, int, int, int):
 add    edi,esi
 add    edx,edi
 add    ecx,edx
 add    ecx,r8d
 lea    eax,[rcx+r9*1]
 ret
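For reference, the pair of functions behind the two listings might look like the sketch below. This is an assumption: only the names add6_member and add6_nonmember come from the listings, and constexpr is added here purely so the sums can be checked with static_assert (to reproduce the out-of-line code you would also need to prevent inlining, as in the earlier test program). Under the SysV x86-64 ABI the first six integer arguments go in rdi, rsi, rdx, rcx, r8 and r9; in the member version this takes rdi, so the sixth int, f, is the one that ends up on the stack.

```cpp
struct foo {
    // Seven arguments in total once the implicit this is counted.
    constexpr int add6_member(int a, int b, int c, int d, int e, int f) const {
        return a + b + c + d + e + f;
    }
};

// Six arguments: all of them fit in registers under SysV x86-64.
constexpr int add6_nonmember(int a, int b, int c, int d, int e, int f) {
    return a + b + c + d + e + f;
}

static_assert(foo{}.add6_member(1, 2, 3, 4, 5, 6) == 21, "1+2+3+4+5+6");
static_assert(add6_nonmember(1, 2, 3, 4, 5, 6) == 21, "1+2+3+4+5+6");
```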

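Now, you won't usually have six or more arguments to a function (especially a very short, performance-sensitive one) - but this at least shows that even on the callee code-generation side, the hidden this pointer isn't always free.

Note also that while the examples use x86-64 codegen and the SysV ABI, the same basic principles apply to any ABI that passes some arguments in registers.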

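¹ Note that this optimization only applies easily to effectively non-virtual functions - since only then can the compiler know the actual function implementation.

² I guess that's what it's for - it undoes the push rax at the top of the method so that rsp has the correct value on return, but I don't know why the push/pop pair needs to be there in the first place. Other compilers use different strategies, such as add rsp, 8 and sub rsp, 8.

³ In practice, you wouldn't really disable inlining like this; inlining would fail simply because the methods are in different compilation units. Because of the way Godbolt works, I can't do exactly that, so disabling inlining has the same effect.

⁴ Oddly, I couldn't get clang to stop inlining either function, either with the noinline attribute or with -fno-inline.

⁵ In fact, often somewhat more than the usual L1-hit latency of 4 cycles on Intel, due to store-forwarding of the recently written value.

⁶ In principle, on x86 at least, the one-instruction penalty can be eliminated by using an add with a memory source operand instead of a mov from memory followed by a reg-reg add, and in fact clang and icc do exactly that. I don't think one approach dominates, though. The gcc approach, with its separate mov, is better able to move the load off the critical path: it starts the load early and only uses the result in the last instruction. The icc approach adds 1 cycle to the critical path involving the mov, and the clang approach seems the worst of all: it strings all the adds together into one long dependency chain that ends with the memory read.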
