What actually is done when string::c_str() is invoked?

Question

What actually is done when string::c_str() is invoked?

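Does string::c_str() allocate memory, copy the internal data of the string object, and append a null terminator to the newly allocated memory?

Or

Since string::c_str() must be O(1), allocating memory and copying the string is no longer allowed. In practice, keeping the null terminator in place all the time is the only sane implementation.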

Someone said in the comments on this answer of this question that C++11 requires std::string to allocate an extra char for a trailing '\0'. So it seems the second option is possible.

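And another person says that std::string operations (e.g. iteration, concatenation and element mutation) don't need the zero terminator. Unless you pass the string to a function that expects a zero-terminated string, it can be omitted.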

And one more voice, from an expert:

Why is it common for implementers to make .data() and .c_str() do the same thing?

Because it is more efficient to do so. The only way to make .data() return something that is not null terminated, would be to have .c_str() or .data() copy their internal buffer, or to just use 2 buffers. Having a single null terminated buffer always means that you can always use just one internal buffer when implementing std::string.

So I am really confused now: what actually is done when string::c_str() is invoked?

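Update:

Suppose c_str() is implemented as simply returning the pointer that is already allocated and managed.

A. Since c_str() must be null-terminated, the internal buffer needs to be null-terminated at all times, even for an empty std::string. For example, with std::string demo_str; there should be a '\0' in the memory of demo_str. Am I right?

B. What happens when std::string::substr() is called? Is a '\0' automatically appended to the substring?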

Answer 1

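Since C++11, std::string::c_str() and std::string::data() are both required to return a pointer to the string's internal buffer. And since c_str() must be null-terminated (as must data() from C++11 on; before that only c_str() had to be), this effectively requires the internal buffer to always be null-terminated, even though the null terminator is not counted by size()/length(), or returned by std::string iterators, etc.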

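Prior to C++11, the behavior of c_str() was technically implementation-specific, but most implementations I have seen worked this way, as it is the simplest and sanest way to implement it. C++11 just standardized the behavior that was already in wide use.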

Update

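Since C++11, the buffer is always null-terminated, even for an empty string. However, that does not mean the buffer has to be dynamically allocated when the string is empty. It could point to an SSO buffer, or even to a single static nul character. There is no guarantee that the pointer returned by c_str()/data() still points to the same memory address after the string's content changes.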

std::string::substr() returns a new std::string with its own null-terminated buffer. The string being copied from is unaffected.

Answer 2

Here is an empirical "proof" that .c_str() has O(1) complexity:

#include <stdio.h>
#include <string>
using namespace std;
int main(int argc, char **argv)
{
    std::string x(5000000, 'b'); // <--- single time allocation
    // std::string x(5, 'b'); // <--- compare to a much shorter string
    for (unsigned int i=0;i<1000000;i++)
    {
        const char *y = x.c_str(); // <--- copy entire content ?
    }
}

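Compile with -O0 so that nothing gets optimized away.
Time the two versions: I get the same performance.
This is an empirical "proof" that (at least in my machine's implementation) .c_str():
- retrieves the internal representation, which is already a null-terminated string
- does not copy the content on every call.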

Answer 3

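Plenty of good answers and comments have already been provided. But to demonstrate that std::string is typically backed by a null-terminated string, I offer a simple but naive implementation. It is not complete, does no error checking, and is certainly not optimized. But it is complete enough to show you how a string class is typically implemented with a null-terminated buffer as a member variable.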

#include <string.h> // strlen, memcpy, size_t

class string
{
public:

    string()
    {
        assign("", 0);
    }

    string(const char* s)
    {
        assign(s, strlen(s));
    }

    string(const char* s, size_t len)
    {
        assign(s, len);
    }

    string(const string& s)
    {
        assign(s._ptr, s._len);
    }

    ~string()
    {
       delete [] _ptr;
    }

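    string& operator=(const string& s)
    {
        const char* oldptr = _ptr;
        assign(s._ptr, s._len);   // copies before the old buffer is freed, so self-assignment is safe
        delete [] oldptr;
        return *this;             // was missing: falling off the end here is undefined behavior
    }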

    // The accessors do not modify the object, so they are const.
    const char* data() const
    {
        return _ptr;
    }

    const char* c_str() const
    {
        return _ptr;
    }

    size_t length() const
    {
        return _len;
    }

    // substr always returns a new string with its own null-terminated buffer
    string substr(size_t pos, size_t count) const
    {
        string s(_ptr+pos, count);   // this class, not std::string
        return s;
    }

private:
    char* _ptr;
    size_t _len;

    void assign(const char* ptr, size_t len)
    {
        _len = len;        
        _ptr = new char[_len+1]; // +1 for null termination
        memcpy(_ptr, ptr, len); 
        _ptr[_len] = '\0';       // always null terminate
    }
};
