std::ofstream - no buffering for strings longer than 1023 characters (instant flush)

Question

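When I change the size of the ofstream buffer with pubsetbuf(...), everything works fine, except when I write a single string longer than 1023 characters to the ofstream (see the code below). Is this correct behavior, or am I doing something wrong?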

#include <fstream>
#include <string>
#include <unistd.h>  // sleep()
#include <vector>

int main(){
    std::vector<char> rawBuf;
    std::ofstream stream;

    rawBuf.resize(20000);
    stream.rdbuf()->pubsetbuf(&rawBuf[0], 20000);

    stream.open("file.txt", std::ios_base::app);

    std::string data(1499, 'b');

    for(int i = 0; i < 10; i++)
    {
        stream << data.substr(0, 1024) << "\n"; // written immediately; with substr(0, 1023) it stays buffered
        sleep(1);
    }
    stream.flush();
    stream.close();

    return 0;
}

With a 1024-character string, strace ./program shows the following:

writev(3, [{iov_base=NULL, iov_len=0}, {iov_base="bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"..., iov_len=1024}], 2) = 1024
nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffcf3889ac0) = 0
writev(3, [{iov_base="\n", iov_len=1}, {iov_base="bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"..., iov_len=1024}], 2) = 1025
nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffcf3889ac0) = 0
... and so on 10x

With a 1023-character string, everything seems to work as expected:

nanosleep({tv_sec=1, tv_nsec=0}, 0x7fff8e13a980) = 0
nanosleep({tv_sec=1, tv_nsec=0}, 0x7fff8e13a980) = 0
... 10x

And then:

write(3, "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"..., 10240) = 10240

Why is there a single write here, but not in the first case?

Edit:

gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3)

Answer 1

According to [filebuf.virtuals]/12:

basic_streambuf* setbuf(char_type* s, streamsize n) override;
Effects: If setbuf(0, 0) is called on a stream before any I/O has occurred on that stream, the stream becomes unbuffered. Otherwise the results are implementation-defined. “Unbuffered” means that pbase() and pptr() always return null and output to the file should appear as soon as possible.

"Implementation-defined" covers "works great", "only a single write", and anything else. Indeed, this is what the libstdc++ 7.3.0 documentation says:

First, are you sure that you understand buffering? Particularly the fact that C++ may not, in fact, have anything to do with it?

The rules for buffering can be a little odd, but they aren't any different from those of C. (Maybe that's why they can be a bit odd.) Many people think that writing a newline to an output stream automatically flushes the output buffer. This is true only when the output stream is, in fact, a terminal and not a file or some other device -- and that may not even be true since C++ says nothing about files nor terminals. All of that is system-dependent. (The "newline-buffer-flushing only occurring on terminals" thing is mostly true on Unix systems, though.)

Some people also believe that sending endl down an output stream only writes a newline. This is incorrect; after a newline is written, the buffer is also flushed. Perhaps this is the effect you want when writing to a screen -- get the text out as soon as possible, etc -- but the buffering is largely wasted when doing this to a file:
output << "a line of text" << endl;
output << some_data_variable << endl;
output << "another line of text" << endl; 
The proper thing to do in this case is to just write the data out and let the libraries and the system worry about the buffering. If you need a newline, just write a newline:
output << "a line of text\n"
 << some_data_variable << '\n'
 << "another line of text\n"; 
I have also joined the output statements into a single statement. You could make the code prettier by moving the single newline to the start of the quoted text on the last line, for example.

If you do need to flush the buffer above, you can send an endl if you also need a newline, or just flush the buffer yourself:
output << ...... << flush;    // can use std::flush manipulator
output.flush();               // or call a member fn 
On the other hand, there are times when writing to a file should be like writing to standard error; no buffering should be done because the data needs to appear quickly (a prime example is a log file for security-related information). The way to do this is just to turn off the buffering before any I/O operations at all have been done (note that opening counts as an I/O operation):
std::ofstream    os;
std::ifstream    is;
int   i;

os.rdbuf()->pubsetbuf(0,0);
is.rdbuf()->pubsetbuf(0,0);

os.open("/foo/bar/baz");
is.open("/qux/quux/quuux");
...
os << "this data is written immediately\n";
is >> i;   // and this will probably cause a disk read 
Since all aspects of buffering are handled by a streambuf-derived member, it is necessary to get at that member with rdbuf(). Then the public version of setbuf can be called. The arguments are the same as those for the Standard C I/O Library function (a buffer area followed by its size).

A great deal of this is implementation-dependent. For example, streambuf does not specify any actions for its own setbuf()-ish functions; the classes derived from streambuf each define behavior that "makes sense" for that class: an argument of (0,0) turns off buffering for filebuf but does nothing at all for its siblings stringbuf and strstreambuf, and specifying anything other than (0,0) has varying effects. User-defined classes derived from streambuf can do whatever they want. (For filebuf and arguments for (p,s) other than zeros, libstdc++ does what you'd expect: the first s bytes of p are used as a buffer, which you must allocate and deallocate.)

A last reminder: there are usually more buffers involved than just those at the language/library level. Kernel buffers, disk buffers, and the like will also have an effect. Inspecting and changing those are system-dependent.

Tags: c++, buffer, fstream, ofstream