std::ofstream - no buffering for strings longer than 1023 (instant flush)
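When I use pubsetbuf(...) to change the size of an ofstream buffer, everything works fine, except when I write a single string longer than 1023 characters to the ofstream (see the code below). Is this the correct behavior, or am I doing something wrong?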
#include <fstream>
#include <string>
#include <vector>
#include <unistd.h> // sleep()

int main() {
    std::vector<char> rawBuf;
    std::ofstream stream;
    rawBuf.resize(20000);
    // Install the custom buffer before the file is opened.
    stream.rdbuf()->pubsetbuf(&rawBuf[0], 20000);
    stream.open("file.txt", std::ios_base::app);
    std::string data(1499, 'b');
    for (int i = 0; i < 10; i++)
    {
        // With 1023 instead of 1024 the output is buffered as expected.
        stream << data.substr(0, 1024) << "\n";
        sleep(1);
    }
    stream.flush();
    stream.close();
    return 0;
}
With a string of length 1024, strace ./program shows the following:
writev(3, [{iov_base=NULL, iov_len=0}, {iov_base="bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"..., iov_len=1024}], 2) = 1024
nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffcf3889ac0) = 0
writev(3, [{iov_base="\n", iov_len=1}, {iov_base="bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"..., iov_len=1024}], 2) = 1025
nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffcf3889ac0) = 0
... and so on 10x
With a string of length 1023, everything seems fine:
nanosleep({tv_sec=1, tv_nsec=0}, 0x7fff8e13a980) = 0
nanosleep({tv_sec=1, tv_nsec=0}, 0x7fff8e13a980) = 0
... 10x
And then:
write(3, "bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb"..., 10240) = 10240
Why is there a single write here, but not in the previous case?
Edit:
gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3)
basic_streambuf* setbuf(char_type* s, streamsize n) override;
Effects: If setbuf(0, 0) is called on a stream before any I/O has occurred on that stream, the stream becomes unbuffered. Otherwise the results are implementation-defined. “Unbuffered” means that pbase() and pptr() always return null and output to the file should appear as soon as possible.
“Implementation-defined” covers “works as expected”, “only a single write”, and anything else. In fact, this is what the libstdc++ 7.3.0 documentation says:
First, are you sure that you understand buffering? Particularly the
fact that C++ may not, in fact, have anything to do with it?
The rules for buffering can be a little odd, but they aren't any
different from those of C. (Maybe that's why they can be a bit odd.)
Many people think that writing a newline to an output stream
automatically flushes the output buffer. This is true only when the
output stream is, in fact, a terminal and not a file or some other
device -- and that may not even be true since C++ says nothing about
files nor terminals. All of that is system-dependent. (The
"newline-buffer-flushing only occurring on terminals" thing is mostly
true on Unix systems, though.)
Some people also believe that sending endl down an output stream only
writes a newline. This is incorrect; after a newline is written, the
buffer is also flushed. Perhaps this is the effect you want when
writing to a screen -- get the text out as soon as possible, etc --
but the buffering is largely wasted when doing this to a file:
output << "a line of text" << endl;
output << some_data_variable << endl;
output << "another line of text" << endl;
The proper thing to do in this case is to just write the data out and let
the libraries and the system worry about the buffering. If you need a
newline, just write a newline:
output << "a line of text\n"
<< some_data_variable << '\n'
<< "another line of text\n";
I have also joined the output statements into a single statement. You
could make the code prettier by moving the single newline to the start
of the quoted text on the last line, for example.
If you do need to flush the buffer above, you can send an endl if
you also need a newline, or just flush the buffer yourself:
output << ...... << flush; // can use std::flush manipulator
output.flush(); // or call a member fn
On the other hand, there are times when writing to a file should be
like writing to standard error; no buffering should be done because
the data needs to appear quickly (a prime example is a log file for
security-related information). The way to do this is just to turn off
the buffering before any I/O operations at all have been done (note
that opening counts as an I/O operation):
std::ofstream os;
std::ifstream is;
int i;
os.rdbuf()->pubsetbuf(0,0);
is.rdbuf()->pubsetbuf(0,0);
os.open("/foo/bar/baz");
is.open("/qux/quux/quuux");
...
os << "this data is written immediately\n";
is >> i; // and this will probably cause a disk read
Since all aspects of buffering are handled by a streambuf-derived
member, it is necessary to get at that member with rdbuf(). Then the
public version of setbuf can be called. The arguments are the same
as those for the Standard C I/O Library function (a buffer area
followed by its size).
A great deal of this is implementation-dependent. For example,
streambuf does not specify any actions for its own setbuf()-ish
functions; the classes derived from streambuf each define behavior
that "makes sense" for that class: an argument of (0,0) turns off
buffering for filebuf but does nothing at all for its siblings
stringbuf and strstreambuf, and specifying anything other than
(0,0) has varying effects. User-defined classes derived from
streambuf can do whatever they want. (For filebuf and arguments
for (p,s) other than zeros, libstdc++ does what you'd expect: the
first s bytes of p are used as a buffer, which you must allocate
and deallocate.)
A last reminder: there are usually more buffers involved than just
those at the language/library level. Kernel buffers, disk buffers, and
the like will also have an effect. Inspecting and changing those are
system-dependent.