Is a write operation in unix atomic?

I'm reading APUE (Advanced Programming in the UNIX Environment), and in §3.11 I came across this problem:

if (lseek(fd, 0L, 2) < 0) /* position to EOF */
    err_sys("lseek error");
if (write(fd, buf, 100) != 100) /* and write */
    err_sys("write error");

APUE says:

This works fine for a single process, but problems arise if multiple processes use this technique to append to the same file. .......The problem here is that our logical operation of ‘‘position to the end of file and write’’ requires two separate function calls (as we’ve shown it). Any operation that requires more than one function call cannot be atomic, as there is always the possibility that the kernel might temporarily suspend the process between the two function calls.

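It only says that the CPU may switch to another process between the lseek and write function calls. I want to know whether it can also switch halfway through a write operation. Or, more precisely: is write atomic? If threadA writes "aaaaa" and threadB writes "bbbbb", could the result be "aabbbbbaaa"?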

Moreover, APUE says that pread and pwrite are both atomic operations. Does that mean these functions use a mutex or lock internally to be atomic?

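In Linux there are blocking and non-blocking system calls. write is an example of a blocking system call, which means the calling thread is blocked until the write completes. So once a user process has called write, it cannot do anything else until the system call completes. From the calling thread's point of view, therefore, write behaves as if it were atomic [although a lot may happen at the kernel level, and the kernel's execution of the system call may be interrupted many times].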

It may be an oversimplification to call the Posix semantics "atomic". Posix requires that reads and writes occur in some order:

Writes can be serialized with respect to other reads and writes. If a read() of file data can be proven (by any means) to occur after a write() of the data, it must reflect that write(), even if the calls are made by different processes. A similar requirement applies to multiple write operations to the same file position. This is needed to guarantee the propagation of data from write() calls to subsequent read() calls. (from the Rationale section of the Posix specification for pwrite and write)

The atomicity guarantee mentioned in APUE refers to the use of the O_APPEND flag, which forces writes to be done at the end of the file:

If the O_APPEND flag of the file status flags is set, the file offset shall be set to the end of the file prior to each write and no intervening file modification operation shall occur between changing the file offset and the write operation.

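Regarding pread and pwrite, APUE says (correctly, of course) that these interfaces allow the application to seek and perform I/O atomically; in other words, the I/O operation will take place at the specified file position regardless of what any other process does. (This is because the position is specified in the call itself, and the persistent file position is not affected.)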

The Posix sequencing guarantee is as follows (from the description of the write() and pwrite() functions):

After a write() to a regular file has successfully returned:

  • Any successful read() from each byte position in the file that was modified by that write shall return the data specified by the write() for that position until such byte positions are again modified.

  • Any subsequent successful write() to the same byte position in the file shall overwrite that file data.

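As stated in the Rationale, this wording does guarantee that two simultaneous write calls (even in different, unrelated processes) will not interleave data, because if data were interleaved during a write that eventually succeeds, the second guarantee would be impossible to provide. How this is accomplished is up to the implementation.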

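It must be noted that not all filesystems conform to Posix, and modular OS design, which allows multiple filesystems to coexist in a single installation, makes it impossible for the kernel itself to provide guarantees about write that apply to all available filesystems. Network filesystems are particularly prone to data races (and local mutexes won't help much either), as Posix itself mentions (at the end of the paragraph quoted from the Rationale):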

This requirement is particularly significant for networked file systems, where some caching schemes violate these semantics.

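The first guarantee (about subsequent reads) requires some bookkeeping in the filesystem, because data that has been successfully "written" to a kernel buffer but not yet synced to disk must be made transparently available to processes reading from that file. This also requires some internal locking of kernel metadata.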

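Since writes to regular files are typically done through kernel buffers, and actually syncing the data to the underlying physical storage device is definitely not atomic, the locks needed to provide these guarantees do not have to be held for very long. But they must be taken inside the filesystem, because nothing in the Posix wording limits these guarantees to simultaneous writes within a single process.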

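In multithreaded processes, Posix does require read(), write(), pread() and pwrite() to be atomic with respect to each other when they operate on regular files (or symbolic links). See Thread Interactions with Regular File Operations for the full list of interfaces that must obey this requirement.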