Automatic file buffer flushing when file opened by another program only if other program not already running (macOS)

Question

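I ran into some odd behavior on macOS, which I traced to an unexpected race condition that occurs when I tell the OS to open an RTF file that I have just finished writing but have not closed or explicitly flushed.

If the file handler (Word, TextEdit, etc.) is not already open, calling system("open test.rtf") opens the file just fine, and the file is complete.

However, if the file handler is already open, calling system("open test.rtf") results in an error message saying the file is corrupt or truncated (because the buffer has not been fully flushed).

The obvious fix is to fflush() and/or fclose() my file before opening it. However, I'm more interested in the underlying interaction between my program's runtime and macOS. My question is: how and why does the running/not-running state of the file handler affect whether my buffer gets flushed?

(This is not just a matter of how long the program takes to open: I added a sleep delay to the version that does not explicitly flush the buffer, and it made no difference.)

Unflushed version (only works when the file handler is not already running):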

#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <unistd.h>

int main(void) {
    FILE *fin  = fopen("src.rtf", "rb");
    FILE *fout = fopen("test.rtf", "wb");
    int c;

    assert(fin && fout);
    while ((c=fgetc(fin)) != EOF) fputc(c, fout);

    sleep(3);
    system("open test.rtf");

    fclose(fin);
    fclose(fout);
    
    return 0;
}

Explicitly flushed version (always works):

#include <stdio.h>
#include <stdlib.h>
#include <assert.h>

int main(void) {
    FILE *fin  = fopen("src.rtf", "rb");
    FILE *fout = fopen("test.rtf", "wb");
    int c;

    assert(fin && fout);
    while ((c=fgetc(fin)) != EOF) fputc(c, fout);

    fflush(fout);
    
    system("open test.rtf");

    fclose(fin);
    fclose(fout);
    
    return 0;
}

The example RTF file I am using is at: https://pastebin.com/mXLk85G1

Answer 1

OK, so, my best guess is this:

system()→fork()→exec()→/bin/sh

/bin/sh processes its arguments and dispatches the command:

fork()→exec()→open

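/usr/bin/open processes its arguments, looks up the file handler via LaunchServices, and attempts to open the file with the associated program.

Pure speculation:

Already running: /usr/bin/open uses IPC to tell the already-running application to try to open the file. The application opens a new file descriptor in its existing file descriptor table and reads the file in, getting the truncated version because the original stream hasn't been fflush()'d or fclose()'d.
Not yet running: /usr/bin/open finds that the application isn't running yet and launches it via fork()→exec(). This means the application still has the original fd table from the original program. The Cocoa runtime checks the fd table, finds that the file is already open for writing, and closes it before reopening it for reading, causing the output buffer to be flushed.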

I have verified that closing the file in a fork()ed child causes the output buffer to be flushed. The following works consistently:

#include <stdio.h>
#include <stdlib.h>
#include <assert.h>
#include <unistd.h>
#include <sys/wait.h>  /* waitpid() */

int main(void) {
    FILE *fin  = fopen("src.rtf", "rb");
    FILE *fout = fopen("test.rtf", "wb");
    int c;
    int pid, wpid, wstat; 

    assert(fin && fout);
    while ((c=fgetc(fin)) != EOF) fputc(c, fout);

    pid = fork();
    if (!pid) {
        fclose(fout);
        fout = NULL;
    } else {
        wpid = waitpid(pid, &wstat, WUNTRACED);
        execl("/bin/sh", "sh", "-c", "open test.rtf", (char *)0);
    }

    if (fin)  fclose(fin); 
    if (fout) fclose(fout); 
    
    return 0;
}

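However, there is a major problem with my theory: while the fd table may survive fork() and exec() all day long, process memory, including the FILE* objects, is destroyed when exec() overwrites the image.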

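After further research, I found that opening a Cocoa file handler causes launchd to start /sbin/filecoordinationd, a daemon that "coordinates access to files" (https://www.unix.com/man-page/osx/8/filecoordinationd/). And, in fact, TextEdit registers as an NSFilePresenterProxy. macOS has a whole underlying file-access mechanism that watches for file changes and makes sure files accessed by different processes stay in a good state. It makes sense that once TextEdit registers as an NSFilePresenter, invoking /sbin/filecoordinationd, the daemon would make sure that any already-open buffers it knows about are brought into a good state.

But how does my program take part in this, when it doesn't use Cocoa and hasn't registered as an NSFile-anything? The most likely answer is that the file coordination machinery behind the NS- classes is implemented in libSystem.dylib, which also serves as the system C library. The macOS system C library may have a built-in facility that lets the OS flush a process's runtime buffers.

So why doesn't it do so when the Cocoa application is already running? It probably doesn't know that it should. If TextEdit doesn't have the file open, and the process that does have it open hasn't registered with NSFile..., and TextEdit hasn't inherited the file descriptor table, then nothing in the Cocoa ecosystem knows that it should tell /sbin/filecoordinationd to make sure the file's output buffer is flushed.

This seems like the most workable theory, and absent input from a macOS engineer or access to the source code of /usr/bin/open and /sbin/filecoordinationd, it's the answer I'm going with.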

c

filesystems

macos

fflush