Why is Intel Pin not able to instrument the open syscall?

I am trying to build a pintool that detects open() system calls targeting a specific file or directory and replaces the file path argument with another value.

For example, this is the very simple code I want to instrument:

    #include <iostream>
    #include <sys/types.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    
    using namespace std;
    
    int main(int argc, char **argv)
    {
        int i = open("/home/preet_derasari/important.txt", O_RDONLY);
        cout << "fid: " << i << endl;
    }

In this example, I want Pin to change the file path from /home/preet_derasari/important.txt to /home/preet_derasari/dummy.txt. To do that, I wrote a very simple pintool after referring to some example pintools and the Pin API:

    #include "pin.H"
    #include <iostream>
    #include <fstream>
    #include <syscall.h>
    #include <cstring>   // strcmp
    #include <string>
    using namespace std;
    
    INT32 Usage()
    {
        cout << "This tool prints out the number of dynamically executed " << endl
             << "instructions, basic blocks and threads in the application." << endl
             << endl;
    
        cout << KNOB_BASE::StringKnobSummary() << endl;
    
        return -1;
    }
    
    void SyscallEntry(THREADID threadIndex, CONTEXT *ctxt, SYSCALL_STANDARD std, void *v)
    {
        ADDRINT sysNum = PIN_GetSyscallNumber(ctxt, std);
        cout << "entered syscall: " << sysNum << endl;
        if(sysNum == SYS_open)
        {
            cout << "open encountered!" << endl;
            char *path = (char *)PIN_GetSyscallArgument(ctxt, std, 0);
            cout << "Original File Path: " << path << endl;
            int match = strcmp((char *)PIN_GetSyscallArgument(ctxt, std, 0), "/home/preet_derasari/important.txt");
            if(!match)
            {
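                // static storage: the string must outlive this callback,
                // because the kernel reads it only after the callback returns
                static const char pathDummy[] = "/home/preet_derasari/dummy.txt";
                PIN_SetSyscallArgument(ctxt, std, 0, (ADDRINT) pathDummy);
                cout << "Dummy File Path: " << pathDummy << endl;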
            }
        }
    }
    
    int main(int argc, char* argv[])
    {
        cout << "Open Syscall Value: " << SYS_open << endl;
    
        if (PIN_Init(argc, argv))
        {
            return Usage();
        }
    
        cout << "===============================================" << endl;
        cout << "This application is instrumented by MyPinTool" << endl;
        cout << "===============================================" << endl;
    
        PIN_AddSyscallEntryFunction(SyscallEntry, 0);
    
        // Start the program, never returns
        PIN_StartProgram();
    
        return 0;
    }

I run the pintool with this command: ../../../pin -t obj-intel64/MY_pin.so -- test, where MY_pin.so is the pintool shared object and test is the sample program shown above.

The output confuses me, because Pin is instrumenting every system call except open:

    Open Syscall Value: 2
    ===============================================
    This application is instrumented by MyPinTool
    ===============================================
    entered syscall: 12
    entered syscall: 158
    entered syscall: 21
    entered syscall: 257
    entered syscall: 5
    entered syscall: 9
    entered syscall: 3
    entered syscall: 257
    entered syscall: 0
    entered syscall: 17
    entered syscall: 17
    entered syscall: 17
    entered syscall: 5
    entered syscall: 9
    entered syscall: 17
    entered syscall: 17
    entered syscall: 17
    entered syscall: 9
    entered syscall: 9
    entered syscall: 9
    entered syscall: 9
    entered syscall: 9
    entered syscall: 3
    entered syscall: 158
    entered syscall: 10
    entered syscall: 10
    entered syscall: 10
    entered syscall: 11
    entered syscall: 12
    entered syscall: 12
    entered syscall: 257
    entered syscall: 5
    entered syscall: 9
    entered syscall: 3
    entered syscall: 3

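As you can see, the pintool sees every system call except open, i.e. system call number 2 (on x86_64).

An interesting observation is that the output does not include the cout from my test program (cout << "fid: " << i << endl;), which makes me wonder whether Pin is doing something odd with the open system call.

Specs:

Can someone help me understand why this happens?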

strace cat foo shows that programs no longer use the old open(2) system call:

    ...
    openat(AT_FDCWD, "foo", O_RDONLY)       = 3
    ...

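__NR_openat is 257, which your Pin tool observed 3 times. Apparently even the open() libc wrapper function uses the openat Linux system call internally. (The __NR_open = 2 system call still works; the kernel also has code to pass its args on to the current implementation. IDK which would be more efficient; it might just set up an AT_FDCWD arg and call sys_openat(), which then has to decode it again, just as it would if glibc had done that in user space.)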
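Since the path can now arrive through either system call, a tool that wants to rewrite it has to know which argument slot holds the pathname: slot 0 for open(path, flags, mode), slot 1 for openat(dirfd, path, flags, mode). A minimal sketch of that lookup, kept free of Pin so it compiles on its own; PathArgIndex is a hypothetical helper name, not part of the Pin API:

```cpp
#include <sys/syscall.h>  // SYS_open, SYS_openat

// Hypothetical helper (not a Pin API): returns the index of the pathname
// argument for the open-family syscalls handled here, or -1 otherwise.
int PathArgIndex(long sysNum) {
    switch (sysNum) {
#ifdef SYS_open               // some architectures have no legacy open(2)
        case SYS_open:   return 0;  // open(pathname, flags, mode)
#endif
        case SYS_openat: return 1;  // openat(dirfd, pathname, flags, mode)
        default:         return -1; // not a syscall this sketch knows about
    }
}
```

In the SyscallEntry callback from the question, the returned index would presumably replace the hard-coded 0 passed to PIN_GetSyscallArgument and PIN_SetSyscallArgument, with the rewrite skipped when the result is -1.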


The open(2) man page also documents openat(2).

The dirfd argument is used in conjunction with the pathname argument as follows:

  • If the pathname given in pathname is absolute, then dirfd is ignored.

  • If the pathname given in pathname is relative and dirfd is the special value AT_FDCWD, then pathname is interpreted relative to the current working directory of the calling process (like open()).

  • ...
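The first two rules can be checked with a few lines of C++. This is only a sketch: CanOpenAt is a hypothetical helper, and /dev/null is assumed to exist and be readable, as it is on typical Linux systems.

```cpp
#include <fcntl.h>   // openat, AT_FDCWD, O_RDONLY
#include <unistd.h>  // close

// Hypothetical helper: tries to open `path` read-only relative to `dirfd`
// and reports whether that succeeded. The descriptor is closed right away.
bool CanOpenAt(int dirfd, const char* path) {
    int fd = openat(dirfd, path, O_RDONLY);
    if (fd < 0) {
        return false;
    }
    close(fd);
    return true;
}
```

With an absolute path such as "/dev/null", CanOpenAt(-1, ...) should succeed even though -1 is not a valid descriptor, because dirfd is ignored. With the relative path "dev/null", the same invalid dirfd makes the call fail, while AT_FDCWD resolves it against the current working directory.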

openat / linkat and so on, when used with an fd from open(O_DIRECTORY), let programs like find avoid TOCTOU races, and/or let multi-threaded programs avoid having to actually chdir (because there is only one CWD per process, not per thread).
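As a rough illustration of that pattern, the sketch below opens a directory once and then resolves a name relative to that descriptor, without touching the process-wide CWD. OpenInDirectory is a hypothetical helper name, and the sketch deliberately does not support O_CREAT (which would need a mode argument).

```cpp
#include <fcntl.h>   // open, openat, O_DIRECTORY, O_RDONLY
#include <unistd.h>  // close

// Hypothetical helper: opens `name` relative to the directory `dirPath`
// without calling chdir. Returns a file descriptor, or -1 on failure.
// `flags` must not include O_CREAT in this sketch (no mode is passed).
int OpenInDirectory(const char* dirPath, const char* name, int flags) {
    // O_DIRECTORY makes the open fail if dirPath is not a directory.
    int dirfd = open(dirPath, O_RDONLY | O_DIRECTORY);
    if (dirfd < 0) {
        return -1;
    }
    // The lookup is anchored at dirfd, so it keeps referring to the same
    // directory even if dirPath is renamed or replaced in the meantime.
    int fd = openat(dirfd, name, flags);
    close(dirfd);
    return fd;
}
```

A real directory walker like find would keep dirfd open across many openat calls rather than reopening it per file; closing it immediately here just keeps the example short.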

Using them with AT_FDCWD has no advantage or disadvantage compared with old-style open(2).