What does "deleted" mean in /proc/$pid/maps?

I downloaded libhugetlbfs.so and have a simple test source:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int glbarr[1024*1024]={0} ;
int main()
{
    char * ptr ;
    ptr = (char *) malloc( 1024 * 1024 * 1 ) ;
    printf(" press any key to go on \n");
    getchar() ;
    for(int idx=0;idx<100;idx++){
        char strtmp[64] = {0} ;
        sprintf(strtmp,"%020d",idx) ;
        strcpy( ptr+1024*idx , strtmp ) ;
    } //for 
    for(int idx=0;idx<100;idx++){
        glbarr[idx] = idx ;
    }
    printf(" press any key to go on \n");
    getchar() ;
} // main

Then I set the environment:

export LD_PRELOAD=libhugetlbfs.so
export HUGETLB_MORECORE=yes
export HUGETLB_DEBUG=1

Finally, I run test_malloc.exe:

 INFO: Found pagesize 2048 kB
 INFO: Detected page sizes:
 INFO:    Size: 2048 kB (default)  Mount: /mnt/SharedMem_2M
 INFO: Parsed kernel version: [3] . [10] . [0] 
 INFO: Feature private_reservations is present in this kernel
 INFO: Feature noreserve_safe is present in this kernel
 INFO: Feature map_hugetlb is present in this kernel
 INFO: Kernel has MAP_PRIVATE reservations.  Disabling heap prefaulting.
 INFO: Kernel supports MAP_HUGETLB
 INFO: HUGETLB_SHARE=0, sharing disabled
 INFO: HUGETLB_NO_RESERVE=no, reservations enabled
 INFO: Segment 0 (phdr 2): 0x400000-0x400a04  (filesz=0xa04) (prot = 0x5)
 INFO: Segment 1 (phdr 3): 0x600de0-0xa01080  (filesz=0x274) (prot = 0x3)
 DEBUG: symbol to copy at 0x601060: stdin
 DEBUG: Total memsz = 0x400ca4, memsz of largest segment = 0x4002a0
 INFO: libhugetlbfs version: 2.20
 INFO: Mapped hugeseg at 0x2aaaaac00000. Copying 0xa04 bytes and 0 extra bytes from 0x400000...done
 INFO: Prepare succeeded
 INFO: Mapped hugeseg at 0x2aaaaac00000. Copying 0x274 bytes and 0x14 extra bytes from 0x600de0...done
 INFO: Prepare succeeded
 INFO: setup_morecore(): heapaddr = 0x1c00000
 INFO: hugetlbfs_morecore(2101248) = ...
 INFO: heapbase = 0x1c00000, heaptop = 0x1c00000, mapsize = 0, delta=2101248
 INFO: Attempting to map 4194304 bytes
 INFO: ... = 0x1c00000
 INFO: hugetlbfs_morecore(0) = ...
 INFO: heapbase = 0x1c00000, heaptop = 0x1e01000, mapsize = 400000, delta=-2093056
 INFO: ... = 0x1e01000

And /proc/$(pidof test_malloc.exe)/maps:

00400000-00600000 r-xp 00000000 00:2b 6019488                            /mnt/SharedMem_2M/libhugetlbfs.tmp.uI55WD (deleted)
00600000-00c00000 rw-p 00000000 00:2b 6123885                            /mnt/SharedMem_2M/libhugetlbfs.tmp.VUALYM (deleted)
01c00000-02000000 rw-p 00000000 00:0d 6123886                            /anon_hugepage (deleted)

numastat -m shows that 8M of huge pages are really in use. What confuses me is what "(deleted)" means in the maps output, both for /mnt/SharedMem_2M and for /anon_hugepage.

Edit:

And the debug output:

INFO: Found pagesize 2048 kB
INFO: Detected page sizes:
INFO:    Size: 2048 kB (default)  Mount: /mnt/SharedMem_2M
INFO: Parsed kernel version: [3] . [10] . [0] 
INFO: Feature private_reservations is present in this kernel
INFO: Feature noreserve_safe is present in this kernel
INFO: Feature map_hugetlb is present in this kernel
INFO: Kernel has MAP_PRIVATE reservations.  Disabling heap prefaulting.
INFO: Kernel supports MAP_HUGETLB
INFO: HUGETLB_SHARE=0, sharing disabled
INFO: HUGETLB_NO_RESERVE=no, reservations enabled
INFO: Segment 0 (phdr 3): 0x600de0-0xa01080  (filesz=0x274) (prot = 0x3)
DEBUG: symbol to copy at 0x601060: stdin
DEBUG: Total memsz = 0x4002a0, memsz of largest segment = 0x4002a0
INFO: libhugetlbfs version: 2.20
INFO: Mapped hugeseg at 0x2aaaaac00000. Copying 0x274 bytes and 0x14 extra bytes from 0x600de0...done
INFO: Prepare succeeded
INFO: setup_morecore(): heapaddr = 0x2200000
INFO: hugetlbfs_morecore(2101248) = ...
INFO: heapbase = 0x2200000, heaptop = 0x2200000, mapsize = 0, delta=2101248
INFO: Attempting to map 4194304 bytes
INFO: ... = 0x2200000
INFO: hugetlbfs_morecore(0) = ...
INFO: heapbase = 0x2200000, heaptop = 0x2401000, mapsize = 400000, delta=-2093056
INFO: ... = 0x2401000

And the maps:

00400000 default file=/home/marschen/test/posix-memalign/test_malloc.exe mapped=1 N0=1 kernelpagesize_kB=4
00600000 default file=/mnt/SharedMem_2M/libhugetlbfs.tmp.85Y41e\040(deleted) huge anon=1 dirty=1 N0=1 kernelpagesize_kB=2048
02200000 default file=/anon_hugepage\040(deleted) huge anon=1 dirty=1 N0=1 kernelpagesize_kB=2048

When libhugetlbfs uses the hugetlbfs pseudo filesystem (grep hugetlbfs /proc/filesystems) to get mmaps backed by hugetlb pages, it is normal for the temporary file to be deleted (unlinked).
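The grep check mentioned above can also be done from C. This is only a minimal sketch: the function name kernel_has_filesystem is made up for illustration and is not part of libhugetlbfs. It assumes the usual /proc/filesystems layout, where each line is an optional "nodev" marker, a tab, and the filesystem name.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not a libhugetlbfs function.
 * Returns 1 if fs_name is listed in /proc/filesystems,
 * 0 if it is not, and -1 if the file cannot be opened. */
int kernel_has_filesystem(const char *fs_name)
{
    char line[256];
    int found = 0;
    FILE *fp = fopen("/proc/filesystems", "r");

    if (fp == NULL)
        return -1;

    while (fgets(line, sizeof line, fp) != NULL) {
        const char *tab;
        const char *name;

        /* Drop the trailing newline, then take the field after the last tab. */
        line[strcspn(line, "\n")] = '\0';
        tab = strrchr(line, '\t');
        name = (tab != NULL) ? tab + 1 : line;

        if (strcmp(name, fs_name) == 0) {
            found = 1;
            break;
        }
    }

    fclose(fp);
    return found;
}
```

Calling kernel_has_filesystem("hugetlbfs") should return 1 on a kernel built with hugetlbfs support, which is the precondition for the file-backed huge page mappings discussed here.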

For example, there is the hugetlbfs_unlinked_fd_for_size function in libhugetlbfs/hugeutils.c: https://github.com/libhugetlbfs/libhugetlbfs/blob/e44180072b796c0e28e53c4d01ef6279caaa2a99/hugeutils.c#L1033

int hugetlbfs_unlinked_fd_for_size(long page_size)
{
    const char *path;
    char name[PATH_MAX+1];
    int fd;

    path = hugetlbfs_find_path_for_size(page_size);
    ..
    name[sizeof(name)-1] = '\0';

    strcpy(name, path);
    strncat(name, "/libhugetlbfs.tmp.XXXXXX", sizeof(name)-1);
    /* FIXME: deal with overflows */

    fd = mkstemp64(name);
    ....

    unlink(name);

    return fd;
}

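The temporary file name is generated at random by mkstemp, which also creates the file and opens it. The file is then unlinked from the filesystem (man 2 unlink): its name is removed from the directory, while the inode and the file data still exist, but other programs can no longer reach the file by name. This is what the "(deleted)" suffix in /proc/$pid/maps reports.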
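The same mkstemp, unlink, mmap sequence can be reproduced without huge pages. The sketch below is an illustration under assumptions, not libhugetlbfs code: it uses an ordinary file under /tmp instead of a hugetlbfs mount, and the names maps_line_is_deleted and demo_unlinked_mapping are hypothetical. It maps an unlinked file and then looks up its own entry in /proc/self/maps.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>

/* Returns 1 if the /proc/self/maps line covering addr ends in
 * " (deleted)", 0 if it does not, -1 if maps cannot be read. */
int maps_line_is_deleted(const void *addr)
{
    char line[4096];
    int result = 0;
    unsigned long target = (unsigned long)(uintptr_t)addr;
    FILE *fp = fopen("/proc/self/maps", "r");

    if (fp == NULL)
        return -1;

    while (fgets(line, sizeof line, fp) != NULL) {
        unsigned long start, end;

        /* Every line starts with "start-end" in hexadecimal. */
        if (sscanf(line, "%lx-%lx", &start, &end) != 2)
            continue;
        if (target >= start && target < end) {
            result = (strstr(line, " (deleted)") != NULL);
            break;
        }
    }

    fclose(fp);
    return result;
}

/* Creates a temporary file in /tmp (an assumption; libhugetlbfs uses the
 * hugetlbfs mount point), unlinks it, maps it, and reports how the mapping
 * is labeled. Returns 1 if it is marked "(deleted)", 0 if not, -1 on error. */
int demo_unlinked_mapping(void)
{
    char name[] = "/tmp/unlink_demo.XXXXXX";
    const size_t len = 4096;
    void *map;
    int deleted;
    int fd = mkstemp(name);      /* picks the random name, creates, opens */

    if (fd < 0)
        return -1;

    unlink(name);                /* the name is gone, the inode stays */

    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }

    map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        close(fd);
        return -1;
    }

    deleted = maps_line_is_deleted(map);

    munmap(map, len);
    close(fd);
    return deleted;
}
```

On a typical Linux system demo_unlinked_mapping() is expected to return 1, because the kernel appends " (deleted)" to the path of any mapped file whose directory entry has been removed.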

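As long as the unlinked fd is open, it can be used with hugetlb mmap and to store data. The filesystem really frees the file data only after the fd is closed and every mapping of the file is gone.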
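The lifetime of an unlinked file can be observed directly with fstat. This is a hedged sketch with hypothetical names (struct unlink_report, observe_unlinked_file), again using a regular file under /tmp rather than hugetlbfs. It records the link count before and after unlink, then checks that data written through the fd is still readable through a mapping after the fd has been closed.

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical result structure for this illustration. */
struct unlink_report {
    long nlink_before;        /* st_nlink right after mkstemp */
    long nlink_after;         /* st_nlink right after unlink */
    int  data_ok_after_close; /* 1 if the mapping still holds the data */
};

/* Returns 0 on success and fills *report, -1 on any error. */
int observe_unlinked_file(struct unlink_report *report)
{
    char name[] = "/tmp/unlink_demo.XXXXXX";
    const char msg[] = "still here";
    struct stat st;
    char *map;
    int fd = mkstemp(name);

    if (fd < 0)
        return -1;

    if (fstat(fd, &st) != 0) {
        unlink(name);
        close(fd);
        return -1;
    }
    report->nlink_before = (long)st.st_nlink;

    /* Remove the only name; the open fd keeps the inode alive. */
    if (unlink(name) != 0 || fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    report->nlink_after = (long)st.st_nlink;

    /* Store data in the now nameless file. */
    if (write(fd, msg, sizeof msg) != (ssize_t)sizeof msg) {
        close(fd);
        return -1;
    }

    map = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);                /* the mapping keeps its own reference */
    if (map == MAP_FAILED)
        return -1;

    report->data_ok_after_close = (memcmp(map, msg, sizeof msg) == 0);

    munmap(map, 4096);        /* last reference gone, data can be freed */
    return 0;
}
```

On common Linux filesystems the link count is expected to go from 1 to 0 while the data stays reachable, which is why a mapping can outlive both the file name and the descriptor.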

Unlinking a temporary file created with mkstemp this early is a common idiom.

Some useful information can also be found in the HOWTO of the libhugetlbfs project: https://github.com/libhugetlbfs/libhugetlbfs/blob/master/HOWTO