Linux process memory growth when creating a thread

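When I create a thread with pthread_create, the reported address space of the process (as shown by top and ps) grows significantly, as the pmap output below shows.

The thread's stack size is set explicitly, so that part is fine; I can see it show up in pmap.

What I can't explain is the 65404 KB hit. Is this a Linux kernel mapping, or something else?

The thread is also created with the detachstate attribute set, and even though it finishes in <1 second, the memory mapping still remains in pmap.

Is this just part of general Linux memory management, where a mapping, once made, is kept around for reuse? Can the 65M hit be tuned? This is the single-thread case; when several threads are created at the same time, the reported VSZ ramps up very quickly. With 10 threads, the process address space bloats by 650M.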

...shared libs
...shared libs
2adf40000000 (132 KB)  rw-p (00:00 0)        <--- stack size for the thread.
2adf40021000 (65404 KB)  ---p (00:00 0)      <--- what is this? 
7ffcb8bed000 (128 KB)  rwxp (00:00 0)        [stack]
7ffcb8c0d000 (4 KB)    rw-p (00:00 0)        
7ffcb8dc6000 (8 KB)    r--p (00:00 0)        [vvar]
7ffcb8dc8000 (8 KB)    r-xp (00:00 0)        [vdso]
ffffffffff600000 (4 KB)  r-xp (00:00 0)      [vsyscall]
mapped:   116172 KB writable/private: 1140 KB shared: 0 KB

Thanks.

Edit:

So I added a second thread, and pmap now shows:

2adf40000000 (132 KB)  rw-p (00:00 0)        
2adf40021000 (65404 KB)  ---p (00:00 0)      
2adf44000000 (132 KB)  rw-p (00:00 0)        
2adf44021000 (65404 KB)  ---p (00:00 0)      
7ffcb8bed000 (128 KB)  rwxp (00:00 0)        [stack]
7ffcb8c0d000 (4 KB)    rw-p (00:00 0)        
7ffcb8dc6000 (8 KB)    r--p (00:00 0)        [vvar]
7ffcb8dc8000 (8 KB)    r-xp (00:00 0)        [vdso]   
ffffffffff600000 (4 KB)  r-xp (00:00 0)      [vsyscall]
mapped:   181840 KB writable/private: 1400 KB shared: 0 KB

So now there are two stacks and two 65M regions, and both are added to the reported virtual address space.
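As a quick sanity check on these numbers, here is a small sketch that uses only values copied from the pmap output above (the function names are made up for illustration). Each rw-p/---p pair adds up to exactly 64 MiB, and the two pairs start 64 MiB apart on 64 MiB-aligned addresses, which suggests that each pair is one aligned reservation rather than two unrelated mappings.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Values copied from the pmap output above. */
static const uint64_t RW_KB       = 132;    /* rw-p region size */
static const uint64_t NOACCESS_KB = 65404;  /* ---p region size */
static const uint64_t FIRST_BASE  = 0x2adf40000000ULL;
static const uint64_t SECOND_BASE = 0x2adf44000000ULL;

/* Combined size of one rw-p/---p pair, in KB. */
uint64_t pair_size_kb(void) {
    return RW_KB + NOACCESS_KB;
}

/* Distance between the two rw-p base addresses, in KB. */
uint64_t base_stride_kb(void) {
    return (SECOND_BASE - FIRST_BASE) / 1024;
}

/* Nonzero if addr sits on a 64 MiB boundary. */
int is_64mib_aligned(uint64_t addr) {
    return (addr % (64ULL * 1024 * 1024)) == 0;
}

/* Print the derived figures; the assert checks that one pair
   fills exactly the gap between the two base addresses. */
void print_summary(void) {
    assert(pair_size_kb() == base_stride_kb());
    printf("pair size:   %llu KB\n", (unsigned long long)pair_size_kb());
    printf("base stride: %llu KB\n", (unsigned long long)base_stride_kb());
    printf("aligned:     %d %d\n",
           is_64mib_aligned(FIRST_BASE), is_64mib_aligned(SECOND_BASE));
}
```

Calling print_summary() from main should report 65536 KB for both the pair size and the stride. Note also that 132 KB is 0x21000 bytes, which matches the start address of the ---p region (2adf40021000).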

Edit: Environment: glibc is ldd (Ubuntu EGLIBC 2.19-0ubuntu6.6) 2.19; the kernel is 4.4.103.

Found the answer here: it is the per-thread arena, used mainly to reduce lock contention in malloc.

Threading: During early days of linux, dlmalloc was used as the default memory allocator. But later due to ptmalloc2’s threading support, it became the default memory allocator for linux. Threading support helps in improving memory allocator performance and hence application performance. In dlmalloc when two threads call malloc at the same time ONLY one thread can enter the critical section, since freelist data structure is shared among all the available threads. Hence memory allocation takes time in multi threaded applications, resulting in performance degradation. While in ptmalloc2, when two threads call malloc at the same time memory is allocated immediately since each thread maintains a separate heap segment and hence freelist data structures maintaining those heaps are also separate. This act of maintaining separate heap and freelist data structures for each thread is called per thread arena.
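As for whether the hit can be tuned: glibc exposes an M_ARENA_MAX parameter through mallopt that caps how many arenas malloc may create. The following is a minimal sketch, not a tested fix for the program in the question; limit_malloc_arenas is a hypothetical helper name, and the assumption is that capping arenas at 1 makes new threads share the main arena instead of reserving their own region.

```c
#include <malloc.h>  /* mallopt, M_ARENA_MAX (glibc-specific) */

/*
 * Cap the number of malloc arenas (hypothetical helper).
 * mallopt returns 1 on success and 0 on error.
 * Call this early, before any threads are created, so that
 * no extra arena has been set up yet.
 */
int limit_malloc_arenas(int max_arenas) {
    return mallopt(M_ARENA_MAX, max_arenas);
}
```

A program would call limit_malloc_arenas(1) at the top of main, before the first pthread_create. The same limit can be applied without recompiling by setting the MALLOC_ARENA_MAX environment variable before starting the process. The trade-off is the one the quoted passage describes: with fewer arenas, threads that allocate at the same time contend for the same lock. The large ---p region is reserved address space with no access permissions, so it inflates VSZ but is not itself resident memory.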