How can I reduce the virtual memory required by a gccgo-compiled executable?

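When I compile this simple hello world example with gccgo, the resulting executable uses over 800 MiB of VmData. I would like to know why, and whether there is anything I can do to lower it. The sleep is only there to give me time to observe the memory usage.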

Source:

package main

import (
  "fmt"
  "time"
)

func main() {
  fmt.Println("hello world")
  time.Sleep(5 * time.Second) // pause so the memory usage can be inspected
}

The script I use to compile:

#!/bin/bash

TOOLCHAIN_PREFIX=i686-linux-gnu
OPTIMIZATION_FLAG="-O3"

CGO_ENABLED=1 \
CC=${TOOLCHAIN_PREFIX}-gcc-8 \
CXX=${TOOLCHAIN_PREFIX}-g++-8 \
AR=${TOOLCHAIN_PREFIX}-ar \
GCCGO=${TOOLCHAIN_PREFIX}-gccgo-8 \
CGO_CFLAGS="-g ${OPTIMIZATION_FLAG}" \
CGO_CPPFLAGS="" \
CGO_CXXFLAGS="-g ${OPTIMIZATION_FLAG}" \
CGO_FFLAGS="-g ${OPTIMIZATION_FLAG}" \
CGO_LDFLAGS="-g ${OPTIMIZATION_FLAG}" \
GOOS=linux \
GOARCH=386 \
go build -x \
   -compiler=gccgo \
   -gccgoflags=all="-static -g ${OPTIMIZATION_FLAG}"

gccgo version:

$ i686-linux-gnu-gccgo-8 --version
i686-linux-gnu-gccgo-8 (Ubuntu 8.2.0-1ubuntu2~18.04) 8.2.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Output from /proc/<pid>/status:

VmPeak:  811692 kB
VmSize:  811692 kB
VmLck:        0 kB
VmPin:        0 kB
VmHWM:     5796 kB
VmRSS:     5796 kB
VmData:  807196 kB
VmStk:      132 kB
VmExe:     2936 kB
VmLib:        0 kB
VmPTE:       52 kB
VmPMD:        0 kB
VmSwap:       0 kB
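To watch these counters from inside a program rather than by reading the file by hand, the Vm* lines can be parsed out of /proc/self/status. The following is a minimal sketch, assuming Linux; the helper name parseVmFields is made up for illustration, and ioutil.ReadFile is used only so that the code also builds with older toolchains.

```go
package main

import (
	"bufio"
	"fmt"
	"io/ioutil"
	"os"
	"strconv"
	"strings"
)

// parseVmFields extracts the "Vm*" lines from the text of a
// /proc/<pid>/status file and returns them as name -> value in kB.
// (Hypothetical helper, not part of any library.)
func parseVmFields(status string) map[string]uint64 {
	fields := make(map[string]uint64)
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		// Lines look like "VmData:  807196 kB".
		parts := strings.Fields(sc.Text())
		if len(parts) < 2 || !strings.HasPrefix(parts[0], "Vm") {
			continue
		}
		kb, err := strconv.ParseUint(parts[1], 10, 64)
		if err != nil {
			continue // skip lines whose value is not a plain number
		}
		fields[strings.TrimSuffix(parts[0], ":")] = kb
	}
	return fields
}

func main() {
	data, err := ioutil.ReadFile("/proc/self/status")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot read status:", err)
		return
	}
	fields := parseVmFields(string(data))
	for _, name := range []string{"VmPeak", "VmSize", "VmData", "VmRSS"} {
		fmt.Printf("%-7s %d kB\n", name, fields[name])
	}
}
```

Comparing VmData with VmRSS in the output shows how much of the address space is merely reserved and how much is actually resident.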

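I ask because my device only has 512 MiB of RAM. I know that this is virtual memory, but I would like to reduce or remove the overcommit if possible. It does not seem reasonable to me for a simple executable to require that much allocation.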

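A possible cause is the libraries you are linking into the code. My guess is that you would get a smaller logical address space if you explicitly linked against static libraries, so that only the minimum is added to your executable. In any case, there is minimal harm in having a large logical address space.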

I was able to locate where gccgo asks for so much memory. It is in the mallocinit function in the libgo/go/runtime/malloc.go file:

// If we fail to allocate, try again with a smaller arena.
// This is necessary on Android L where we share a process
// with ART, which reserves virtual memory aggressively.
// In the worst case, fall back to a 0-sized initial arena,
// in the hope that subsequent reservations will succeed.
arenaSizes := [...]uintptr{
  512 << 20,
  256 << 20,
  128 << 20,
  0,
}

for _, arenaSize := range &arenaSizes {
  // SysReserve treats the address we ask for, end, as a hint,
  // not as an absolute requirement. If we ask for the end
  // of the data segment but the operating system requires
  // a little more space before we can start allocating, it will
  // give out a slightly higher pointer. Except QEMU, which
  // is buggy, as usual: it won't adjust the pointer upward.
  // So adjust it upward a little bit ourselves: 1/4 MB to get
  // away from the running binary image and then round up
  // to a MB boundary.
  p = round(getEnd()+(1<<18), 1<<20)
  pSize = bitmapSize + spansSize + arenaSize + _PageSize
  if p <= procBrk && procBrk < p+pSize {
    // Move the start above the brk,
    // leaving some room for future brk
    // expansion.
    p = round(procBrk+(1<<20), 1<<20)
  }
  p = uintptr(sysReserve(unsafe.Pointer(p), pSize, &reserved))
  if p != 0 {
    break
  }
}
if p == 0 {
  throw("runtime: cannot reserve arena virtual address space")
}

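Interestingly, it falls back to smaller arena sizes if the larger ones fail. So limiting the virtual memory available to a Go executable actually limits how much it will successfully reserve.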
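The fallback behaviour can be illustrated outside the runtime. This is only a sketch: reserveFunc is a hypothetical stand-in for sysReserve, the limit is modelled as a simple size cap, and the sketch ignores the bitmapSize and spansSize terms that the real request adds to the arena size, so the arena size that actually succeeds under a given limit may be smaller than what is shown here.

```go
package main

import "fmt"

// reserveFunc is a hypothetical stand-in for the runtime's sysReserve:
// it reports whether a reservation of size bytes would succeed.
type reserveFunc func(size uintptr) bool

// firstReservable walks sizes in order and returns the first one that
// reserve accepts, mirroring the shape of the loop in mallocinit.
func firstReservable(sizes []uintptr, reserve reserveFunc) (uintptr, bool) {
	for _, size := range sizes {
		if reserve(size) {
			return size, true
		}
	}
	return 0, false
}

func main() {
	// The same candidate arena sizes as in the runtime excerpt.
	sizes := []uintptr{512 << 20, 256 << 20, 128 << 20, 0}

	// Assumption: the address-space limit behaves like a plain cap.
	// 327680 KiB is 320 MiB.
	limit := uintptr(327680) << 10
	underLimit := func(size uintptr) bool { return size <= limit }

	size, ok := firstReservable(sizes, underLimit)
	fmt.Printf("reserved %d MiB, ok=%v\n", size>>20, ok)
}
```

With no cap the first candidate is taken; once the cap drops below a candidate, the loop simply moves on to the next one.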

I was able to use ulimit -v 327680 to limit the virtual memory to smaller numbers:

VmPeak:   300772 kB
VmSize:   300772 kB
VmLck:         0 kB
VmPin:         0 kB
VmHWM:      5712 kB
VmRSS:      5712 kB
VmData:   296276 kB
VmStk:       132 kB
VmExe:      2936 kB
VmLib:         0 kB
VmPTE:        56 kB
VmPMD:         0 kB
VmSwap:        0 kB
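If the program is started by another Go process instead of an interactive shell, the same limit can be applied by letting a small shell wrapper call ulimit before the target runs. A minimal sketch, assuming a POSIX sh whose ulimit supports -v; the helper name limitedCommand and the launcher name limitrun are made up for illustration.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// limitedCommand builds a command that runs prog under "ulimit -v limitKiB".
// The limit is set inside the child shell, so the launcher itself is not
// affected. In "sh -c script prog args...", prog becomes $0 and args "$@".
// (Hypothetical helper, not part of any library.)
func limitedCommand(limitKiB int, prog string, args ...string) *exec.Cmd {
	script := fmt.Sprintf(`ulimit -v %d && exec "$0" "$@"`, limitKiB)
	shArgs := append([]string{"-c", script, prog}, args...)
	return exec.Command("sh", shArgs...)
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: limitrun <program> [args...]")
		os.Exit(2)
	}
	// 327680 KiB is the value used with ulimit -v above.
	cmd := limitedCommand(327680, os.Args[1], os.Args[2:]...)
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "run failed:", err)
		os.Exit(1)
	}
}
```

Setting the limit in the child shell rather than in the launcher avoids constraining the launcher's own runtime.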

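These are still big numbers, but they are the best that a gccgo executable can achieve. So the answer to the question is: yes, you can reduce the VmData of a gccgo-compiled executable, but you really shouldn't worry about it. (On a 64-bit machine gccgo tries to allocate 512 GB.)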