Why does Go use cgo on Windows for a simple File.Write?

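While rewriting a simple program from C# to Go, I found that the resulting executable was 3 to 4 times slower. In particular, the Go version uses 3 to 4 times more CPU. This is surprising, because the code does a lot of I/O and should not consume much CPU.

I made a very simple version that only does sequential writes, and benchmarked it. I ran the same benchmark on Windows 10 and on Linux (Debian Jessie). The times cannot be compared (not the same system, disks, ...) but the result is interesting.

I am using the same Go version on both platforms: 1.6

On Windows, os.File.Write uses cgo (see runtime.cgocall below); on Linux it does not. Why?

Here is the disk.go program: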

    package main

    import (
        "crypto/rand"
        "fmt"
        "os"
        "time"
    )

    const (
        // size of the test file
        fullSize = 268435456
        // size of read/write per call
        partSize = 128
        // path of temporary test file
        filePath = "./bigfile.tmp"
    )

    func main() {
        buffer := make([]byte, partSize)

        seqWrite := func() error {
            return sequentialWrite(filePath, fullSize, buffer)
        }

        err := fillBuffer(buffer)
        panicIfError(err)
        duration, err := durationOf(seqWrite)
        panicIfError(err)
        fmt.Printf("Duration : %v\n", duration)
    }

    // It's just a test ;)
    func panicIfError(err error) {
        if err != nil {
            panic(err)
        }
    }

    func durationOf(f func() error) (time.Duration, error) {
        startTime := time.Now()
        err := f()
        return time.Since(startTime), err
    }

    func fillBuffer(buffer []byte) error {
        _, err := rand.Read(buffer)
        return err
    }

    func sequentialWrite(filePath string, fullSize int, buffer []byte) error {
        desc, err := os.OpenFile(filePath, os.O_WRONLY|os.O_CREATE, 0666)
        if err != nil {
            return err
        }
        defer func() {
            desc.Close()
            err := os.Remove(filePath)
            panicIfError(err)
        }()

        var totalWrote int
        for totalWrote < fullSize {
            wrote, err := desc.Write(buffer)
            totalWrote += wrote
            if err != nil {
                return err
            }
        }

        return nil
    }

The benchmark (disk_test.go):

    package main

    import (
        "testing"
    )

    // go test -bench SequentialWrite -cpuprofile=cpu.out
    // Windows : go tool pprof -text -nodecount=10 ./disk.test.exe cpu.out
    // Linux : go tool pprof -text -nodecount=10 ./disk.test cpu.out
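    func BenchmarkSequentialWrite(b *testing.B) {
        buffer := make([]byte, partSize)
        // Repeat the write b.N times so the testing package can scale the measurement.
        for i := 0; i < b.N; i++ {
            err := sequentialWrite(filePath, fullSize, buffer)
            panicIfError(err)
        }
    }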

Windows result (with cgo):

    11.68s of 11.95s total (97.74%)
    Dropped 18 nodes (cum <= 0.06s)
    Showing top 10 nodes out of 26 (cum >= 0.09s)
          flat  flat%   sum%        cum   cum%
        11.08s 92.72% 92.72%     11.20s 93.72%  runtime.cgocall
         0.11s  0.92% 93.64%      0.11s  0.92%  runtime.deferreturn
         0.09s  0.75% 94.39%     11.45s 95.82%  os.(*File).write
         0.08s  0.67% 95.06%      0.16s  1.34%  runtime.deferproc.func1
         0.07s  0.59% 95.65%      0.07s  0.59%  runtime.newdefer
         0.06s   0.5% 96.15%      0.28s  2.34%  runtime.systemstack
         0.06s   0.5% 96.65%     11.25s 94.14%  syscall.Write
         0.05s  0.42% 97.07%      0.07s  0.59%  runtime.deferproc
         0.04s  0.33% 97.41%     11.49s 96.15%  os.(*File).Write
         0.04s  0.33% 97.74%      0.09s  0.75%  syscall.(*LazyProc).Find

Linux result (without cgo):

    5.04s of 5.10s total (98.82%)
    Dropped 5 nodes (cum <= 0.03s)
    Showing top 10 nodes out of 19 (cum >= 0.06s)
          flat  flat%   sum%        cum   cum%
         4.62s 90.59% 90.59%      4.87s 95.49%  syscall.Syscall
         0.09s  1.76% 92.35%      0.09s  1.76%  runtime/internal/atomic.Cas
         0.08s  1.57% 93.92%      0.19s  3.73%  runtime.exitsyscall
         0.06s  1.18% 95.10%      4.98s 97.65%  os.(*File).write
         0.04s  0.78% 95.88%      5.10s   100%  _/home/sam/Provisoire/go-disk.sequentialWrite
         0.04s  0.78% 96.67%      5.05s 99.02%  os.(*File).Write
         0.04s  0.78% 97.45%      0.04s  0.78%  runtime.memclr
         0.03s  0.59% 98.04%      0.08s  1.57%  runtime.exitsyscallfast
         0.02s  0.39% 98.43%      0.03s  0.59%  os.epipecheck
         0.02s  0.39% 98.82%      0.06s  1.18%  runtime.casgstatus

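Go does not perform file I/O itself; it delegates the task to the operating system. See the Go syscall package, which is operating-system dependent.

Linux and Windows are different operating systems with different OS ABIs. For example, Linux uses system calls via syscall.Syscall, while Windows uses Windows DLLs. On Windows, a DLL call is a C call. It does not use cgo. It does go through the same dynamic C pointer checks that cgo uses, via runtime.cgocall. There is no runtime.wincall alias.

In summary, different operating systems have different OS call mechanisms.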

Command cgo

Passing pointers

Go is a garbage collected language, and the garbage collector needs to know the location of every pointer to Go memory. Because of this, there are restrictions on passing pointers between Go and C.

In this section the term Go pointer means a pointer to memory allocated by Go (such as by using the & operator or calling the predefined new function) and the term C pointer means a pointer to memory allocated by C (such as by a call to C.malloc). Whether a pointer is a Go pointer or a C pointer is a dynamic property determined by how the memory was allocated; it has nothing to do with the type of the pointer.

Go code may pass a Go pointer to C provided the Go memory to which it points does not contain any Go pointers. The C code must preserve this property: it must not store any Go pointers in Go memory, even temporarily. When passing a pointer to a field in a struct, the Go memory in question is the memory occupied by the field, not the entire struct. When passing a pointer to an element in an array or slice, the Go memory in question is the entire array or the entire backing array of the slice.

C code may not keep a copy of a Go pointer after the call returns.

A Go function called by C code may not return a Go pointer. A Go function called by C code may take C pointers as arguments, and it may store non-pointer or C pointer data through those pointers, but it may not store a Go pointer in memory pointed to by a C pointer. A Go function called by C code may take a Go pointer as an argument, but it must preserve the property that the Go memory to which it points does not contain any Go pointers.

Go code may not store a Go pointer in C memory. C code may store Go pointers in C memory, subject to the rule above: it must stop storing the Go pointer when the C function returns.

These rules are checked dynamically at runtime. The checking is controlled by the cgocheck setting of the GODEBUG environment variable. The default setting is GODEBUG=cgocheck=1, which implements reasonably cheap dynamic checks. These checks may be disabled entirely using GODEBUG=cgocheck=0. Complete checking of pointer handling, at some cost in run time, is available via GODEBUG=cgocheck=2.

It is possible to defeat this enforcement by using the unsafe package, and of course there is nothing stopping the C code from doing anything it likes. However, programs that break these rules are likely to fail in unexpected and unpredictable ways.

"These rules are checked dynamically at runtime."


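Benchmarks:

In other words, there are lies, damned lies, and benchmarks.

For a valid comparison across operating systems, you need to run on identical hardware. For example, consider the differences between CPUs, memory, and spinning-rust versus solid-state disk I/O. I dual-boot Linux and Windows on the same machine.

Run the benchmarks back-to-back at least three times. Operating systems try to be smart, for example by caching I/O. Languages that use virtual machines need warm-up time. And so on.

Know what you are measuring. If you are doing sequential I/O, you spend almost all of your time in the operating system. Did you turn off malware protection? And so on.

And so forth.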
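To make "know what you are measuring" concrete for disk.go, the following sketch counts how many Write calls the program issues, using the fullSize and partSize constants from the question. The helper name writeCalls is an assumption made for illustration.

```go
package main

import "fmt"

// writeCalls returns how many Write calls are needed to write fullSize bytes
// in chunks of partSize bytes, rounding up for a final partial chunk.
func writeCalls(fullSize, partSize int) int {
	return (fullSize + partSize - 1) / partSize
}

func main() {
	const (
		fullSize = 268435456 // size of the test file in disk.go (256 MiB)
		partSize = 128       // bytes written per call in disk.go
	)
	fmt.Println(writeCalls(fullSize, partSize)) // prints 2097152
}
```

With about two million calls into the operating system, an 18 second run works out to just under 9 microseconds per call, so the profile is dominated by the cost of crossing into the OS rather than by Go code.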

Here are some results for disk.go from the same machine, dual-booting Windows and Linux.

Windows:

    >go build disk.go
    >/TimeMem disk
    Duration : 18.3300322s
    Elapsed time   : 18.38
    Kernel time    : 13.71 (74.6%)
    User time      : 4.62 (25.1%)

Linux:

    $ go build disk.go
    $ time ./disk
    Duration : 18.54350723s
    real    0m18.547s
    user    0m2.336s
    sys     0m16.236s

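Effectively, they are the same: a duration of 18 seconds for disk.go. There is just some variation between the operating systems in what is counted as user time and what is counted as kernel or system time. The elapsed or real time is the same.

In your tests, kernel or system time was 93.72% in runtime.cgocall versus 95.49% in syscall.Syscall.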
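As a small worked example, the kernel and user shares can be computed from the timings reported above for both systems. The helper name share is an assumption made for illustration; the figures are copied from the TimeMem and time output.

```go
package main

import "fmt"

// share returns part as a percentage of total.
func share(part, total float64) float64 {
	return 100 * part / total
}

func main() {
	// All figures are in seconds, taken from the output shown above.
	fmt.Printf("Windows: kernel %.1f%%, user %.1f%%\n",
		share(13.71, 18.38), share(4.62, 18.38))
	fmt.Printf("Linux:   sys %.1f%%, user %.1f%%\n",
		share(16.236, 18.547), share(2.336, 18.547))
}
```

On both systems the large majority of the elapsed time is attributed to the kernel; only the exact split between user and kernel time differs.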