Why is this Go code the equivalent speed as that of Python (and not much faster)?

Question

I need to calculate sha256 checksums for files over 1GB (reading the file in chunks). Currently I am using Python:

import hashlib
import time

start_time = time.time()


def sha256sum(filename="big.txt", block_size=2 ** 13):
    sha = hashlib.sha256()
    with open(filename, 'rb') as f:
        for chunk in iter(lambda: f.read(block_size), b''):
            sha.update(chunk)
    return sha.hexdigest()

input_file = '/tmp/1GB.raw'
print 'checksum is: %s\n' % sha256sum(input_file)
print 'Elapsed time: %s' % str(time.time() - start_time)

I wanted to give Go a try, thinking I could get faster results, but after trying the following code it runs a couple of seconds slower:

package main

import (
    "crypto/sha256"
    "fmt"
    "io"
    "math"
    "os"
    "time"
)   

const fileChunk = 8192

func File(file string) string {
    fh, err := os.Open(file)

    if err != nil {
        panic(err.Error())
    }   

    defer fh.Close()

    stat, _ := fh.Stat()
    size := stat.Size()
    chunks := uint64(math.Ceil(float64(size) / float64(fileChunk)))
    h := sha256.New()

    for i := uint64(0); i < chunks; i++ {
        csize := int(math.Min(fileChunk, float64(size-int64(i*fileChunk))))
        buf := make([]byte, csize)
        fh.Read(buf)
        io.WriteString(h, string(buf))
    }   

    return fmt.Sprintf("%x", h.Sum(nil))
}   

func main() {
    start := time.Now()
    fmt.Printf("checksum is: %s\n", File("/tmp/1G.raw"))
    elapsed := time.Since(start)
    fmt.Printf("Elapsed time: %s\n", elapsed)
}

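Any idea how to improve the Go code? Maybe by using all the computer's CPU cores, one for reading and another for hashing?

Update

Following the suggestions, I am now using this code: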

package main

import (
    "crypto/sha256"
    "encoding/hex"
    "fmt"
    "io"
    "os"
    "time"
)

func main() {
    start := time.Now()
    fh, err := os.Open("/tmp/1GB.raw")
    if err != nil {
        panic(err.Error())
    }
    defer fh.Close()

    h := sha256.New()
    _, err = io.Copy(h, fh)
    if err != nil {
        panic(err.Error())
    }
    fmt.Println(hex.EncodeToString(h.Sum(nil)))

    fmt.Printf("Elapsed time: %s\n", time.Since(start))
}

For testing, I created the 1GB file with this:

# mkfile 1G /tmp/1GB.raw

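The new version is faster, but not by much. What about using channels? Could using multiple CPUs/cores help? I was expecting at least a 20% improvement, but unfortunately I gained almost nothing.

Time results for Python: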

 5.867u 0.250s 0:06.15 99.3%    0+0k 0+0io 0pf+0w

Time results for Go after compiling (go build) and executing the binary:

 5.687u 0.198s 0:05.93 98.9%    0+0k 0+0io 0pf+0w

Any more ideas?

Test results

Using the version with channels posted in the accepted answer by @icza:

Elapsed time: 5.894779733s

Using the version without channels:

Elapsed time: 5.823489239s

I thought using channels would improve it a bit, but it seems not.

I am running this on a MacBook Pro with OS X Yosemite, using Go version:

go version go1.4.1 darwin/amd64

Update 2

Setting runtime.GOMAXPROCS to 4:

runtime.GOMAXPROCS(4)

made things a little faster:

Elapsed time: 5.741511748s
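
As a side note, here is a minimal sketch for checking what the runtime is actually using. The helper name currentProcs is made up for illustration. Go 1.4, as used above, defaults GOMAXPROCS to 1, while Go 1.5 and later default it to the number of available CPUs, so on newer versions the explicit call should rarely be needed.

```go
package main

import (
	"fmt"
	"runtime"
)

// currentProcs (hypothetical helper) reports the current GOMAXPROCS value
// without changing it: an argument below 1 only queries the setting.
func currentProcs() int {
	return runtime.GOMAXPROCS(0)
}

func main() {
	fmt.Println("logical CPUs:", runtime.NumCPU())
	fmt.Println("GOMAXPROCS:  ", currentProcs())
}
```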

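Update 3

Changing the chunk size to 8192 KiB (written as 8192<<10 bytes, i.e. 8 MiB; the Python version uses 8192 bytes) gives the expected result: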

...
for b, hasMore := make([]byte, 8192<<10), true; hasMore; {
...

Also using only runtime.GOMAXPROCS(2).

Answer 1

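Your solution is quite inefficient, as you create a new buffer in each iteration, use it once and then throw it away.

You also convert the content of your buffer (buf) to a string and write the string to the sha256 calculator, which converts it back to bytes: an absolutely unnecessary round trip.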
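
To illustrate both points, here is a minimal sketch of a chunked loop that allocates the buffer once and writes the bytes to the hash directly. The names hashChunks and hashBytes are made up for this example, and the in-memory input stands in for the file; treat it as one possible shape rather than the only way to do it.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
)

// hashChunks (hypothetical helper) reads r in chunks of at most bufSize
// bytes, reusing a single buffer, and returns the hex-encoded SHA-256
// digest of everything read. bufSize must be positive.
func hashChunks(r io.Reader, bufSize int) (string, error) {
	h := sha256.New()
	buf := make([]byte, bufSize) // allocated once, reused for every chunk
	for {
		n, err := r.Read(buf)
		if n > 0 {
			h.Write(buf[:n]) // bytes go straight in; hash.Hash.Write never returns an error
		}
		if err == io.EOF {
			break
		}
		if err != nil {
			return "", err
		}
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// hashBytes is a convenience wrapper for in-memory data.
func hashBytes(data []byte, bufSize int) (string, error) {
	return hashChunks(bytes.NewReader(data), bufSize)
}

func main() {
	sum, err := hashBytes([]byte("abc"), 2)
	if err != nil {
		panic(err)
	}
	fmt.Println(sum)
}
```

For a file, the *os.File returned by os.Open can be passed as r, and bufSize can be varied to compare chunk sizes.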

Here is another, quite fast solution. Test how it performs:

fh, err := os.Open(file)
if err != nil {
    panic(err.Error())
}   
defer fh.Close()

h := sha256.New()
_, err = io.Copy(h, fh)
if err != nil {
    panic(err.Error())
}   

fmt.Println(hex.EncodeToString(h.Sum(nil)))

A little explanation:

io.Copy() is a function which reads all the data (until EOF is reached) from a Reader and writes it to the specified Writer. Since the sha256 calculator (hash.Hash) implements Writer and File (or rather *File) implements Reader, this is as easy as it can be.

Once all the data has been written to the hash, hex.EncodeToString() simply converts the result (obtained by hash.Sum(nil)) to a human-readable hex string.
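
Because io.Copy() only depends on the Reader and Writer interfaces, the same code works for any data source. A small sketch, assuming made-up helper names sha256Of and sha256OfString, with a string standing in for the file:

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
	"strings"
)

// sha256Of (hypothetical helper) copies everything from r into a SHA-256
// hash and returns the hex digest plus the number of bytes hashed.
func sha256Of(r io.Reader) (string, int64, error) {
	h := sha256.New()
	n, err := io.Copy(h, r) // h is the Writer, r is the Reader
	if err != nil {
		return "", n, err
	}
	return hex.EncodeToString(h.Sum(nil)), n, nil
}

// sha256OfString wraps a string in a Reader for demonstration purposes.
func sha256OfString(s string) (string, int64, error) {
	return sha256Of(strings.NewReader(s))
}

func main() {
	sum, n, err := sha256OfString("abc")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s (%d bytes)\n", sum, n)
}
```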

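Final Verdict

The program reads 1 GB of data from the hard disk and performs some computation on it (calculating its SHA-256 hash). Since reading from the hard disk is a relatively slow operation, the performance gain of the Go version is not significant compared to the Python solution. The overall run takes a couple of seconds, which is in the same order of magnitude as the time required to read 1 GB of data from the hard disk. Since both the Go and the Python solution need roughly the same amount of time to read the data from disk, you won't see much difference in the results.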
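
To check this reasoning on a particular machine, one option is to time the hashing alone on data that is already in memory and compare that with the total run time. The sketch below does this; the helper name hashThroughput and the 256 MiB size are arbitrary choices for illustration, and the numbers will depend on the CPU and the Go version.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"time"
)

// hashThroughput (hypothetical helper) hashes size bytes of zeroed
// in-memory data, so no disk access is involved, and returns the elapsed
// time together with the throughput in MiB per second.
func hashThroughput(size int) (time.Duration, float64) {
	data := make([]byte, size)
	start := time.Now()
	sum := sha256.Sum256(data)
	elapsed := time.Since(start)
	_ = sum // the digest itself is not needed here
	mib := float64(size) / (1 << 20)
	return elapsed, mib / elapsed.Seconds()
}

func main() {
	elapsed, rate := hashThroughput(256 << 20) // 256 MiB, an arbitrary size
	fmt.Printf("hashed 256 MiB in %s (%.1f MiB/s)\n", elapsed, rate)
}
```

If hashing 1 GB in memory takes close to the total run time, the hash is the dominant cost on that machine; if it takes only a small fraction, the disk is.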

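Possibility for Improving Performance Using Multiple Goroutines

You can improve performance by reading a chunk of the file into one buffer and starting to compute its SHA-256 hash while concurrently reading the next chunk of the file into a second buffer. Once that read is done, send the second buffer to the SHA-256 calculator while concurrently reading the next chunk into the first buffer.

But since reading the data from disk takes more time than calculating its SHA-256 digest (or updating the state of the digest calculator), you won't see a significant improvement. The performance bottleneck will always be the time it takes to read the data into memory.

Here is a complete, runnable solution which uses 2 goroutines: while one goroutine reads a chunk of the file, the other calculates the hash of a previously read chunk. When a goroutine finishes reading, it goes on to hashing and allows the other one to read in parallel.

Proper synchronization between the phases (reading, hashing) is done with channels. As suspected, the performance gain is a little more than 4% in time (it may vary based on CPU and hard disk speed) because the hashing computation is negligible compared to the disk reading time. The performance gain will most likely be higher if the read speed of the hard disk is greater (tested on SSD).

So the complete program: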

package main

import (
    "crypto/sha256"
    "encoding/hex"
    "fmt"
    "hash"
    "io"
    "os"
    "runtime"
    "time"
)

const file = "t:/1GB.raw"

func main() {
    runtime.GOMAXPROCS(2) // Important as Go 1.4 uses only 1 by default!

    start := time.Now()

    f, err := os.Open(file)
    if err != nil {
        panic(err)
    }
    defer f.Close()

    h := sha256.New()

    // 2 channels: used to give green light for reading into buffer b1 or b2
    readch1, readch2 := make(chan int, 1), make(chan int, 1)

    // 2 channels: used to give green light for hashing the content of b1 or b2
    hashch1, hashch2 := make(chan int, 1), make(chan int, 1)

    // Start signal: Allow b1 to be read and hashed
    readch1 <- 1
    hashch1 <- 1

    go hashHelper(f, h, readch1, readch2, hashch1, hashch2)

    hashHelper(f, h, readch2, readch1, hashch2, hashch1)

    fmt.Println(hex.EncodeToString(h.Sum(nil)))

    fmt.Printf("Elapsed time: %s\n", time.Since(start))
}

func hashHelper(f *os.File, h hash.Hash, mayRead <-chan int, readDone chan<- int, mayHash <-chan int, hashDone chan<- int) {
    for b, hasMore := make([]byte, 64<<10), true; hasMore; {
        <-mayRead
        n, err := f.Read(b)
        if err != nil {
            if err == io.EOF {
                hasMore = false
            } else {
                panic(err)
            }
        }
        readDone <- 1

        <-mayHash
        _, err = h.Write(b[:n])
        if err != nil {
            panic(err)
        }
        hashDone <- 1
    }
}

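Notes:

In my solution I only used 2 goroutines. There is no point in using more because, as noted earlier, the disk read speed is the bottleneck, and it is already used to the maximum since one of the 2 goroutines is able to perform a read at any time.

Notes on synchronization: the 2 goroutines run in parallel. Each goroutine may use its local buffer b at any time. Access to the shared File and to the shared Hash is synchronized by the channels: at any given time only 1 goroutine is allowed to use the Hash, and only 1 goroutine is allowed to use (read from) the File.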
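
The same overlap of reading and hashing can also be arranged by passing the buffers themselves through channels instead of passing permission tokens. This is only an alternative sketch, not a faster variant: the names pipelineHash and pipelineHashBytes are made up, and an in-memory reader stands in for the file.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"io"
)

// pipelineHash (hypothetical helper) reads r in a separate goroutine while
// the calling goroutine hashes. Two buffers circulate between the channels,
// so one can be filled while the other is hashed. bufSize must be positive.
func pipelineHash(r io.Reader, bufSize int) (string, error) {
	free := make(chan []byte, 2)   // buffers ready to be filled
	filled := make(chan []byte, 2) // chunks ready to be hashed, in read order
	free <- make([]byte, bufSize)
	free <- make([]byte, bufSize)

	var readErr error // written before close(filled), read after the range loop
	go func() {
		defer close(filled)
		for buf := range free {
			n, err := r.Read(buf[:cap(buf)])
			filled <- buf[:n] // may be empty; the hasher just recycles it
			if err != nil {
				if err != io.EOF {
					readErr = err
				}
				return
			}
		}
	}()

	h := sha256.New()
	for chunk := range filled {
		h.Write(chunk)
		free <- chunk // never blocks: only two buffers exist, capacity is 2
	}
	if readErr != nil {
		return "", readErr
	}
	return hex.EncodeToString(h.Sum(nil)), nil
}

// pipelineHashBytes is a convenience wrapper for in-memory data.
func pipelineHashBytes(data []byte, bufSize int) (string, error) {
	return pipelineHash(bytes.NewReader(data), bufSize)
}

func main() {
	sum, err := pipelineHashBytes([]byte("abc"), 1)
	if err != nil {
		panic(err)
	}
	fmt.Println(sum)
}
```

Only one goroutine ever reads and only one ever hashes, so the chunks reach the hash in file order without any extra signalling.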

Answer 2

For anyone who doesn't know about it, I think this will help:

https://blog.golang.org/pipelines

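At the end of that page there is a solution that computes the md5 sums of files using goroutines.

I tried it on my own "~" directory. It took 1.7 seconds with goroutines and 2.8 seconds without goroutines.

Here is the time breakdown without goroutines. I don't know how to measure the breakdown when using goroutines, because everything runs concurrently.

time use 2.805522165s
time read file 759.476091ms
time md5 1.710393575s
time sort 17.355134ms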
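
For readers who just want the general shape of that approach, here is a rough sketch of hashing several inputs concurrently, one goroutine per input, with the results collected over a channel. It is not the code from the linked page: the names md5All and result are made up, and in-memory byte slices stand in for file contents (in real use each goroutine would read its own file).

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
	"sync"
)

// result pairs an input name with its hex-encoded MD5 sum.
type result struct {
	name string
	sum  string
}

// md5All (hypothetical helper) hashes every blob in its own goroutine and
// gathers the sums into a map keyed by name.
func md5All(blobs map[string][]byte) map[string]string {
	results := make(chan result)
	var wg sync.WaitGroup
	for name, data := range blobs {
		wg.Add(1)
		go func(name string, data []byte) {
			defer wg.Done()
			sum := md5.Sum(data)
			results <- result{name, hex.EncodeToString(sum[:])}
		}(name, data)
	}
	// Close the channel once every worker has sent its result.
	go func() {
		wg.Wait()
		close(results)
	}()

	out := make(map[string]string, len(blobs))
	for r := range results {
		out[r.name] = r.sum
	}
	return out
}

func main() {
	sums := md5All(map[string][]byte{
		"a.txt": []byte("abc"),
		"b.txt": []byte(""),
	})
	for name, sum := range sums {
		fmt.Println(name, sum)
	}
}
```

This helps when there are many independent files, since each one has its own hash state. It does not apply to a single large file, where every chunk has to go through the same hash in order.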


