函数切片参数与全局变量的性能?
Performance of function slice parameter vs global variable?
我有以下功能:
func checkFiles(path string, excludedPatterns []string) {
// ...
}
我想知道,既然 excludedPatterns
永远不会改变,我应该通过将 var 设为全局变量(而不是每次都将其传递给函数)来优化它,还是 Golang 已经通过将它们作为写时复制?
编辑:我想我可以将切片作为指针传递,但我仍然想知道写时复制行为(如果存在的话)以及一般来说我是否应该担心按值传递或通过指针。
从函数的名称来看,性能并不是那么重要,甚至考虑将参数移动到全局变量只是为了保存 time/space 需要将它们作为参数传递(检查文件等 IO 操作很多-比调用函数并向它们传递值慢得多。
Go 中的切片只是一些小的描述符,有点像一个结构体,带有一个指向后备数组的指针和 2 个 int
s,一个长度和一个容量。无论后备数组有多大,传递切片总是高效的,你甚至不应该考虑传递一个指向它们的指针,当然除非你想修改切片头。
Go 中的参数总是按值传递,并且会复制传递的值。如果传递指针,则指针值将被复制并传递。当一个切片被传递时,切片值(这是一个小的描述符)将被复制并传递——它将指向相同的后备数组(不会被复制)。
此外,如果您需要在函数中多次访问切片,一个参数通常是一个额外的收获,因为编译器可以进行进一步的优化/缓存,而如果它是一个全局变量,则必须更加小心。
有关切片及其内部结构的更多信息:Go Slices: usage and internals
如果您想要性能的确切数字,请进行基准测试!
这里有一些基准测试代码,它显示了两种解决方案之间没有区别(将切片作为参数传递或访问全局切片)。将其保存到 slices_test.go
和 运行 之类的文件中 go test -bench .
package main
import (
"testing"
)
var gslice = make([]string, 1000)
func global(s string) {
for i := 0; i < 100; i++ { // Cycle to access slice may times
_ = s
_ = gslice // Access global-slice
}
}
func param(s string, ss []string) {
for i := 0; i < 100; i++ { // Cycle to access slice may times
_ = s
_ = ss // Access parameter-slice
}
}
func BenchmarkParameter(b *testing.B) {
for i := 0; i < b.N; i++ {
param("hi", gslice)
}
}
func BenchmarkGlobal(b *testing.B) {
for i := 0; i < b.N; i++ {
global("hi")
}
}
示例输出:
testing: warning: no tests to run
PASS
BenchmarkParameter-2 30000000 55.4 ns/op
BenchmarkGlobal-2 30000000 55.1 ns/op
ok _/V_/workspace/IczaGo/src/play 3.569s
借助@icza 的出色回答,还有另一种将切片作为参数传递的方法:指向切片的指针。
当您需要修改底层切片时,全局变量切片起作用,但将其作为参数传递不起作用,您实际上是在使用副本。为了减轻这种情况,实际上可以将切片作为指针传递。
有趣的是,它实际上比访问全局变量更快:
package main
import (
"testing"
)
var gslice = make([]string, 1000000)
func global(s string) {
for i := 0; i < 100; i++ { // Cycle to access slice may times
_ = s
_ = gslice // Access global-slice
}
}
func param(s string, ss []string) {
for i := 0; i < 100; i++ { // Cycle to access slice may times
_ = s
_ = ss // Access parameter-slice
}
}
func paramPointer(s string, ss *[]string) {
for i := 0; i < 100; i++ { // Cycle to access slice may times
_ = s
_ = ss // Access parameter-slice
}
}
func BenchmarkParameter(b *testing.B) {
for i := 0; i < b.N; i++ {
param("hi", gslice)
}
}
func BenchmarkParameterPointer(b *testing.B) {
for i := 0; i < b.N; i++ {
paramPointer("hi", &gslice)
}
}
func BenchmarkGlobal(b *testing.B) {
for i := 0; i < b.N; i++ {
global("hi")
}
}
结果:
goos: darwin
goarch: amd64
pkg: untitled
BenchmarkParameter-8 24437403 48.2 ns/op
BenchmarkParameterPointer-8 27483115 40.3 ns/op
BenchmarkGlobal-8 25631470 46.0 ns/op
我重写了基准测试,以便您可以比较结果。
如您所见,ParameterPointer 工作台在 10 条记录后开始领先。很有意思。
package slices_bench
import (
"testing"
)
var gslice = make([]string, 1000)
func global(s string) {
for i := 0; i < 100; i++ { // Cycle to access slice may times
_ = s
_ = gslice // Access global-slice
}
}
func param(s string, ss []string) {
for i := 0; i < 100; i++ { // Cycle to access slice may times
_ = s
_ = ss // Access parameter-slice
}
}
func paramPointer(s string, ss *[]string) {
for i := 0; i < 100; i++ { // Cycle to access slice may times
_ = s
_ = ss // Access parameter-slice
}
}
func BenchmarkPerformance(b *testing.B){
fixture := []struct {
desc string
records int
}{
{
desc: "1 record",
records: 1,
},
{
desc: "10 records",
records: 10,
},
{
desc: "100 records",
records: 100,
},
{
desc: "1000 records",
records: 1000,
},
{
desc: "10000 records",
records: 10000,
},
{
desc: "100000 records",
records: 100000,
},
}
tests := []struct {
desc string
fn func(b *testing.B, n int)
}{
{
desc: "ParameterPointer",
fn: func(b *testing.B, n int) {
for j := 0; j < n; j++ {
paramPointer("hi", &gslice)
}
},
},
{
desc: "Parameter",
fn: func(b *testing.B, n int) {
for j := 0; j < n; j++ {
param("hi", gslice)
}
},
},
{
desc: "Global",
fn: func(b *testing.B, n int) {
for j := 0; j < n; j++ {
global("hi")
}
},
},
}
for _, t := range tests {
b.Run(t.desc, func(b *testing.B) {
for _, f := range fixture {
b.Run(f.desc, func(b *testing.B) {
b.ReportAllocs()
b.ResetTimer()
for i := 0; i < b.N; i++ {
t.fn(b, f.records)
}
})
}
})
}
}
结果:
goos: windows
goarch: amd64
pkg: benchs/slices-bench
cpu: Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz
BenchmarkPerformance
BenchmarkPerformance/ParameterPointer
BenchmarkPerformance/ParameterPointer/1_record
BenchmarkPerformance/ParameterPointer/1_record-16 38661910 31.18 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/ParameterPointer/10_records
BenchmarkPerformance/ParameterPointer/10_records-16 4160023 288.4 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/ParameterPointer/100_records
BenchmarkPerformance/ParameterPointer/100_records-16 445131 2748 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/ParameterPointer/1000_records
BenchmarkPerformance/ParameterPointer/1000_records-16 43876 27380 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/ParameterPointer/10000_records
BenchmarkPerformance/ParameterPointer/10000_records-16 4441 273922 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/ParameterPointer/100000_records
BenchmarkPerformance/ParameterPointer/100000_records-16 439 2739282 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Parameter
BenchmarkPerformance/Parameter/1_record
BenchmarkPerformance/Parameter/1_record-16 39860619 30.79 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Parameter/10_records
BenchmarkPerformance/Parameter/10_records-16 4152728 288.6 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Parameter/100_records
BenchmarkPerformance/Parameter/100_records-16 445634 2757 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Parameter/1000_records
BenchmarkPerformance/Parameter/1000_records-16 43618 27496 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Parameter/10000_records
BenchmarkPerformance/Parameter/10000_records-16 4450 273960 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Parameter/100000_records
BenchmarkPerformance/Parameter/100000_records-16 435 2739053 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Global
BenchmarkPerformance/Global/1_record
BenchmarkPerformance/Global/1_record-16 38813095 30.97 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Global/10_records
BenchmarkPerformance/Global/10_records-16 4148433 288.4 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Global/100_records
BenchmarkPerformance/Global/100_records-16 429274 2758 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Global/1000_records
BenchmarkPerformance/Global/1000_records-16 43591 27412 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Global/10000_records
BenchmarkPerformance/Global/10000_records-16 4521 274420 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Global/100000_records
BenchmarkPerformance/Global/100000_records-16 436 2751042 ns/op 0 B/op 0 allocs/op
我有以下功能:
func checkFiles(path string, excludedPatterns []string) {
// ...
}
我想知道,既然 excludedPatterns
永远不会改变,我应该通过将 var 设为全局变量(而不是每次都将其传递给函数)来优化它,还是 Golang 已经通过将它们作为写时复制?
编辑:我想我可以将切片作为指针传递,但我仍然想知道写时复制行为(如果存在的话)以及一般来说我是否应该担心按值传递或通过指针。
从函数的名称来看,性能并不是那么重要,甚至考虑将参数移动到全局变量只是为了保存 time/space 需要将它们作为参数传递(检查文件等 IO 操作很多-比调用函数并向它们传递值慢得多。
Go 中的切片只是一些小的描述符,有点像一个结构体,带有一个指向后备数组的指针和 2 个 int
s,一个长度和一个容量。无论后备数组有多大,传递切片总是高效的,你甚至不应该考虑传递一个指向它们的指针,当然除非你想修改切片头。
Go 中的参数总是按值传递,并且会复制传递的值。如果传递指针,则指针值将被复制并传递。当一个切片被传递时,切片值(这是一个小的描述符)将被复制并传递——它将指向相同的后备数组(不会被复制)。
此外,如果您需要在函数中多次访问切片,一个参数通常是一个额外的收获,因为编译器可以进行进一步的优化/缓存,而如果它是一个全局变量,则必须更加小心。
有关切片及其内部结构的更多信息:Go Slices: usage and internals
如果您想要性能的确切数字,请进行基准测试!
这里有一些基准测试代码,它显示了两种解决方案之间没有区别(将切片作为参数传递或访问全局切片)。将其保存到 slices_test.go
和 运行 之类的文件中 go test -bench .
package main
import (
"testing"
)
var gslice = make([]string, 1000)
func global(s string) {
for i := 0; i < 100; i++ { // Cycle to access slice may times
_ = s
_ = gslice // Access global-slice
}
}
func param(s string, ss []string) {
for i := 0; i < 100; i++ { // Cycle to access slice may times
_ = s
_ = ss // Access parameter-slice
}
}
func BenchmarkParameter(b *testing.B) {
for i := 0; i < b.N; i++ {
param("hi", gslice)
}
}
func BenchmarkGlobal(b *testing.B) {
for i := 0; i < b.N; i++ {
global("hi")
}
}
示例输出:
testing: warning: no tests to run
PASS
BenchmarkParameter-2 30000000 55.4 ns/op
BenchmarkGlobal-2 30000000 55.1 ns/op
ok _/V_/workspace/IczaGo/src/play 3.569s
借助@icza 的出色回答,还有另一种将切片作为参数传递的方法:指向切片的指针。
当您需要修改底层切片时,全局变量切片起作用,但将其作为参数传递不起作用,您实际上是在使用副本。为了减轻这种情况,实际上可以将切片作为指针传递。
有趣的是,它实际上比访问全局变量更快:
package main
import (
"testing"
)
var gslice = make([]string, 1000000)
func global(s string) {
for i := 0; i < 100; i++ { // Cycle to access slice may times
_ = s
_ = gslice // Access global-slice
}
}
func param(s string, ss []string) {
for i := 0; i < 100; i++ { // Cycle to access slice may times
_ = s
_ = ss // Access parameter-slice
}
}
func paramPointer(s string, ss *[]string) {
for i := 0; i < 100; i++ { // Cycle to access slice may times
_ = s
_ = ss // Access parameter-slice
}
}
func BenchmarkParameter(b *testing.B) {
for i := 0; i < b.N; i++ {
param("hi", gslice)
}
}
func BenchmarkParameterPointer(b *testing.B) {
for i := 0; i < b.N; i++ {
paramPointer("hi", &gslice)
}
}
func BenchmarkGlobal(b *testing.B) {
for i := 0; i < b.N; i++ {
global("hi")
}
}
结果:
goos: darwin
goarch: amd64
pkg: untitled
BenchmarkParameter-8 24437403 48.2 ns/op
BenchmarkParameterPointer-8 27483115 40.3 ns/op
BenchmarkGlobal-8 25631470 46.0 ns/op
我重写了基准测试,以便您可以比较结果。
如您所见,ParameterPointer 工作台在 10 条记录后开始领先。很有意思。
package slices_bench
import (
"testing"
)
var gslice = make([]string, 1000)
func global(s string) {
for i := 0; i < 100; i++ { // Cycle to access slice may times
_ = s
_ = gslice // Access global-slice
}
}
func param(s string, ss []string) {
for i := 0; i < 100; i++ { // Cycle to access slice may times
_ = s
_ = ss // Access parameter-slice
}
}
func paramPointer(s string, ss *[]string) {
for i := 0; i < 100; i++ { // Cycle to access slice may times
_ = s
_ = ss // Access parameter-slice
}
}
func BenchmarkPerformance(b *testing.B){
fixture := []struct {
desc string
records int
}{
{
desc: "1 record",
records: 1,
},
{
desc: "10 records",
records: 10,
},
{
desc: "100 records",
records: 100,
},
{
desc: "1000 records",
records: 1000,
},
{
desc: "10000 records",
records: 10000,
},
{
desc: "100000 records",
records: 100000,
},
}
tests := []struct {
desc string
fn func(b *testing.B, n int)
}{
{
desc: "ParameterPointer",
fn: func(b *testing.B, n int) {
for j := 0; j < n; j++ {
paramPointer("hi", &gslice)
}
},
},
{
desc: "Parameter",
fn: func(b *testing.B, n int) {
for j := 0; j < n; j++ {
param("hi", gslice)
}
},
},
{
desc: "Global",
fn: func(b *testing.B, n int) {
for j := 0; j < n; j++ {
global("hi")
}
},
},
}
for _, t := range tests {
b.Run(t.desc, func(b *testing.B) {
for _, f := range fixture {
b.Run(f.desc, func(b *testing.B) {
b.ReportAllocs()
b.ResetTimer()
for i := 0; i < b.N; i++ {
t.fn(b, f.records)
}
})
}
})
}
}
结果:
goos: windows
goarch: amd64
pkg: benchs/slices-bench
cpu: Intel(R) Core(TM) i7-10700 CPU @ 2.90GHz
BenchmarkPerformance
BenchmarkPerformance/ParameterPointer
BenchmarkPerformance/ParameterPointer/1_record
BenchmarkPerformance/ParameterPointer/1_record-16 38661910 31.18 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/ParameterPointer/10_records
BenchmarkPerformance/ParameterPointer/10_records-16 4160023 288.4 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/ParameterPointer/100_records
BenchmarkPerformance/ParameterPointer/100_records-16 445131 2748 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/ParameterPointer/1000_records
BenchmarkPerformance/ParameterPointer/1000_records-16 43876 27380 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/ParameterPointer/10000_records
BenchmarkPerformance/ParameterPointer/10000_records-16 4441 273922 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/ParameterPointer/100000_records
BenchmarkPerformance/ParameterPointer/100000_records-16 439 2739282 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Parameter
BenchmarkPerformance/Parameter/1_record
BenchmarkPerformance/Parameter/1_record-16 39860619 30.79 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Parameter/10_records
BenchmarkPerformance/Parameter/10_records-16 4152728 288.6 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Parameter/100_records
BenchmarkPerformance/Parameter/100_records-16 445634 2757 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Parameter/1000_records
BenchmarkPerformance/Parameter/1000_records-16 43618 27496 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Parameter/10000_records
BenchmarkPerformance/Parameter/10000_records-16 4450 273960 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Parameter/100000_records
BenchmarkPerformance/Parameter/100000_records-16 435 2739053 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Global
BenchmarkPerformance/Global/1_record
BenchmarkPerformance/Global/1_record-16 38813095 30.97 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Global/10_records
BenchmarkPerformance/Global/10_records-16 4148433 288.4 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Global/100_records
BenchmarkPerformance/Global/100_records-16 429274 2758 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Global/1000_records
BenchmarkPerformance/Global/1000_records-16 43591 27412 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Global/10000_records
BenchmarkPerformance/Global/10000_records-16 4521 274420 ns/op 0 B/op 0 allocs/op
BenchmarkPerformance/Global/100000_records
BenchmarkPerformance/Global/100000_records-16 436 2751042 ns/op 0 B/op 0 allocs/op