在 Julia 中索引数组时避免内存分配
Avoid memory allocation when indexing an array in Julia
更新: 请注意,Julia v1+ 中的相关函数是 view
问题:我想在不触发内存分配的情况下对数组进行索引,尤其是在将索引元素传递给函数时。通过阅读 Julia 文档,我怀疑答案围绕着使用 sub
函数,但不太明白如何......
工作示例: 我构建了一个 Float64
(x
) 的大向量,然后是 x
中每个观察值的索引。
N = 10000000
x = randn(N)
inds = [1:N]
现在,我将 mean
函数计时在 x
和 x[inds]
之上(我首先 运行 mean(randn(2))
以避免任何编译器计时不规则):
@time mean(x)
@time mean(x[inds])
这是一个相同的计算,但正如预期的那样,计时结果是:
elapsed time: 0.007029772 seconds (96 bytes allocated)
elapsed time: 0.067880112 seconds (80000208 bytes allocated, 35.38% gc time)
那么,对于 inds
的任意选择(以及数组和函数的任意选择),是否有解决内存分配问题的方法?
EDIT: Read tholy's answer too to get a full picture!
使用索引数组时,目前 Julia 0.4-pre(2015 年 2 月开始)的情况不太好:
julia> N = 10000000;
julia> x = randn(N);
julia> inds = [1:N];
julia> @time mean(x)
elapsed time: 0.010702729 seconds (96 bytes allocated)
elapsed time: 0.012167155 seconds (96 bytes allocated)
julia> @time mean(x[inds])
elapsed time: 0.088312275 seconds (76 MB allocated, 17.87% gc time in 1 pauses with 0 full sweep)
elapsed time: 0.073672734 seconds (76 MB allocated, 3.27% gc time in 1 pauses with 0 full sweep)
elapsed time: 0.071646757 seconds (76 MB allocated, 1.08% gc time in 1 pauses with 0 full sweep)
julia> xs = sub(x,inds); # Only works on 0.4
julia> @time mean(xs)
elapsed time: 0.057446177 seconds (96 bytes allocated)
elapsed time: 0.096983673 seconds (96 bytes allocated)
elapsed time: 0.096711312 seconds (96 bytes allocated)
julia> using ArrayViews
julia> xv = view(x, 1:N) # Note use of a range, not [1:N]!
julia> @time mean(xv)
elapsed time: 0.012919509 seconds (96 bytes allocated)
elapsed time: 0.013010655 seconds (96 bytes allocated)
elapsed time: 0.01288134 seconds (96 bytes allocated)
julia> xs = sub(x,1:N) # Works on 0.3 and 0.4
julia> @time mean(xs)
elapsed time: 0.014191482 seconds (96 bytes allocated)
elapsed time: 0.014023089 seconds (96 bytes allocated)
elapsed time: 0.01257188 seconds (96 bytes allocated)
- 所以虽然我们可以避免内存分配,但实际上我们仍然更慢(!)。
- 问题是按数组而不是范围进行索引。你不能在 0.3 上为此使用
sub
,但你可以在 0.4 上使用。
- 如果我们可以按范围索引,那么我们可以在 0.3 上使用 ArrayViews.jl 并在 0.4 上内置。这个案例几乎和原来的一样好
mean
.
我注意到使用的索引数量较少(而不是整个范围),差距小得多,内存分配也低,因此 sub
可能值得:
N = 100000000
x = randn(N)
inds = [1:div(N,10)]
@time mean(x)
@time mean(x)
@time mean(x)
@time mean(x[inds])
@time mean(x[inds])
@time mean(x[inds])
xi = sub(x,inds)
@time mean(xi)
@time mean(xi)
@time mean(xi)
给予
elapsed time: 0.092831612 seconds (985 kB allocated)
elapsed time: 0.067694917 seconds (96 bytes allocated)
elapsed time: 0.066209038 seconds (96 bytes allocated)
elapsed time: 0.066816927 seconds (76 MB allocated, 20.62% gc time in 1 pauses with 1 full sweep)
elapsed time: 0.057211528 seconds (76 MB allocated, 19.57% gc time in 1 pauses with 0 full sweep)
elapsed time: 0.046782848 seconds (76 MB allocated, 1.81% gc time in 1 pauses with 0 full sweep)
elapsed time: 0.186084807 seconds (4 MB allocated)
elapsed time: 0.057476269 seconds (96 bytes allocated)
elapsed time: 0.05733602 seconds (96 bytes allocated)
只需使用xs = sub(x, 1:N)
。请注意,这与 x = sub(x, [1:N])
不同;在 julia 0.3 上,后者将失败,而在 julia 0.4-pre 上,后者将比前者慢得多。在 julia 0.4-pre 上,sub(x, 1:N)
与 view
:
一样快
julia> N = 10000000;
julia> x = randn(N);
julia> xs = sub(x, 1:N);
julia> using ArrayViews
julia> xv = view(x, 1:N);
julia> mean(x)
-0.0002491126429772525
julia> mean(xs)
-0.0002491126429772525
julia> mean(xv)
-0.0002491126429772525
julia> @time mean(x);
elapsed time: 0.015345806 seconds (27 kB allocated)
julia> @time mean(xs);
elapsed time: 0.013815785 seconds (96 bytes allocated)
julia> @time mean(xv);
elapsed time: 0.015871052 seconds (96 bytes allocated)
sub(x, inds)
比 sub(x, 1:N)
慢有几个原因:
- 每次访问
xs[i]
对应x[inds[i]]
;我们必须查找两个内存位置而不是一个
- 如果
inds
顺序不对,您在访问 x
的元素时会出现糟糕的缓存行为
- 它破坏了使用 SIMD 向量化的能力
在这种情况下,后者可能是最重要的影响。这不是 Julia 的限制;如果您用 C、Fortran 或汇编编写等效代码,也会发生同样的事情。
请注意,说 sum(sub(x, inds))
仍然比 sum(x[inds])
快(直到后者变成前者,julia 0.4 正式发布时它应该这样)。但是,如果您必须对 xs = sub(x, inds)
执行许多操作,在某些情况下,制作副本是值得的,即使它分配了内存,这样您就可以在值存储在中时利用可能的优化连续内存。
更新: 请注意,Julia v1+ 中的相关函数是 view
问题:我想在不触发内存分配的情况下对数组进行索引,尤其是在将索引元素传递给函数时。通过阅读 Julia 文档,我怀疑答案围绕着使用 sub
函数,但不太明白如何......
工作示例: 我构建了一个 Float64
(x
) 的大向量,然后是 x
中每个观察值的索引。
N = 10000000
x = randn(N)
inds = [1:N]
现在,我将 mean
函数计时在 x
和 x[inds]
之上(我首先 运行 mean(randn(2))
以避免任何编译器计时不规则):
@time mean(x)
@time mean(x[inds])
这是一个相同的计算,但正如预期的那样,计时结果是:
elapsed time: 0.007029772 seconds (96 bytes allocated)
elapsed time: 0.067880112 seconds (80000208 bytes allocated, 35.38% gc time)
那么,对于 inds
的任意选择(以及数组和函数的任意选择),是否有解决内存分配问题的方法?
EDIT: Read tholy's answer too to get a full picture!
使用索引数组时,目前 Julia 0.4-pre(2015 年 2 月开始)的情况不太好:
julia> N = 10000000;
julia> x = randn(N);
julia> inds = [1:N];
julia> @time mean(x)
elapsed time: 0.010702729 seconds (96 bytes allocated)
elapsed time: 0.012167155 seconds (96 bytes allocated)
julia> @time mean(x[inds])
elapsed time: 0.088312275 seconds (76 MB allocated, 17.87% gc time in 1 pauses with 0 full sweep)
elapsed time: 0.073672734 seconds (76 MB allocated, 3.27% gc time in 1 pauses with 0 full sweep)
elapsed time: 0.071646757 seconds (76 MB allocated, 1.08% gc time in 1 pauses with 0 full sweep)
julia> xs = sub(x,inds); # Only works on 0.4
julia> @time mean(xs)
elapsed time: 0.057446177 seconds (96 bytes allocated)
elapsed time: 0.096983673 seconds (96 bytes allocated)
elapsed time: 0.096711312 seconds (96 bytes allocated)
julia> using ArrayViews
julia> xv = view(x, 1:N) # Note use of a range, not [1:N]!
julia> @time mean(xv)
elapsed time: 0.012919509 seconds (96 bytes allocated)
elapsed time: 0.013010655 seconds (96 bytes allocated)
elapsed time: 0.01288134 seconds (96 bytes allocated)
julia> xs = sub(x,1:N) # Works on 0.3 and 0.4
julia> @time mean(xs)
elapsed time: 0.014191482 seconds (96 bytes allocated)
elapsed time: 0.014023089 seconds (96 bytes allocated)
elapsed time: 0.01257188 seconds (96 bytes allocated)
- 所以虽然我们可以避免内存分配,但实际上我们仍然更慢(!)。
- 问题是按数组而不是范围进行索引。你不能在 0.3 上为此使用
sub
,但你可以在 0.4 上使用。 - 如果我们可以按范围索引,那么我们可以在 0.3 上使用 ArrayViews.jl 并在 0.4 上内置。这个案例几乎和原来的一样好
mean
.
我注意到使用的索引数量较少(而不是整个范围),差距小得多,内存分配也低,因此 sub
可能值得:
N = 100000000
x = randn(N)
inds = [1:div(N,10)]
@time mean(x)
@time mean(x)
@time mean(x)
@time mean(x[inds])
@time mean(x[inds])
@time mean(x[inds])
xi = sub(x,inds)
@time mean(xi)
@time mean(xi)
@time mean(xi)
给予
elapsed time: 0.092831612 seconds (985 kB allocated)
elapsed time: 0.067694917 seconds (96 bytes allocated)
elapsed time: 0.066209038 seconds (96 bytes allocated)
elapsed time: 0.066816927 seconds (76 MB allocated, 20.62% gc time in 1 pauses with 1 full sweep)
elapsed time: 0.057211528 seconds (76 MB allocated, 19.57% gc time in 1 pauses with 0 full sweep)
elapsed time: 0.046782848 seconds (76 MB allocated, 1.81% gc time in 1 pauses with 0 full sweep)
elapsed time: 0.186084807 seconds (4 MB allocated)
elapsed time: 0.057476269 seconds (96 bytes allocated)
elapsed time: 0.05733602 seconds (96 bytes allocated)
只需使用xs = sub(x, 1:N)
。请注意,这与 x = sub(x, [1:N])
不同;在 julia 0.3 上,后者将失败,而在 julia 0.4-pre 上,后者将比前者慢得多。在 julia 0.4-pre 上,sub(x, 1:N)
与 view
:
julia> N = 10000000;
julia> x = randn(N);
julia> xs = sub(x, 1:N);
julia> using ArrayViews
julia> xv = view(x, 1:N);
julia> mean(x)
-0.0002491126429772525
julia> mean(xs)
-0.0002491126429772525
julia> mean(xv)
-0.0002491126429772525
julia> @time mean(x);
elapsed time: 0.015345806 seconds (27 kB allocated)
julia> @time mean(xs);
elapsed time: 0.013815785 seconds (96 bytes allocated)
julia> @time mean(xv);
elapsed time: 0.015871052 seconds (96 bytes allocated)
sub(x, inds)
比 sub(x, 1:N)
慢有几个原因:
- 每次访问
xs[i]
对应x[inds[i]]
;我们必须查找两个内存位置而不是一个 - 如果
inds
顺序不对,您在访问x
的元素时会出现糟糕的缓存行为
- 它破坏了使用 SIMD 向量化的能力
在这种情况下,后者可能是最重要的影响。这不是 Julia 的限制;如果您用 C、Fortran 或汇编编写等效代码,也会发生同样的事情。
请注意,说 sum(sub(x, inds))
仍然比 sum(x[inds])
快(直到后者变成前者,julia 0.4 正式发布时它应该这样)。但是,如果您必须对 xs = sub(x, inds)
执行许多操作,在某些情况下,制作副本是值得的,即使它分配了内存,这样您就可以在值存储在中时利用可能的优化连续内存。