如何在 Julia 中获取索引列表的补集?
How can I obtain the complement of list of indexes in Julia?
假设我有一个变量i = [1,3,5]
,它是在我对数组应用过滤器时获得的。现在,假设这个数组有 10 个元素,我想获得“补充”索引。我的意思是,我想获得 ic = [2,4,6,7,8,9,10]
.
是否有一种简洁快捷的方法来获取索引的补充列表?
我的意思是,我可以通过常规循环来做到这一点,但是有没有一种方法可以通过列表理解来做到这一点?
您需要获取 1:10
中不在 i
中的所有元素。所以使用列表理解:
julia> i = [1,3,5];
julia> ic = [x for x ∈ 1:10 if x ∉ i]
7-element Array{Int64,1}:
2
4
6
7
8
9
10
您可以在 REPL 中使用 \in
Tab 输入 ∈。 ∉可以用\notin
Tab来实现。如果您在不允许这样做的环境中进行编码,您可以从此处复制或键入以下内容之一:
julia> ic = [x for x in 1:10 if !(x in i)]
julia> ic = [x for x in 1:10 if !in(x, i)] # operators are functions
性能
如果您关心性能,这里是基准:
julia> @benchmark ic = [x for x ∈ 1:10 if x ∉ i]
BenchmarkTools.Trial:
memory estimate: 288 bytes
allocs estimate: 4
--------------
minimum time: 462.274 ns (0.00% GC)
median time: 471.168 ns (0.00% GC)
mean time: 497.947 ns (2.86% GC)
maximum time: 13.115 μs (95.25% GC)
--------------
samples: 10000
evals/sample: 197
您可以使用setdiff
函数
julia> x = [1, 3, 5];
julia> y = collect(1:10);
julia> setdiff(y, x)
7-element Vector{Int64}:
2
4
6
7
8
9
10
性能方面,基于循环的实现更好,因为我们可以考虑到新索引是原始索引的子集
function mysetdiff(y, x)
res = Vector{eltype(y)}(undef, length(y) - length(x))
i = 1
@inbounds for el in y
el ∈ x && continue
res[i] = el
i += 1
end
res
end
与比较
using BenchmarkTools
@btime [z for z ∈ $y if z ∉ $x]
# 141.893 ns (4 allocations: 288 bytes)
@btime setdiff($y, $x)
# 477.056 ns (8 allocations: 688 bytes)
@btime mysetdiff($y, $x)
# 46.434 ns (1 allocation: 144 bytes)
如果你很在意性能你也可以考虑:
julia> @benchmark deleteat!([1:10;], $i) # indices must be unique and sorted
BenchmarkTools.Trial:
memory estimate: 160 bytes
allocs estimate: 1
--------------
minimum time: 53.798 ns (0.00% GC)
median time: 60.790 ns (0.00% GC)
mean time: 71.125 ns (1.76% GC)
maximum time: 618.946 ns (77.28% GC)
--------------
samples: 10000
evals/sample: 987
和
julia> @benchmark (x = trues(10); x[$i] .= false; findall(x)) # if Bool-array is enough for you you can skip the last step to save 50% of time
BenchmarkTools.Trial:
memory estimate: 272 bytes
allocs estimate: 3
--------------
minimum time: 124.863 ns (0.00% GC)
median time: 134.033 ns (0.00% GC)
mean time: 157.668 ns (6.84% GC)
maximum time: 3.000 μs (95.44% GC)
--------------
samples: 10000
evals/sample: 905
与之前提出的相比:
julia> @benchmark ic = [x for x ∈ 1:10 if x ∉ $i]
BenchmarkTools.Trial:
memory estimate: 288 bytes
allocs estimate: 4
--------------
minimum time: 170.714 ns (0.00% GC)
median time: 196.571 ns (0.00% GC)
mean time: 222.510 ns (4.43% GC)
maximum time: 3.078 μs (91.01% GC)
--------------
samples: 10000
evals/sample: 700
和
julia> @benchmark setdiff($[1:10;], $i)
BenchmarkTools.Trial:
memory estimate: 672 bytes
allocs estimate: 7
--------------
minimum time: 504.145 ns (0.00% GC)
median time: 514.508 ns (0.00% GC)
mean time: 589.584 ns (2.75% GC)
maximum time: 8.954 μs (90.69% GC)
--------------
samples: 10000
evals/sample: 193
和(来自另一个 post 的自定义实现)
julia> @benchmark mysetdiff($[1:10;], $i)
BenchmarkTools.Trial:
memory estimate: 144 bytes
allocs estimate: 1
--------------
minimum time: 44.748 ns (0.00% GC)
median time: 46.869 ns (0.00% GC)
mean time: 52.780 ns (1.75% GC)
maximum time: 431.919 ns (88.49% GC)
--------------
samples: 10000
evals/sample: 990
既然你已经使用了过滤,那么反过来也一样。或者,如果您拥有用于获取 i
的函数,则可以将其取反以获得索引的补集。
julia> i = [1,3,5];
julia> filter(x -> x ∉ i, 1:10)
7-element Array{Int64,1}:
2
4
6
7
8
9
10
基准
@benchmark filter(x -> x ∉ i, 1:10)
BenchmarkTools.Trial:
memory estimate: 304 bytes
allocs estimate: 2
--------------
minimum time: 501.036 ns (0.00% GC)
median time: 511.399 ns (0.00% GC)
mean time: 546.609 ns (1.26% GC)
maximum time: 7.516 μs (92.31% GC)
--------------
samples: 10000
evals/sample: 193
参考方案
@benchmark ic = [x for x ∈ 1:10 if x ∉ i]
BenchmarkTools.Trial:
memory estimate: 288 bytes
allocs estimate: 4
--------------
minimum time: 626.036 ns (0.00% GC)
median time: 646.154 ns (0.00% GC)
mean time: 692.179 ns (2.72% GC)
maximum time: 26.065 μs (95.43% GC)
--------------
samples: 10000
evals/sample: 169
要从 InvertedIndices
包(通常它也与 DataFrames
一起加载)中添加到列表中的另一个值得了解的语法(虽然不是太快):
(1:10)[Not(i)]
julia> @benchmark (1:10)[Not($i)]
BenchmarkTools.Trial:
memory estimate: 1.42 KiB
allocs estimate: 39
--------------
minimum time: 1.130 μs (0.00% GC)
median time: 1.320 μs (0.00% GC)
mean time: 1.713 μs (5.05% GC)
maximum time: 324.000 μs (98.81% GC)
--------------
samples: 10000
evals/sample: 10
假设我有一个变量i = [1,3,5]
,它是在我对数组应用过滤器时获得的。现在,假设这个数组有 10 个元素,我想获得“补充”索引。我的意思是,我想获得 ic = [2,4,6,7,8,9,10]
.
是否有一种简洁快捷的方法来获取索引的补充列表?
我的意思是,我可以通过常规循环来做到这一点,但是有没有一种方法可以通过列表理解来做到这一点?
您需要获取 1:10
中不在 i
中的所有元素。所以使用列表理解:
julia> i = [1,3,5];
julia> ic = [x for x ∈ 1:10 if x ∉ i]
7-element Array{Int64,1}:
2
4
6
7
8
9
10
您可以在 REPL 中使用 \in
Tab 输入 ∈。 ∉可以用\notin
Tab来实现。如果您在不允许这样做的环境中进行编码,您可以从此处复制或键入以下内容之一:
julia> ic = [x for x in 1:10 if !(x in i)]
julia> ic = [x for x in 1:10 if !in(x, i)] # operators are functions
性能
如果您关心性能,这里是基准:
julia> @benchmark ic = [x for x ∈ 1:10 if x ∉ i]
BenchmarkTools.Trial:
memory estimate: 288 bytes
allocs estimate: 4
--------------
minimum time: 462.274 ns (0.00% GC)
median time: 471.168 ns (0.00% GC)
mean time: 497.947 ns (2.86% GC)
maximum time: 13.115 μs (95.25% GC)
--------------
samples: 10000
evals/sample: 197
您可以使用setdiff
函数
julia> x = [1, 3, 5];
julia> y = collect(1:10);
julia> setdiff(y, x)
7-element Vector{Int64}:
2
4
6
7
8
9
10
性能方面,基于循环的实现更好,因为我们可以考虑到新索引是原始索引的子集
function mysetdiff(y, x)
res = Vector{eltype(y)}(undef, length(y) - length(x))
i = 1
@inbounds for el in y
el ∈ x && continue
res[i] = el
i += 1
end
res
end
与比较
using BenchmarkTools
@btime [z for z ∈ $y if z ∉ $x]
# 141.893 ns (4 allocations: 288 bytes)
@btime setdiff($y, $x)
# 477.056 ns (8 allocations: 688 bytes)
@btime mysetdiff($y, $x)
# 46.434 ns (1 allocation: 144 bytes)
如果你很在意性能你也可以考虑:
julia> @benchmark deleteat!([1:10;], $i) # indices must be unique and sorted
BenchmarkTools.Trial:
memory estimate: 160 bytes
allocs estimate: 1
--------------
minimum time: 53.798 ns (0.00% GC)
median time: 60.790 ns (0.00% GC)
mean time: 71.125 ns (1.76% GC)
maximum time: 618.946 ns (77.28% GC)
--------------
samples: 10000
evals/sample: 987
和
julia> @benchmark (x = trues(10); x[$i] .= false; findall(x)) # if Bool-array is enough for you you can skip the last step to save 50% of time
BenchmarkTools.Trial:
memory estimate: 272 bytes
allocs estimate: 3
--------------
minimum time: 124.863 ns (0.00% GC)
median time: 134.033 ns (0.00% GC)
mean time: 157.668 ns (6.84% GC)
maximum time: 3.000 μs (95.44% GC)
--------------
samples: 10000
evals/sample: 905
与之前提出的相比:
julia> @benchmark ic = [x for x ∈ 1:10 if x ∉ $i]
BenchmarkTools.Trial:
memory estimate: 288 bytes
allocs estimate: 4
--------------
minimum time: 170.714 ns (0.00% GC)
median time: 196.571 ns (0.00% GC)
mean time: 222.510 ns (4.43% GC)
maximum time: 3.078 μs (91.01% GC)
--------------
samples: 10000
evals/sample: 700
和
julia> @benchmark setdiff($[1:10;], $i)
BenchmarkTools.Trial:
memory estimate: 672 bytes
allocs estimate: 7
--------------
minimum time: 504.145 ns (0.00% GC)
median time: 514.508 ns (0.00% GC)
mean time: 589.584 ns (2.75% GC)
maximum time: 8.954 μs (90.69% GC)
--------------
samples: 10000
evals/sample: 193
和(来自另一个 post 的自定义实现)
julia> @benchmark mysetdiff($[1:10;], $i)
BenchmarkTools.Trial:
memory estimate: 144 bytes
allocs estimate: 1
--------------
minimum time: 44.748 ns (0.00% GC)
median time: 46.869 ns (0.00% GC)
mean time: 52.780 ns (1.75% GC)
maximum time: 431.919 ns (88.49% GC)
--------------
samples: 10000
evals/sample: 990
既然你已经使用了过滤,那么反过来也一样。或者,如果您拥有用于获取 i
的函数,则可以将其取反以获得索引的补集。
julia> i = [1,3,5];
julia> filter(x -> x ∉ i, 1:10)
7-element Array{Int64,1}:
2
4
6
7
8
9
10
基准
@benchmark filter(x -> x ∉ i, 1:10)
BenchmarkTools.Trial:
memory estimate: 304 bytes
allocs estimate: 2
--------------
minimum time: 501.036 ns (0.00% GC)
median time: 511.399 ns (0.00% GC)
mean time: 546.609 ns (1.26% GC)
maximum time: 7.516 μs (92.31% GC)
--------------
samples: 10000
evals/sample: 193
参考方案
@benchmark ic = [x for x ∈ 1:10 if x ∉ i]
BenchmarkTools.Trial:
memory estimate: 288 bytes
allocs estimate: 4
--------------
minimum time: 626.036 ns (0.00% GC)
median time: 646.154 ns (0.00% GC)
mean time: 692.179 ns (2.72% GC)
maximum time: 26.065 μs (95.43% GC)
--------------
samples: 10000
evals/sample: 169
要从 InvertedIndices
包(通常它也与 DataFrames
一起加载)中添加到列表中的另一个值得了解的语法(虽然不是太快):
(1:10)[Not(i)]
julia> @benchmark (1:10)[Not($i)]
BenchmarkTools.Trial:
memory estimate: 1.42 KiB
allocs estimate: 39
--------------
minimum time: 1.130 μs (0.00% GC)
median time: 1.320 μs (0.00% GC)
mean time: 1.713 μs (5.05% GC)
maximum time: 324.000 μs (98.81% GC)
--------------
samples: 10000
evals/sample: 10