Julia DataFrame multiple values filtering
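There are two ways to filter a DataFrame in the following case:
1. df = df[((df[:field].==1) | (df[:field].==2)), :]
2. df = df[[in(v, [1, 2]) for v in df[:field]], :]
The second approach is slower, but it works when the set of values in the condition is variable.
Is there some syntactic sugar I have missed that would be as fast as the first approach but use an in-like construct?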
julia> using DataFrames
The findin function could be another way to do the task:
julia> function t_findin(df::DataFrames.DataFrame)
           df[findin(df[:A],[1,2]), :]
       end
t_findin (generic function with 1 method)
Array comprehension:
julia> function t_compr(df::DataFrames.DataFrame)
           df[[in(v, [1, 2]) for v in df[:A]], :]
       end
t_compr (generic function with 1 method)
Multiple conditions:
julia> function t_mconds(df::DataFrames.DataFrame)
           df[((df[:A].==1) | (df[:A].==2)), :]
       end
t_mconds (generic function with 1 method)
Test data
julia> df = DataFrame();
julia> df[:B] = rand(1:30,10_000_000);
julia> df[:A] = rand(1:30,10_000_000);
Test results
julia> @time t_findin(df);
0.489064 seconds (67 allocations: 19.340 MB, 0.49% gc time)
julia> @time t_mconds(df);
0.222389 seconds (106 allocations: 78.933 MB, 5.98% gc time)
julia> @time t_compr(df);
23.634846 seconds (100.00 M allocations: 2.563 GB, 1.47% gc time)
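The code above uses the syntax of the Julia and DataFrames versions it was timed on. On Julia 1.x, findin no longer exists in Base, and elementwise "or" is written .| rather than |. The following is a minimal sketch of how the same filters might look there, assuming a recent DataFrames.jl release. The small table and the names wanted, by_mask, by_index, by_filter and by_conds are illustrative only and not part of the original answer. The timings above do not carry over to these forms.

```julia
using DataFrames

# Small illustrative table; column A plays the role of the benchmark column.
df = DataFrame(A = [1, 2, 3, 1, 30, 2], B = [10, 20, 30, 40, 50, 60])

# The value set can be built at run time; a Set gives fast membership tests.
wanted = Set([1, 2])

# 1. Broadcast `in` over the column; Ref stops `wanted` from being broadcast over.
by_mask = df[in.(df.A, Ref(wanted)), :]

# 2. findall(in(wanted), col) is the Base replacement for the removed findin.
by_index = df[findall(in(wanted), df.A), :]

# 3. filter with a column => predicate pair builds the row mask internally.
by_filter = filter(:A => in(wanted), df)

# 4. A fixed list of conditions, combined with the broadcasting operator .|
by_conds = df[(df.A .== 1) .| (df.A .== 2), :]
```

All four forms select the same rows here. The first three accept an arbitrary collection of values, which is the in-like construct the question asks about.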