How to avoid deep copy when using groupby in polars rust?
I have a dataset on which I need to perform groupby operations on different columns. Here is minimal working code using polars version "0.21.1":
use polars::prelude::*;
use polars_lazy::prelude::*;
use polars::df;

fn main() {
    let df = df![
        "x1" => ["a", "b", "c", "a"],
        "x2" => ["A", "A", "B", "B"],
        "y" => [1, 2, 3, 4],
    ].unwrap();
    let lf: LazyFrame = df.lazy();
    let out1 = groupby_x1(&lf);
    println!("{:?}", out1.collect());
    let out2 = groupby_x2(&lf);
    println!("{:?}", out2.collect());
}

fn groupby_x1(lf: &LazyFrame) -> LazyFrame {
    let lf1: LazyFrame = lf.clone().groupby([col("x1")]).agg([
        col("y").sum().alias("y_sum"),
    ]);
    lf1
}

fn groupby_x2(lf: &LazyFrame) -> LazyFrame {
    let lf1: LazyFrame = lf.clone().groupby([col("x2")]).agg([
        col("y").sum().alias("y_sum"),
    ]);
    lf1
}
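But in this code I am making a deep copy of the whole lazy frame lf (using lf.clone()). How can I avoid this? If I replace lf.clone() with lf in the functions groupby_x1 and groupby_x2, I get the following error: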
error[E0507]: cannot move out of `*lf` which is behind a shared reference
--> src/main.rs:22:24
|
22 | let lf1: LazyFrame = lf.groupby([col("x1")]).agg([
| ^^^^^^^^^^^^^^^^^^^^^^^ move occurs because `*lf` has type `polars_lazy::frame::LazyFrame`, which does not implement the `Copy` trait
error[E0507]: cannot move out of `*lf` which is behind a shared reference
--> src/main.rs:29:24
|
29 | let lf1: LazyFrame = lf.groupby([col("x2")]).agg([
| ^^^^^^^^^^^^^^^^^^^^^^^ move occurs because `*lf` has type `polars_lazy::frame::LazyFrame`, which does not implement the `Copy` trait
For more information about this error, try `rustc --explain E0507`.
error: could not compile `polars_try` due to 2 previous errors
From the documentation for LazyFrame:
Lazy abstraction over an eager DataFrame. It really is an abstraction
over a logical plan. The methods of this struct will incrementally
modify a logical plan until output is requested (via collect)
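This means no deep copy of the DataFrame is made; nothing is actually done until you collect it.
So you have two options:
- If you want to keep the original plan intact, keep cloning it.
- Take ownership of the plan, groupby_x1(lf: LazyFrame), and let the caller of the function clone the original plan if they need to.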
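The difference between the two options is purely about Rust ownership, so it can be sketched without polars. In the sketch below, `Plan` is a hypothetical stand-in for a lazy query plan (it is not the polars type), and `groupby_borrowed` / `groupby_owned` are illustrative names. The only thing assumed about the real API is what the E0507 error already shows: the builder method consumes `self`.

```rust
// Hypothetical stand-in for a lazy query plan; this is NOT the polars type.
#[derive(Clone, Debug)]
struct Plan {
    steps: Vec<String>,
}

impl Plan {
    fn new(source: &str) -> Self {
        Plan { steps: vec![source.to_string()] }
    }

    // Consumes `self`, which is why calling it through `&Plan` fails with E0507.
    fn groupby(mut self, key: &str) -> Plan {
        self.steps.push(format!("groupby({})", key));
        self
    }
}

// Option 1: borrow the plan and clone it inside the function.
fn groupby_borrowed(plan: &Plan, key: &str) -> Plan {
    plan.clone().groupby(key)
}

// Option 2: take ownership; the caller decides whether a clone is needed.
fn groupby_owned(plan: Plan, key: &str) -> Plan {
    plan.groupby(key)
}

fn main() {
    let plan = Plan::new("scan");
    let out1 = groupby_borrowed(&plan, "x1");
    // `plan` is used again below, so the caller clones here.
    let out2 = groupby_owned(plan.clone(), "x2");
    // Last use of `plan`: it can simply be moved, with no clone at all.
    let out3 = groupby_owned(plan, "x1");
    println!("{:?}\n{:?}\n{:?}", out1, out2, out3);
}
```

With the owned signature, the final call moves the plan instead of cloning it, which is the one clone the borrowed version cannot avoid.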
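Polars Series are a newtype around Arc<Vec<ArrowRef>>. When you clone a DataFrame, only the reference count of the Arc is incremented.
In other words, polars never does deep clones. Cloning a DataFrame is very cheap.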
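To see why cloning an Arc-backed value is cheap, here is a minimal standard-library sketch. `Column` is a hypothetical newtype used only for illustration; it wraps `Arc<Vec<i64>>` rather than the real arrow arrays, so it mirrors the idea and not the actual polars internals.

```rust
use std::sync::Arc;

// Hypothetical stand-in for a column: a newtype around shared, immutable data.
#[derive(Clone)]
struct Column(Arc<Vec<i64>>);

fn main() {
    let original = Column(Arc::new(vec![1, 2, 3, 4]));

    // Cloning the newtype clones the Arc: the Vec itself is not copied.
    let copy = original.clone();

    // Both handles point at the same heap allocation.
    println!("same buffer: {}", Arc::ptr_eq(&original.0, &copy.0));
    // Two handles are alive, so the strong count is 2.
    println!("reference count: {}", Arc::strong_count(&original.0));
}
```

The cost of the clone is one atomic increment, independent of how many values the vector holds.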