How to avoid deep copy when using groupby in polars rust?

I have a dataset on which I need to run groupby operations on different columns. Here is minimal working code using polars version "0.21.1":

use polars::prelude::*;
use polars_lazy::prelude::*;
use polars::df;

fn main(){
  let df = df![
    "x1" => ["a", "b", "c", "a"],
    "x2" => ["A", "A", "B", "B"],
    "y" => [1, 2, 3, 4],
    ].unwrap();

  let lf: LazyFrame = df.lazy();

  let out1 = groupby_x1(&lf);
  println!("{:?}", out1.collect());
  let out2 = groupby_x2(&lf);
  println!("{:?}", out2.collect());

}

fn groupby_x1(lf: &LazyFrame) -> LazyFrame {
  let lf1: LazyFrame = lf.clone().groupby([col("x1")]).agg([
    col("y").sum().alias("y_sum"),
  ]);
  lf1
}

fn groupby_x2(lf: &LazyFrame) -> LazyFrame {
  let lf1: LazyFrame = lf.clone().groupby([col("x2")]).agg([
    col("y").sum().alias("y_sum"),
  ]);
  lf1
}

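But in this code I am making a deep copy of the whole lazy frame lf (using lf.clone()). How can I avoid that? If I replace lf.clone() with lf in the functions groupby_x1 and groupby_x2, I get the following errors: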

error[E0507]: cannot move out of `*lf` which is behind a shared reference
  --> src/main.rs:22:24
   |
22 |   let lf1: LazyFrame = lf.groupby([col("x1")]).agg([
   |                        ^^^^^^^^^^^^^^^^^^^^^^^ move occurs because `*lf` has type `polars_lazy::frame::LazyFrame`, which does not implement the `Copy` trait

error[E0507]: cannot move out of `*lf` which is behind a shared reference
  --> src/main.rs:29:24
   |
29 |   let lf1: LazyFrame = lf.groupby([col("x2")]).agg([
   |                        ^^^^^^^^^^^^^^^^^^^^^^^ move occurs because `*lf` has type `polars_lazy::frame::LazyFrame`, which does not implement the `Copy` trait

For more information about this error, try `rustc --explain E0507`.
error: could not compile `polars_try` due to 2 previous errors

From the documentation of LazyFrame:

Lazy abstraction over an eager DataFrame. It really is an abstraction over a logical plan. The methods of this struct will incrementally modify a logical plan until output is requested (via collect)

This means that there is no deep copy of the DataFrame; nothing is done until you actually collect it.

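So you have two options:

  1. If you want to keep the original plan intact, keep cloning it.
  2. Take ownership of the plan, groupby_x1(lf: LazyFrame), and let the caller of the function decide whether the original plan actually needs to be cloned.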
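
Option 2 can be sketched without polars. The `Plan` type below is a hypothetical stand-in for a LazyFrame (shared source data behind an `Arc` plus a list of pending steps), not the polars API. It only illustrates how taking the plan by value moves the decision to clone over to the caller:

```rust
use std::sync::Arc;

// Hypothetical stand-in for a LazyFrame: shared source data plus a list of
// pending operations. This is NOT the polars API, only an illustration.
#[derive(Clone)]
struct Plan {
    data: Arc<Vec<i64>>,
    steps: Vec<String>,
}

impl Plan {
    // Consumes the plan and returns an extended one, the same way the
    // LazyFrame methods in the question take `self` by value.
    fn groupby(mut self, column: &str) -> Plan {
        self.steps.push(format!("groupby({})", column));
        self
    }
}

// Option 2: take the plan by value; no clone happens inside the function.
fn groupby_x1(plan: Plan) -> Plan {
    plan.groupby("x1")
}

fn groupby_x2(plan: Plan) -> Plan {
    plan.groupby("x2")
}

fn main() {
    let plan = Plan {
        data: Arc::new(vec![1, 2, 3, 4]),
        steps: Vec::new(),
    };

    // The caller clones only where the original is still needed afterwards.
    let out1 = groupby_x1(plan.clone());
    // Last use: the plan is moved, so no clone at all.
    let out2 = groupby_x2(plan);

    // Both results still point at the same underlying data.
    assert!(Arc::ptr_eq(&out1.data, &out2.data));
    println!("{:?} / {:?}", out1.steps, out2.steps);
}
```

With this signature the last call site can hand over the plan without any clone, while earlier call sites clone explicitly.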

Polars Series are a newtype around Arc<Vec<ArrowRef>>. When you clone a DataFrame, only the reference count of the Arc is incremented.

In other words, polars never does deep clones. Cloning a DataFrame is very cheap.
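
To see why cloning an `Arc`-backed value is cheap, here is a small std-only sketch. `ToySeries` is a made-up newtype loosely modelled on the description above, not the real polars type; the point is only that `clone()` bumps a reference count and leaves the buffer where it is:

```rust
use std::sync::Arc;

// Hypothetical newtype around shared data; an assumption for illustration,
// not the actual polars Series definition.
#[derive(Clone)]
struct ToySeries(Arc<Vec<i64>>);

fn main() {
    let a = ToySeries(Arc::new((0..1_000_000).collect()));
    assert_eq!(Arc::strong_count(&a.0), 1);

    // Cloning copies the pointer and increments the count; the million
    // elements are not copied.
    let b = a.clone();
    assert_eq!(Arc::strong_count(&a.0), 2);
    assert!(Arc::ptr_eq(&a.0, &b.0));

    // Dropping the clone decrements the count again.
    drop(b);
    assert_eq!(Arc::strong_count(&a.0), 1);
}
```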