Enum.filter not scalable?

I decode a CSV file (using https://hexdocs.pm/csv/), which produces a stream, and then I filter that stream with Enum.filter. My problem is that the processing time does not grow linearly with the size of the CSV file:

% wc -l long.csv 
10000 long.csv
% time mix run testcvs.exs long.csv  
mix run testcvs.exs long.csv  3.08s user 0.50s system 242% cpu 1.479 total

% wc -l verylong.csv
100000 verylong.csv
% time mix run testcvs.exs verylong.csv 
mix run testcvs.exs verylong.csv  98.08s user 3.24s system 117% cpu 1:25.93 total

It should take ten times as long; it actually takes 57 times as long. Definitely not scalable. Does this mean that Enum.filter does not use streaming but instead loads everything into memory? Is there a more scalable way to filter a stream?

The code:

Enum.at(System.argv(), 0)
|> File.stream!([:read], :line)
|> CSV.decode([separator: ?;])
|> Enum.filter(fn {:ok, line} -> Enum.at(line, 11) == "" end)

Does it mean that Enum.filter does not use streaming but instead loads everything in memory?

Yes. As Daniel mentioned in the comments, for streams you should use Stream.filter/2.

From the Enum docs:

Note the functions in the Enum module are eager: they will traverse the enumerable as soon as they are invoked. This is particularly dangerous when working with infinite enumerables. In such cases, you should use the Stream module, which allows you to lazily express computations, without traversing collections, and work with possibly infinite collections. See the Stream module for examples and documentation.
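To see the eager/lazy difference in isolation, here is a minimal, self-contained sketch (the range and the arithmetic are illustrative assumptions, no CSV involved). Stream functions only compose the computation; nothing runs until a terminal Enum call forces the pipeline:

```elixir
# Eager: Enum.map and Enum.filter each materialize a full
# million-element intermediate list before Enum.take runs.
eager =
  1..1_000_000
  |> Enum.map(&(&1 * 2))
  |> Enum.filter(&(rem(&1, 3) == 0))
  |> Enum.take(5)

# Lazy: Stream.map and Stream.filter build a composed computation;
# Enum.take(5) then pulls only as many elements as it needs.
lazy =
  1..1_000_000
  |> Stream.map(&(&1 * 2))
  |> Stream.filter(&(rem(&1, 3) == 0))
  |> Enum.take(5)

# Both are [6, 12, 18, 24, 30], but the lazy version never
# builds the intermediate lists.
```

Applied to the question's pipeline, this means swapping Enum.filter for Stream.filter keeps the whole chain lazy; you then consume it at the end with a terminal call such as Enum.to_list/1 or Stream.run/1 (how you consume the result is an assumption here, since the original snippet has no terminal step).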