How to remove/drop rows containing nothing and NaN in a Julia DataFrame?

Question

I have a df that contains nothing, NaN and missing values. To remove rows containing missing I can use dropmissing. Is there a similar way to handle NaN and nothing?

Sample df:

│ Row │ x       │ y    │
│     │ Union…? │ Char │
├─────┼─────────┼──────┤
│ 1   │ 1.0     │ 'a'  │
│ 2   │ missing │ 'b'  │
│ 3   │ 3.0     │ 'c'  │
│ 4   │         │ 'd'  │
│ 5   │ 5.0     │ 'e'  │
│ 6   │ NaN     │ 'f'  │

Expected output:

│ Row │ x   │ y    │
│     │ Any │ Char │
├─────┼─────┼──────┤
│ 1   │ 1.0 │ 'a'  │
│ 2   │ 3.0 │ 'c'  │
│ 3   │ 5.0 │ 'e'  │

Based on what I know of Julia, this is what I have tried so far:

df.x = replace(df.x, NaN=>"something", missing=>"something", nothing=>"something")
print(df[df."x".!="something", :])

My code works as expected, but I feel this is an inefficient way to solve the problem. Is there a dedicated way to handle nothing and NaN?

Answer 1

You can do it like this:

julia> df = DataFrame(x=[1,missing,3,nothing,5,NaN], y='a':'f')
6×2 DataFrame
│ Row │ x       │ y    │
│     │ Union…? │ Char │
├─────┼─────────┼──────┤
│ 1   │ 1.0     │ 'a'  │
│ 2   │ missing │ 'b'  │
│ 3   │ 3.0     │ 'c'  │
│ 4   │         │ 'd'  │
│ 5   │ 5.0     │ 'e'  │
│ 6   │ NaN     │ 'f'  │

julia> filter(:x => x -> !any(f -> f(x), (ismissing, isnothing, isnan)), df)
3×2 DataFrame
│ Row │ x       │ y    │
│     │ Union…? │ Char │
├─────┼─────────┼──────┤
│ 1   │ 1.0     │ 'a'  │
│ 2   │ 3.0     │ 'c'  │
│ 3   │ 5.0     │ 'e'  │

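Note that the order of the checks matters here: isnan has to come last. isnan(nothing) throws a MethodError, and isnan(missing) returns missing rather than true or false, so checking ismissing and isnothing first guarantees that isnan is only ever called on numbers.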
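The behaviour behind this ordering rule can be checked directly in the REPL. A minimal sketch using only Base; the helper name isbad is made up for illustration:

```julia
# isnan works on floats, has no method for Nothing,
# and for Missing it just propagates `missing`.
println(isnan(NaN))      # true
println(isnan(missing))  # missing (not a Bool)

threw = try
    isnan(nothing)       # no isnan method for Nothing
    false
catch err
    err isa MethodError
end
println(threw)           # true

# Hypothetical helper: with ismissing and isnothing first, `||`
# short-circuits, so isnan only ever receives an actual number.
isbad(x) = ismissing(x) || isnothing(x) || isnan(x)
println(isbad(missing), " ", isbad(nothing), " ", isbad(NaN), " ", isbad(1.0))
```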

You can also write it more directly:

julia> filter(:x => x -> !(ismissing(x) || isnothing(x) || isnan(x)), df)
3×2 DataFrame
│ Row │ x       │ y    │
│     │ Union…? │ Char │
├─────┼─────────┼──────┤
│ 1   │ 1.0     │ 'a'  │
│ 2   │ 3.0     │ 'c'  │
│ 3   │ 5.0     │ 'e'  │

However, I find the example using any more extensible (you can then store the list of predicates to check in a variable).
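Storing the predicates in a variable and reusing them across several columns could look like the following sketch. BAD_CHECKS, isbadvalue and the extra column z are hypothetical names and data added for illustration, and the multi-column filter(cols => f, df) form assumes a DataFrames.jl version that supports it (0.21 or later):

```julia
using DataFrames

# Predicates stored once in a variable (hypothetical name); adding another
# check only means extending this tuple. isnan stays last because it has
# no method for `nothing`.
const BAD_CHECKS = (ismissing, isnothing, isnan)

# Hypothetical helper: true if any stored predicate flags the value.
# `any` stops at the first predicate that returns true.
isbadvalue(v) = any(f -> f(v), BAD_CHECKS)

# Sample data; column z is made up for this illustration.
df = DataFrame(x = [1, missing, 3, nothing, 5, NaN],
               z = [NaN, 2.0, 3.0, 4.0, 5.0, 6.0],
               y = 'a':'f')

# Keep a row only if none of the selected columns holds a flagged value.
# With several columns, filter passes one positional argument per column.
cols = [:x, :z]
clean = filter(cols => (vals...) -> !any(isbadvalue, vals), df)
println(clean)
```

Here row 1 is dropped because of the NaN in z, so only the rows with y equal to 'c' and 'e' remain.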

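The reason DataFrames.jl only provides a function for dropping missing is that missing is normally considered a valid value, yet one that it is often desirable to drop in data science pipelines.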
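If you would rather reuse dropmissing, one option is to first map nothing and NaN to missing. This is a sketch under the assumption that treating all three the same is acceptable for your data; the helper tomissing is hypothetical:

```julia
using DataFrames

df = DataFrame(x = [1, missing, 3, nothing, 5, NaN], y = 'a':'f')

# Hypothetical helper: turn `nothing` and floating-point NaN into `missing`
# and leave everything else unchanged. The `isa` test short-circuits, so
# isnan is never called on `missing` or `nothing`.
tomissing(v) = (isnothing(v) || (v isa AbstractFloat && isnan(v))) ? missing : v

# Work on a copy so the original df keeps its distinct special values.
df2 = copy(df)
df2.x = map(tomissing, df2.x)

# The built-in missing handling now covers all three cases.
clean = dropmissing(df2, :x)
println(clean)
```

The trade-off is that the distinction between "not collected" and "something went wrong" is lost in df2, which is why the original df is left untouched.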

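In Julia, when you see nothing or NaN you usually want to handle them differently from missing, because they most likely indicate an error in the data or in the way it is being processed (as opposed to missing, which indicates that a value was not collected).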
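Because nothing and NaN usually point to a problem upstream, one way to handle them differently is to locate them before removing anything. A minimal sketch using only Base functions on the column; the variable names are illustrative:

```julia
using DataFrames

df = DataFrame(x = [1, missing, 3, nothing, 5, NaN], y = 'a':'f')

# Count each kind of special value separately before deciding what to do.
# The isa guard keeps isnan away from `missing` and `nothing`.
n_missing = count(ismissing, df.x)
n_nothing = count(isnothing, df.x)
n_nan     = count(v -> v isa AbstractFloat && isnan(v), df.x)

# Row indices of the suspicious values, e.g. to trace them back to their source.
nothing_rows = findall(isnothing, df.x)
nan_rows     = findall(v -> v isa AbstractFloat && isnan(v), df.x)

println("missing: $n_missing, nothing: $n_nothing, NaN: $n_nan")
println("nothing at rows $nothing_rows, NaN at rows $nan_rows")
```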
