MethodError when trying to get a row from an Arrow DataFrame in Julia
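I have a dataset that looks like this:
I am taking a CSV file, converting it to Parquet, and then sending it to Arrow. I have my reasons for doing it this way. My goal is to access the information in the row "Algeria". Here is my code: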
df = CSV.read("temp.csv", DataFrame)
write_parquet("data_file.parquet", df)
df = DataFrame(read_parquet("data_file.parquet"))
Arrow.write("data_file.arrow", df)
df = DataFrame(Arrow.Table("data_file.arrow"))
dates = names(df)[5:end]
countries = unique(df[:, :"Country/Region"])
algeria = df[df."Country/Region" .== "Algeria", 4:end]
# Print(sum(eachcol(algeria)))
Print(Statistics.mean(eachcol(algeria)))
But the last part, which tries to retrieve the data from Arrow, throws this error:
MethodError: no method matching +(::Float64, ::String)
Closest candidates are:
+(::Any, ::Any, !Matched::Any, !Matched::Any...) at operators.jl:538
+(::Float64, !Matched::Float64) at float.jl:401
+(!Matched::ChainRulesCore.One, ::Any) at /home/onur/.julia/packages/ChainRulesCore/7d1hl/src/differential_arithmetic.jl:94
What am I doing wrong?
This is what I get when I type algeria in the REPL
Update: the implementation suggested by Gabriel:
begin
algeria = df[df."Country/Region" .== "Algeria", 4:end]
for i = 1:size(algeria, 2)
if eltype(algeria[!, i]) == String
algeria[!, i] = parse.(Float64, algeria[!, i])
end
end
Statistics.mean(eachcol(algeria))
end
This is the error:
MethodError: no method matching +(::Float64, ::String)
Closest candidates are:
+(::Any, ::Any, !Matched::Any, !Matched::Any...) at operators.jl:538
+(::Float64, !Matched::Float64) at float.jl:401
+(!Matched::ChainRulesCore.One, ::Any) at /home/onur/.julia/packages/ChainRulesCore/7d1hl/src/differential_arithmetic.jl:94
You need to vectorize mean, that is, broadcast it over the columns with the dot syntax. See the code below:
julia> df = DataFrame(a=1:3, b=1.5:1:3.5)
3×2 DataFrame
Row │ a b
│ Int64 Float64
─────┼────────────────
1 │ 1 1.5
2 │ 2 2.5
3 │ 3 3.5
julia> Statistics.mean.(eachcol(df))
2-element Vector{Float64}:
2.0
2.5
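To make the difference concrete, here is a minimal sketch using plain vectors as stand-ins for the columns that eachcol(df) yields (the data and variable names are illustrative, not from the original post). Without the dot, mean averages the columns with each other, which means it has to add one column to another; with the dot, it is applied to each column on its own.

```julia
using Statistics

# Stand-ins for the columns that eachcol(df) would yield (illustrative data).
cols = [[1.0, 2.0, 3.0], [1.5, 2.5, 3.5]]

# Without the dot, mean treats the columns as the elements to average:
# it adds the vectors together and divides by the number of columns.
across_columns = mean(cols)   # [1.25, 2.25, 3.25]

# With the dot, mean is applied to each column separately.
per_column = mean.(cols)      # [2.0, 2.5]

# If one column holds strings, adding the columns together cannot work,
# so the non-broadcast call throws.
mixed = [[1.0, 2.0, 3.0], ["4", "5", "6"]]
threw = try
    mean(mixed)
    false
catch
    true
end
```

The failing case is consistent with the +(::Float64, ::String) message in the question: a Float64 column is being added to a String column.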
So it looks like one of the columns in algeria contains strings rather than floats.
Try this before computing the mean:
for i = 1:size(algeria, 2)
if eltype(algeria[!, i]) == String
algeria[!, i] = parse.(Float64, algeria[!, i])
end
end
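One caveat with the loop above: the check eltype(col) == String is false for a column typed Union{Missing, String}, so such a column would be left unconverted. Whether that is what happens in the asker's data is only a guess. The sketch below shows a more tolerant check; to_float_column is a hypothetical helper written for illustration, and the sample columns are made up.

```julia
using Statistics

# Hypothetical helper (not from the original post): convert a column to
# Float64 when its non-missing element type is a string type, keeping
# missing values as missing. Other columns are returned unchanged.
function to_float_column(col::AbstractVector)
    T = nonmissingtype(eltype(col))
    (T !== Union{} && T <: AbstractString) || return col
    return [ismissing(x) ? missing : parse(Float64, x) for x in col]
end

# Illustrative columns: numeric, String, and Union{Missing, String}.
cols = Any[[1.0, 2.0], ["3.5", "4.5"], Union{Missing,String}["1", missing]]

converted = map(to_float_column, cols)

# Broadcast-style per-column means, skipping missing values.
means = [mean(skipmissing(c)) for c in converted]   # [1.5, 4.0, 1.0]
```

In the loop above, this would replace the if block with algeria[!, i] = to_float_column(algeria[!, i]), followed by Statistics.mean.(eachcol(algeria)) with the dot.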