How to add a column to an empty DataFrame in Julia

I want to append a vector as a column to an empty DataFrame. Suppose I define an empty DataFrame like this:

import DataFrames
dataframe = DataFrames.DataFrame()

Then I want to append this vector to dataframe as a column:

vec = [1,2,3]

I tried push!(dataframe, vec), but got this error:

DimensionMismatch("Length of `row` does not match `DataFrame` column count.")

Stacktrace:
  [1] push!(df::DataFrames.DataFrame, row::Vector{Int64}; promote::Bool)
    @ DataFrames C:\Users\Shayan\.julia\packages\DataFrames\BM4OQ\src\dataframe\dataframe.jl:1691
  [2] push!(df::DataFrames.DataFrame, row::Vector{Int64})
    @ DataFrames C:\Users\Shayan\.julia\packages\DataFrames\BM4OQ\src\dataframe\dataframe.jl:1680
  [3] top-level scope
    @ c:\Users\Shayan\Documents\PyJul Scripts\Jul-test.ipynb:2
  [4] eval
    @ .\boot.jl:373 [inlined]
  [5] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
    @ Base .\loading.jl:1196
  [6] #invokelatest#2
    @ .\essentials.jl:716 [inlined]
  [7] invokelatest
    @ .\essentials.jl:714 [inlined]
  [8] (::VSCodeServer.var"#164#165"{VSCodeServer.NotebookRunCellArguments, String})()
    @ VSCodeServer c:\Users\Shayan\.vscode\extensions\julialang.language-julia-1.6.17\scripts\packages\VSCodeServer\src\serve_notebook.jl:19
  [9] withpath(f::VSCodeServer.var"#164#165"{VSCodeServer.NotebookRunCellArguments, String}, path::String)
    @ VSCodeServer c:\Users\Shayan\.vscode\extensions\julialang.language-julia-1.6.17\scripts\packages\VSCodeServer\src\repl.jl:184
 [10] notebook_runcell_request(conn::VSCodeServer.JSONRPC.JSONRPCEndpoint{Base.PipeEndpoint, Base.PipeEndpoint}, params::VSCodeServer.NotebookRunCellArguments)
    @ VSCodeServer c:\Users\Shayan\.vscode\extensions\julialang.language-julia-1.6.17\scripts\packages\VSCodeServer\src\serve_notebook.jl:13
 [11] dispatch_msg(x::VSCodeServer.JSONRPC.JSONRPCEndpoint{Base.PipeEndpoint, Base.PipeEndpoint}, dispatcher::VSCodeServer.JSONRPC.MsgDispatcher, msg::Dict{String, Any})
    @ VSCodeServer.JSONRPC c:\Users\Shayan\.vscode\extensions\julialang.language-julia-1.6.17\scripts\packages\JSONRPC\src\typed.jl:67
 [12] serve_notebook(pipename::String, outputchannel_logger::Base.CoreLogging.SimpleLogger; crashreporting_pipename::String)
    @ VSCodeServer c:\Users\Shayan\.vscode\extensions\julialang.language-julia-1.6.17\scripts\packages\VSCodeServer\src\serve_notebook.jl:136
 [13] top-level scope
    @ c:\Users\Shayan\.vscode\extensions\julialang.language-julia-1.6.17\scripts\notebook\notebook.jl:32
 [14] include(mod::Module, _path::String)
    @ Base .\Base.jl:418
 [15] exec_options(opts::Base.JLOptions)
    @ Base .\client.jl:292
 [16] _start()
    @ Base .\client.jl:495

I also tried insert!(dataframe, vec), but I got this:

MethodError: no method matching insert!(::DataFrames.DataFrame, ::Vector{Int64})
Closest candidates are:
  insert!(!Matched::DataStructures.AVLTree{K}, ::K) where K at C:\Users\Shayan\.julia\packages\DataStructures\vSp4s\src\avl_tree.jl:128
  insert!(!Matched::DataStructures.SortedSet, ::Any) at C:\Users\Shayan\.julia\packages\DataStructures\vSp4s\src\sorted_set.jl:114
  insert!(!Matched::DataStructures.SortedDict{K, D, Ord}, ::Any, !Matched::Any) where {K, D, Ord<:Base.Order.Ordering} at C:\Users\Shayan\.julia\packages\DataStructures\vSp4s\src\sorted_dict.jl:268

What should I do? Any help would be appreciated.

Additional note: vec is not defined before dataframe, and that is intentional! I mean, I have to create the empty DataFrame first!

You can do the following:

julia> using DataFrames

julia> r=DataFrame(:a=>rand(5),:b=>rand(5))
5×2 DataFrame
 Row │ a         b        
     │ Float64   Float64  
─────┼────────────────────
   1 │ 0.8613    0.207534
   2 │ 0.994096  0.561571
   3 │ 0.220975  0.429286
   4 │ 0.884805  0.835078
   5 │ 0.964035  0.653509

julia> r[:,:c]=rand(5)
5-element Vector{Float64}:
 0.5722614445699863
 0.1582911302051686
 0.14114436033460553
 0.20981872218154363
 0.07636493031324465

julia> r
5×3 DataFrame
 Row │ a         b         c         
     │ Float64   Float64   Float64   
─────┼───────────────────────────────
   1 │ 0.8613    0.207534  0.572261
   2 │ 0.994096  0.561571  0.158291
   3 │ 0.220975  0.429286  0.141144
   4 │ 0.884805  0.835078  0.209819
   5 │ 0.964035  0.653509  0.0763649

Note: this also works when starting from an empty DataFrame:

julia> r=DataFrame()
0×0 DataFrame

julia> r[:,:c]=rand(5)
5-element Vector{Float64}:
 0.6792303081607677
 0.08094072339097869
 0.5171831771259873
 0.35343166177619845
 0.44751700973394026

julia> r
5×1 DataFrame
 Row │ c         
     │ Float64   
─────┼───────────
   1 │ 0.67923
   2 │ 0.0809407
   3 │ 0.517183
   4 │ 0.353432
   5 │ 0.447517
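
Applied to the setup in the question, a minimal sketch might look like the one below. The names `v` and `:v` are arbitrary choices for illustration. It also shows what `push!` is for: it appends a row, so it needs one value per existing column, which is why it raised `DimensionMismatch` on a DataFrame with zero columns.

```julia
using DataFrames

dataframe = DataFrame()      # empty 0×0 DataFrame, created first
v = [1, 2, 3]                # the vector is defined afterwards

# Column assignment creates the column; `:` stores a copy of v.
dataframe[:, :v] = v

# push! appends a row; the frame now has one column, so one value is needed.
push!(dataframe, [4])

@show size(dataframe)        # (4, 1)
```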

Update & summary (completed using Bogumił Kamiński's answer)

You can do:

d[:, :colname] = x_vector # stores a copy of x_vector
d[!, :colname] = x_vector # stores x_vector itself, no copy (shared)

If the value is a scalar rather than a vector, see Bogumił Kamiński's answer.
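
The practical difference between the two forms shows up when the original vector is mutated afterwards. The following is a small sketch of that behaviour; the names `shared` and `copied` are made up for this example.

```julia
using DataFrames

x = [1, 2, 3]

shared = DataFrame()
shared[!, :x] = x        # stores x itself, no copy

copied = DataFrame()
copied[:, :x] = x        # stores a copy of x

x[1] = 100               # mutate the original vector

@show shared.x[1]        # 100, the column aliases x
@show copied.x[1]        # 1, the copy is unaffected
```

Copying is the safer choice unless the vector is large and you are sure nothing else will modify it.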

Depending on what you need, there are the following options.

  1. Add the vector without copying
julia> using DataFrames

julia> x = [1, 2, 3]
3-element Vector{Int64}:
 1
 2
 3

julia> df = DataFrame()
0×0 DataFrame

julia> df.x = x
3-element Vector{Int64}:
 1
 2
 3

julia> df.x === x
true

or, equivalently, using `!` indexing:

julia> x = [1, 2, 3]
3-element Vector{Int64}:
 1
 2
 3

julia> df = DataFrame()
0×0 DataFrame

julia> df[!, :x] = x
3-element Vector{Int64}:
 1
 2
 3

julia> df.x === x
true
  2. Add the vector with copying
julia> x = [1, 2, 3]
3-element Vector{Int64}:
 1
 2
 3

julia> df = DataFrame()
0×0 DataFrame

julia> df[:, :x] = x
3-element Vector{Int64}:
 1
 2
 3

julia> df.x == x
true

julia> df.x === x
false
  3. If you have a scalar, you can do the following (this also works for vectors)
julia> df = DataFrame()
0×0 DataFrame

julia> insertcols!(df, :x => 1)
1×1 DataFrame
 Row │ x
     │ Int64
─────┼───────
   1 │     1
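
As a rough sketch of the vector case with `insertcols!`, the example below adds a vector column to an empty DataFrame, then a scalar that is broadcast to the existing rows, and finally a column at a chosen position. The column names `:x`, `:y`, and `:id` are illustrative only.

```julia
using DataFrames

df = DataFrame()

# A vector creates a 3-row column (the vector is copied by default).
insertcols!(df, :x => [1, 2, 3])

# A scalar is broadcast to the 3 rows that already exist.
insertcols!(df, :y => 0)

# An integer position inserts the new column at that index, here first.
insertcols!(df, 1, :id => ["a", "b", "c"])

@show names(df)    # ["id", "x", "y"]
```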