How to convert a Python pandas into a Julia DataFrame (using PyJulia) and back to Python Pandas

I would like to use PyJulia to speed up parts of my code:

import numpy as np
import julia
import pandas as pd
import random
from julia import Base
from julia import Main
from julia import DataFrames

n = 100000
randomlist = []
for i in range(0,n):
    num = random.randint(1,100)
    randomlist.append(num)

data = {
    'Score': list(randomlist),
    'ScoreBin': list(np.zeros(n)),
}
df = pd.DataFrame(data, columns = ['Score', 'ScoreBin'])
Main.dfj = df

Main.eval(""" 
for i = 1:10
    #println(i)
    if dfj.Score[i] >= 10
        println(dfj.Score[i])
    end
end
"""
)

But I get the following error message:

JuliaError: Exception 'TypeError: non-boolean (PyObject) used in boolean context' occurred while calling julia code:

In addition, the following command:

Main.eval(""" 
println(dfj.Score[1])
"""
)

gives this output (which does not look like it comes from a Julia DataFrame):

PyObject 84

Is there a way to convert a pandas DataFrame into a Julia DataFrame?

Edit 1

Thanks to @PrzemyslawSzufel's answer, the code below now works:

import numpy as np
import julia
import pandas as pd
import random
import copy
from julia import Base
from julia import Main
from julia import DataFrames
from julia import Pandas
#julia.install(DataFrame)
%load_ext julia.magic

n = 100000
randomlist = []
for i in range(0,n):
    num = random.randint(1,100)
    randomlist.append(num)

data = {
    'Score': list(randomlist),
    'ScoreBin': list(np.zeros(n)),
}
df = pd.DataFrame(data, columns = ['Score', 'ScoreBin'])
Main.df = df;

Main.eval("""
dfj = df |> Pandas.DataFrame |> DataFrames.DataFrame;
""")

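However, even though I put a ; at the end of the line, I always get an unwanted long printout of dfj (100000 rows), and it takes about a second. Is there a way to avoid the printout?

Also, if I now modify the data frame in Julia (which is much faster than doing it in Python, and is the whole point of this question) and want to convert it back to a Python pandas DataFrame, I get an error as well: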

Main.eval(""" 
for i = 1:length(dfj[:, :Score])
    if dfj[i, :Score] > 50
        dfj[i, :ScoreBin] = 1 
    end
end
"""
)

dfjpy = pd.DataFrame(Main.dfj)
dfjpy


RuntimeError: Julia exception: MethodError: no method matching iterate(::DataFrames.DataFrame)
Closest candidates are:
  iterate(!Matched::Core.SimpleVector) at essentials.jl:568
  iterate(!Matched::Core.SimpleVector, !Matched::Any) at essentials.jl:568
  iterate(!Matched::ExponentialBackOff) at error.jl:199
  ...
Stacktrace:
 [1] jlwrap_iterator(::DataFrames.DataFrame) at /Users/mymac/.julia/packages/PyCall/zqDXB/src/pyiterator.jl:144
 [2] pyjlwrap_getiter(::Ptr{PyCall.PyObject_struct}) at /Users/mymac/.julia/packages/PyCall/zqDXB/src/pyiterator.jl:125

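By the way, the command type(dfjpy) outputs PyCall.jlwrap

Edit 2

To convert a Julia DataFrame to Python pandas, you first have to convert it to a Julia Pandas DataFrame (Pandas.DataFrame from Pandas.jl). Here is the latest working code: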

n = 100000
randomlist = []
for i in range(0,n):
    num = random.randint(1,100)
    randomlist.append(num)

data = {
    'Score': list(randomlist),
    'ScoreBin': list(np.zeros(n)),
}
df = pd.DataFrame(data, columns = ['Score', 'ScoreBin'])
Main.df = df;

Main.eval("""
dfj = df |> Pandas.DataFrame |> DataFrames.DataFrame;

for i = 1:length(dfj[:, :Score])
    if dfj[i, :Score] > 50
        dfj[i, :ScoreBin] = 1 
    end
end

dfjp = dfj |> Pandas.DataFrame;
"""
)

dfjpy = Main.dfjp
dfjpy
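One way to check the result of the round trip is to compute the same thresholding directly in pandas and compare. The sketch below is not part of the original code: the helper name expected_score_bin and the small sample frame are made up for illustration, and it only assumes the Score and ScoreBin columns used above.

```python
import numpy as np
import pandas as pd


def expected_score_bin(df, threshold=50):
    """Hypothetical helper: return a copy of df with ScoreBin set to 1.0
    wherever Score is strictly greater than threshold (same rule as the Julia loop)."""
    out = df.copy()
    out["ScoreBin"] = np.where(out["Score"] > threshold, 1.0, out["ScoreBin"])
    return out


# Small illustrative frame with the same column names as in the question.
sample = pd.DataFrame({"Score": [10, 50, 51, 100], "ScoreBin": [0.0, 0.0, 0.0, 0.0]})
reference = expected_score_bin(sample)
print(reference["ScoreBin"].tolist())  # [0.0, 0.0, 1.0, 1.0]
```

Applied to the original df, the ScoreBin values of the reference can then be compared with those of dfjpy, for example through to_numpy(), since the column dtypes may differ after the conversion.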

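You need to install Pandas.jl. This library lets Julia work with your Python pandas data frame, which you can then convert to a DataFrames.jl DataFrame.

Here is the Julia code (assuming dfj is your Python variable):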

import DataFrames
import Pandas
juliandf = dfj |> Pandas.DataFrame |> DataFrames.DataFrame;

Note that the last line can also be written as:

juliandf = DataFrames.DataFrame(Pandas.DataFrame(dfj));

Converting back with Pandas.DataFrame(juliandf) should work.
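Regarding the unwanted printout mentioned in Edit 1: my understanding, which the thread does not confirm, is that a trailing ; only suppresses display at Julia's own prompt, while Main.eval still hands the value of the last expression back to Python, where a notebook will display it. A possible workaround is to end the Julia snippet with nothing, so that the call returns None. The helper names below are hypothetical, and eval_quietly assumes a working PyJulia installation.

```python
def with_nothing(julia_src):
    """Hypothetical helper: append a final `nothing` expression so that the
    Julia snippet evaluates to nothing (which should map to None in Python)."""
    return julia_src.rstrip() + "\nnothing\n"


def eval_quietly(julia_src):
    """Hypothetical helper: run Julia code through PyJulia without returning
    the value of its last expression to Python."""
    from julia import Main  # imported lazily; needs PyJulia and Julia installed
    Main.eval(with_nothing(julia_src))


# Build the snippet only; calling eval_quietly(src) would actually run it.
src = with_nothing("dfj = df |> Pandas.DataFrame |> DataFrames.DataFrame;")
print(src)
```

In a notebook, ending the Python line itself with a semicolon, as in Main.eval("...");, should also keep the cell from displaying the returned value, although the value is still converted.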