Why doesn't a pandas DataFrame change when I use it as the input of a function with multiprocessing?
I have code like this:
import pandas as pd
import multiprocessing
df1 = pd.DataFrame(
    {
        "A": ["A0", "A1", "A2", "A3"],
        "B": ["B0", "B1", "B2", "B3"],
        "C": ["C0", "C1", "C2", "C3"],
        "D": ["D0", "D1", "D2", "D3"],
    }
)
df2 = pd.DataFrame(
    {
        "A": ["A4", "A5", "A6", "A7"],
        "B": ["B4", "B5", "B6", "B7"],
        "C": ["C4", "C5", "C6", "C7"],
        "D": ["D4", "D5", "D6", "D7"],
    }
)
def changeDF(df):
    df['Signal'] = 0
changeDF(df1)
changeDF(df2)
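When I run the code above, the changeDF function adds a column named 'Signal' with the value 0 to df1 and df2. But when I run changeDF through multiprocessing instead, as below, neither DataFrame changes.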
s = [df1, df2]
with multiprocessing.Pool(processes=2) as pool:
    res = pool.map(changeDF, s)
What is wrong with my code?
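To send df1 and df2 to the worker processes, multiprocessing serializes (pickles) them, which means each worker operates on its own copy. Adding the column modifies that copy inside the worker; the original DataFrames in the parent process are never touched.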
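The copying can be reproduced without starting any worker processes. Pool pickles each argument before handing it to a worker, so the pickle round trip below is a rough single-process stand-in for what the worker receives. This is a sketch under that assumption: the real Pool uses a pickle-based serializer plus inter-process transfer, which is left out here, and df1 is trimmed to two columns for brevity.

```python
import pickle

import pandas as pd


def changeDF(df):
    # Mutates whatever DataFrame object it receives, in place
    df['Signal'] = 0


# Trimmed version of df1 from the question
df1 = pd.DataFrame({"A": ["A0", "A1", "A2", "A3"], "B": ["B0", "B1", "B2", "B3"]})

# Approximation of what a Pool worker gets: the argument is pickled in the
# parent and unpickled in the worker, producing a brand-new object.
worker_copy = pickle.loads(pickle.dumps(df1))
changeDF(worker_copy)

print("Signal" in worker_copy.columns)  # True: the worker's copy changed
print("Signal" in df1.columns)          # False: the parent's df1 did not
print(worker_copy is df1)               # False: two distinct objects
```

Because the worker only ever sees `worker_copy`, the only way to get the change back to the parent is to send the modified object back, i.e. return it.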
Return the DataFrame from the function and assign the results back, and it will work:
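def changeDF(df):
    df['Signal'] = 0
    return df  # send the modified copy back to the parent process
if __name__ == "__main__":  # required where workers are started with "spawn" (Windows, macOS)
    with multiprocessing.Pool(processes=2) as pool:
        # Rebind df1 and df2 to the modified copies returned by the workers
        df1, df2 = pool.map(changeDF, [df1, df2])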
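I should warn you, though: for an operation as cheap as adding a constant column, the cost of serializing each DataFrame to the workers and back will almost certainly be higher than anything you gain from multiprocessing.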
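One rough way to see the cost this warning refers to is to time the column assignment on its own against the same work wrapped in the two pickle round trips Pool.map adds (argument to the worker, return value back to the parent). This is a sketch under stated assumptions: `big` is a hypothetical 100,000-row DataFrame, and only pickling is timed, not process start-up or the actual inter-process transfer, so a real Pool adds overhead on top of this. Timings depend on the machine, so no numbers are shown.

```python
import pickle
import timeit

import pandas as pd

# Hypothetical larger DataFrame so the difference is measurable
big = pd.DataFrame({"A": range(100_000), "B": range(100_000)})


def changeDF(df):
    df['Signal'] = 0
    return df


def in_process():
    # Work done directly in the parent (on a copy, so `big` stays unchanged)
    return changeDF(big.copy())


def with_round_trip():
    # Same work plus the serialization Pool.map performs:
    # parent -> worker for the argument, worker -> parent for the result
    sent = pickle.loads(pickle.dumps(big))
    result = changeDF(sent)
    return pickle.loads(pickle.dumps(result))


payload_bytes = len(pickle.dumps(big))
print(f"pickled size: {payload_bytes} bytes")
print("in process:   ", timeit.timeit(in_process, number=20))
print("with pickling:", timeit.timeit(with_round_trip, number=20))
```

If the per-item work were expensive (heavy numeric computation on each DataFrame), the pickling overhead could be worth paying; for a single column assignment it generally is not.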
Change your changeDF function so that it returns the modified DataFrame (and use the list that pool.map returns):
def changeDF(df):
    df['Signal'] = 0
    return df