Multiprocessing in Python Issue
I'm trying out multiprocessing in Python, but I can't seem to get it to work.
The input file is as follows:
And the code is as follows:
import pandas as pd
import multiprocessing
import time
import datetime

start_time = datetime.datetime.now()
df_main = []
df_main = pd.read_csv("data.csv")
df_file = []

def growth_calculator(Type):
    values = [Type]
    global df_temp, df_file
    df_temp = df_main[df_main.Type.isin(values)]
    df_temp = df_temp[['Company', 'Type']]
    print(df_temp)
    time.sleep(10)

if __name__ == '__main__':
    multiprocessing.Process(target=growth_calculator('Quarterly'))
    multiprocessing.Process(target=growth_calculator('Annual'))
    multiprocessing.Process(target=growth_calculator('Monthly'))
    end_time = datetime.datetime.now()
    print("Time Taken -", end_time-start_time)
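The script should take about 10-11 seconds to finish, but it actually takes 30 seconds. So clearly the multiprocessing isn't working.
Can you point me in the right direction?
Thanks in advance!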
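You need to pass the target's arguments through the args= keyword of the Process initializer (see https://docs.python.org/3/library/multiprocessing.html#multiprocessing.Process). Otherwise your function is evaluated before the process is even instantiated: target=growth_calculator('Quarterly') calls the function right away in the parent process and hands its return value (None) to Process, which gives you single-process performance. The processes also have to be started with start() and waited on with join().
Like this: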
import pandas as pd
import multiprocessing
import time
import datetime

start_time = datetime.datetime.now()

def growth_calculator(Type):
    print(Type)
    time.sleep(10)

if __name__ == '__main__':
    p1 = multiprocessing.Process(target=growth_calculator, args=('Quarterly',))
    p2 = multiprocessing.Process(target=growth_calculator, args=('Annual',))
    p3 = multiprocessing.Process(target=growth_calculator, args=('Monthly',))
    p1.start()
    p2.start()
    p3.start()
    print('started')
    p1.join()
    p2.join()
    p3.join()
    end_time = datetime.datetime.now()
    print("Time Taken -", end_time-start_time)
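To see why the original version ran sequentially, here is a minimal sketch that compares the two ways of building a Process without starting either one. The `calls` list and the stripped-down `growth_calculator` body are stand-ins I made up for illustration. They are not part of the question's code.

```python
import multiprocessing

# Hypothetical helper: records every call made in the current (parent) process.
calls = []

def growth_calculator(type_name):
    """Stand-in for the real work; it only records its argument."""
    calls.append(type_name)

# Wrong: growth_calculator('Quarterly') is an ordinary function call. It runs
# right here, in the parent, and its return value (None) becomes the target.
eager = multiprocessing.Process(target=growth_calculator('Quarterly'))
calls_after_eager = list(calls)

# Right: pass the function object and its arguments separately. Nothing runs
# until start() is called on the Process.
deferred = multiprocessing.Process(target=growth_calculator, args=('Annual',))
calls_after_deferred = list(calls)

print(calls_after_eager)     # ['Quarterly']
print(calls_after_deferred)  # ['Quarterly']
```

In the question's script each of the three eager calls sleeps for 10 seconds in the parent, one after another, which accounts for the 30 seconds. No child process ever ran, because start() was never called.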