How to use multiprocessing.Pool with the .apply function?
I have a function that processes strings, and I am applying it to a DataFrame column:
import pandas as pd
import numpy as np

def test_upper(d):
    return d.upper()

def mainfunc():
    df = pd.read_csv("file.csv", sep='\t', encoding='utf-8')
    print(df.head())
    lambdafunc = lambda x: test_upper(x)
    df['upper_cols'] = df['cols'].apply(lambdafunc)
    print(df.head())

mainfunc()
Now I want to do the same thing with multiprocessing.Pool. I searched Stack Overflow for how to do this, and this is what I came up with:
import pandas as pd
import numpy as np
import multiprocessing as mp

def test_upper(d):
    return d.upper()

def mainfunc():
    df = pd.read_csv("file.csv", sep='\t', encoding='utf-8')
    print(df.head())
    lambdafunc = lambda x: test_upper(x)
    list_results = pd.Series()

    def log_result(result):
        list_results.append(result)

    pool = mp.Pool(processes=4)
    pool.apply_async(lambdafunc, (df['cols'], ), callback=log_result)
    pool.close()
    pool.join()
    print(list_results)

mainfunc()
The result is an empty Series (or an empty list; I tried both). What am I doing wrong here?
Thanks!
Finally figured it out:
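import multiprocessing as mp
import pandas as pd

def test_upper(d):
    # d is the whole Series, so call upper() on each element
    output = d.apply(lambda x: x.upper())
    return output

def mainfunc():
    df = pd.read_csv("file.csv", sep='\t', encoding='utf-8')
    print(df.head())
    pool = mp.Pool(processes=4)
    # Submit the module-level function, which can be pickled (a lambda cannot)
    result = pool.apply_async(test_upper, (df['cols'], ))
    pool.close()
    pool.join()
    # get() returns the worker's result, or re-raises its exception
    print(result.get())

if __name__ == '__main__':
    mainfunc()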
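A note on the fix above. The first attempt most likely stayed empty for two reasons: a lambda cannot be pickled and sent to a worker, and that failure only surfaces through result.get() or an error_callback, neither of which was used. Separately, Series.append never modified a Series in place (and the method is gone in pandas 2.0). Also, a single apply_async call submits one task, so only one of the four workers does any work. Below is a minimal sketch of one way to spread the column over several workers. The names upper_chunk, parallel_upper and n_workers are made up for illustration, and the inline DataFrame stands in for file.csv.

```python
import math
import multiprocessing as mp

import pandas as pd


def upper_chunk(chunk):
    # Runs in a worker process on one slice of the column.
    # Assumes the slice holds strings; missing values stay missing.
    return chunk.str.upper()


def parallel_upper(series, n_workers=4):
    # Hypothetical helper: split the Series into roughly equal slices,
    # upper-case each slice in a separate worker, then stitch them back.
    if len(series) == 0:
        return series.copy()
    chunk_size = max(1, math.ceil(len(series) / n_workers))
    chunks = [series.iloc[i:i + chunk_size]
              for i in range(0, len(series), chunk_size)]
    with mp.Pool(processes=n_workers) as pool:
        # map() blocks until every slice is done and keeps input order
        results = pool.map(upper_chunk, chunks)
    # Each slice kept its original index, so concat restores the column
    return pd.concat(results)


if __name__ == '__main__':
    # Stand-in data; the original post reads file.csv instead
    df = pd.DataFrame({'cols': ['alpha', 'beta', 'gamma', 'delta', 'epsilon']})
    df['upper_cols'] = parallel_upper(df['cols'], n_workers=2)
    print(df)
```

For something as cheap as upper(), the cost of pickling the slices to and from the workers may well outweigh the gain, so it is worth timing this against a plain df['cols'].str.upper() before adopting it.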