apply_async 回调函数未被调用
apply_async callback function not being called
我是 python 的新手,我有计算数据特征的功能,然后 return 应该处理并写入文件的列表。,..我正在使用池进行计算,然后使用回调函数写入文件,但是回调函数没有被调用,我在里面放了一些打印语句,但它肯定没有被调用。
我的代码如下所示:
def write_arrow_format(results):
print("writer called")
results[1].to_csv("../data/model_data/feature-"+results[2],sep='\t',encoding='utf-8')
with open('../data/model_data/arow-'+results[2],'w') as f:
for dic in results[0]:
feature_list=[]
print(dic)
beginLine=True
for key,value in dic.items():
if(beginLine):
feature_list.append(str(value))
beginLine=False
else:
feature_list.append(str(key)+":"+str(value))
feature_line=" ".join(feature_list)
f.write(feature_line+"\n")
def generate_features(users,impressions,interactions,items,filename):
#some processing
return [result1,result2,filename]
if __name__=="__main__":
pool=mp.Pool(mp.cpu_count()-1)
for i in range(interval):
if i==interval:
pool.apply_async(generate_features,(users[begin:],impressions,interactions,items,str(i)),callback=write_arrow_format)
else:
pool.apply_async(generate_features,(users[begin:begin+interval],impressions,interactions,items,str(i)),callback=write_arrow_format)
begin=begin+interval
pool.close()
pool.join()
post generate_features
返回的列表中包含的内容并不明显。但是,如果 result1
、result2
或 filename
中的任何一个不可序列化,则出于某种原因,多处理库将不会调用回调函数,并且无法以静默方式调用。我 认为 这是因为多处理库试图在子进程和父进程之间来回传递对象之前对对象进行 pickle。如果您返回的任何内容不是 "pickleable"(即不可序列化),则不会调用回调。
我自己也遇到过这个错误,结果证明是一个记录器对象的实例给我带来了麻烦。这是重现我的问题的一些示例代码:
import multiprocessing as mp
import logging
def bad_test_func(ii):
print('Calling bad function with arg %i'%ii)
name = "file_%i.log"%ii
logging.basicConfig(filename=name,level=logging.DEBUG)
if ii < 4:
log = logging.getLogger()
else:
log = "Test log %i"%ii
return log
def good_test_func(ii):
print('Calling good function with arg %i'%ii)
instance = ('hello', 'world', ii)
return instance
def pool_test(func):
def callback(item):
print('This is the callback')
print('I have been given the following item: ')
print(item)
num_processes = 3
pool = mp.Pool(processes = num_processes)
results = []
for i in range(5):
res = pool.apply_async(func, (i,), callback=callback)
results.append(res)
pool.close()
pool.join()
def main():
print('#'*30)
print('Calling pool test with bad function')
print('#'*30)
pool_test(bad_test_func)
print('#'*30)
print('Calling pool test with good function')
print('#'*30)
pool_test(good_test_func)
if __name__ == '__main__':
main()
希望这对您有所帮助并为您指明正确的方向。
我是 python 的新手,我有计算数据特征的功能,然后 return 应该处理并写入文件的列表。,..我正在使用池进行计算,然后使用回调函数写入文件,但是回调函数没有被调用,我在里面放了一些打印语句,但它肯定没有被调用。 我的代码如下所示:
def write_arrow_format(results):
print("writer called")
results[1].to_csv("../data/model_data/feature-"+results[2],sep='\t',encoding='utf-8')
with open('../data/model_data/arow-'+results[2],'w') as f:
for dic in results[0]:
feature_list=[]
print(dic)
beginLine=True
for key,value in dic.items():
if(beginLine):
feature_list.append(str(value))
beginLine=False
else:
feature_list.append(str(key)+":"+str(value))
feature_line=" ".join(feature_list)
f.write(feature_line+"\n")
def generate_features(users,impressions,interactions,items,filename):
#some processing
return [result1,result2,filename]
if __name__=="__main__":
pool=mp.Pool(mp.cpu_count()-1)
for i in range(interval):
if i==interval:
pool.apply_async(generate_features,(users[begin:],impressions,interactions,items,str(i)),callback=write_arrow_format)
else:
pool.apply_async(generate_features,(users[begin:begin+interval],impressions,interactions,items,str(i)),callback=write_arrow_format)
begin=begin+interval
pool.close()
pool.join()
post generate_features
返回的列表中包含的内容并不明显。但是,如果 result1
、result2
或 filename
中的任何一个不可序列化,则出于某种原因,多处理库将不会调用回调函数,并且无法以静默方式调用。我 认为 这是因为多处理库试图在子进程和父进程之间来回传递对象之前对对象进行 pickle。如果您返回的任何内容不是 "pickleable"(即不可序列化),则不会调用回调。
我自己也遇到过这个错误,结果证明是一个记录器对象的实例给我带来了麻烦。这是重现我的问题的一些示例代码:
import multiprocessing as mp
import logging
def bad_test_func(ii):
print('Calling bad function with arg %i'%ii)
name = "file_%i.log"%ii
logging.basicConfig(filename=name,level=logging.DEBUG)
if ii < 4:
log = logging.getLogger()
else:
log = "Test log %i"%ii
return log
def good_test_func(ii):
print('Calling good function with arg %i'%ii)
instance = ('hello', 'world', ii)
return instance
def pool_test(func):
def callback(item):
print('This is the callback')
print('I have been given the following item: ')
print(item)
num_processes = 3
pool = mp.Pool(processes = num_processes)
results = []
for i in range(5):
res = pool.apply_async(func, (i,), callback=callback)
results.append(res)
pool.close()
pool.join()
def main():
print('#'*30)
print('Calling pool test with bad function')
print('#'*30)
pool_test(bad_test_func)
print('#'*30)
print('Calling pool test with good function')
print('#'*30)
pool_test(good_test_func)
if __name__ == '__main__':
main()
希望这对您有所帮助并为您指明正确的方向。