Multiprocessing: How to write separate log files for each instance while using pool.map?
I want to create a class where every instance writes its own log file. This works fine when I use a function instead of a class (or when I don't use multiprocessing):
import multiprocessing, logging

def setup_logger(name_logfile, path_logfile):
    logger = logging.getLogger(name_logfile)
    formatter = logging.Formatter('%(asctime)s: %(message)s', datefmt='%Y/%m/%d %H:%M:%S')
    fileHandler = logging.FileHandler(path_logfile, mode='w')
    fileHandler.setFormatter(formatter)
    streamHandler = logging.StreamHandler()
    streamHandler.setFormatter(formatter)
    logger.setLevel(logging.DEBUG)
    logger.addHandler(fileHandler)
    logger.addHandler(streamHandler)
    return logger

def MyFunc(A):
    print A
    logger = setup_logger('Logfile%s' % A, '/dev/shm/Logfile%s.log' % A)
    logger.info('text to be written to logfile')

pool = multiprocessing.Pool(2)
pool.map(MyFunc, [1, 2])
pool.close()
pool.join()
But when I use a class, I get a pickling error:
import multiprocessing, logging

class MyClass(object):
    def __init__(self, A):
        print A
        self.logger = self.setup_logger('Logfile%s' % A, '/dev/shm/Logfile%s.log' % A)
        self.logger.info('text to be written to logfile')

    def setup_logger(self, name_logfile, path_logfile):
        logger = logging.getLogger(name_logfile)
        formatter = logging.Formatter('%(asctime)s: %(message)s', datefmt='%Y/%m/%d %H:%M:%S')
        fileHandler = logging.FileHandler(path_logfile, mode='w')
        fileHandler.setFormatter(formatter)
        streamHandler = logging.StreamHandler()
        streamHandler.setFormatter(formatter)
        logger.setLevel(logging.DEBUG)
        logger.addHandler(fileHandler)
        logger.addHandler(streamHandler)
        return logger

pool = multiprocessing.Pool(2)
pool.map(MyClass, [1, 2])
pool.close()
pool.join()
Output:
1
2
2015/02/12 14:05:09: text to be written to logfile
2015/02/12 14:05:09: text to be written to logfile
Process PoolWorker-1:
Traceback (most recent call last):
File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib64/python2.7/multiprocessing/pool.py", line 99, in worker
Process PoolWorker-2:
put((job, i, result))
File "/usr/lib64/python2.7/multiprocessing/queues.py", line 392, in put
Traceback (most recent call last):
File "/usr/lib64/python2.7/multiprocessing/process.py", line 258, in _bootstrap
return send(obj)
PicklingError: Can't pickle <type 'thread.lock'>: attribute lookup thread.lock failed
self.run()
File "/usr/lib64/python2.7/multiprocessing/process.py", line 114, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib64/python2.7/multiprocessing/pool.py", line 99, in worker
put((job, i, result))
File "/usr/lib64/python2.7/multiprocessing/queues.py", line 392, in put
return send(obj)
PicklingError: Can't pickle <type 'thread.lock'>: attribute lookup thread.lock failed
I don't understand what causes this error, since every log file has its own output path. I need the logger to be an attribute of the object, so how can I work around this pickling error?
Basically, you want to call multiprocessing.get_logger() instead of logging.getLogger(). See the first answer to Python multiprocessing - logging.FileHandler object raises PicklingError.
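For reference, a minimal sketch of that suggestion (the file path is an assumption; note that multiprocessing.get_logger() returns a single logger shared by the multiprocessing machinery, not a per-instance logger):

import multiprocessing, logging

# One shared logger for all workers; '/dev/shm/shared.log' is an assumed path.
logger = multiprocessing.get_logger()
logger.setLevel(logging.INFO)
fileHandler = logging.FileHandler('/dev/shm/shared.log', mode='w')
fileHandler.setFormatter(logging.Formatter('%(asctime)s %(processName)s: %(message)s'))
logger.addHandler(fileHandler)
logger.info('written via the multiprocessing-aware logger')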
You cannot pickle loggers. What you can do is remove the logger when the object is pickled and recreate it when the object is unpickled:
import multiprocessing, logging

class MyClass(object):
    def __init__(self, A):
        print A
        self.A = A  # we need to keep the name!
        self.logger = self.setup_logger('Logfile%s' % A, '/dev/shm/Logfile%s.log' % A)
        self.logger.info('text to be written to logfile')

    def setup_logger(self, name_logfile, path_logfile):
        logger = logging.getLogger(name_logfile)
        formatter = logging.Formatter('%(asctime)s: %(message)s', datefmt='%Y/%m/%d %H:%M:%S')
        fileHandler = logging.FileHandler(path_logfile, mode='w')
        fileHandler.setFormatter(formatter)
        streamHandler = logging.StreamHandler()
        streamHandler.setFormatter(formatter)
        logger.setLevel(logging.DEBUG)
        logger.addHandler(fileHandler)
        logger.addHandler(streamHandler)
        return logger

    def __getstate__(self):
        """Called for pickling.

        Removes the logger to allow pickling and returns a copy of `__dict__`.
        """
        statedict = self.__dict__.copy()
        if 'logger' in statedict:
            # Pickling does not work with logger objects, so we drop the logger;
            # it can be rebuilt from `self.A` on unpickling:
            del statedict['logger']
        return statedict

    def __setstate__(self, statedict):
        """Called after loading a pickle dump.

        Restores `__dict__` from `statedict` and adds a new logger.
        """
        self.__dict__.update(statedict)
        process_name = multiprocessing.current_process().name
        self.logger = self.setup_logger('Logfile%s' % self.A,
                                        '/dev/shm/Logfile%s_%s.log' % (self.A, process_name))
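With __getstate__ and __setstate__ defined, the same driver as in the question should now run without the PicklingError:

pool = multiprocessing.Pool(2)
pool.map(MyClass, [1, 2])
pool.close()
pool.join()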
Note that we add the process name to the log file name to prevent several processes from writing to the same file! You may also want to make sure that the logging handlers and the corresponding files are closed at some point.
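A small helper along these lines (the name close_logger is hypothetical) would take care of that:

def close_logger(logger):
    # Close each handler, flushing and releasing its file, then detach it.
    for handler in list(logger.handlers):
        handler.close()
        logger.removeHandler(handler)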
EDIT:
There is a multiprocessing-aware logger in the multiprocessing module. However, I have always found it too limited.
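For completeness, a minimal sketch of that logger; multiprocessing.log_to_stderr() attaches a stderr handler to the single logger shared by all workers, which is why it cannot give you one file per instance:

import multiprocessing, logging

logger = multiprocessing.log_to_stderr()
logger.setLevel(logging.INFO)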