Chain memoizer in Python
I already have a memoizer that works well. It serializes the inputs with pickle dumps and creates an MD5 hash as the key. The function results are very large, so they are stored as pickle files whose file names are the MD5 hashes. When I call two memoized functions one after the other, the memoizer
loads the output of the first function and passes it to the second. The second function serializes it, creates the MD5, and then loads its own output. Here is some very simple code:
@memoize
def f(x):
...
return y
@memoize
def g(x):
...
return y
y1 = f(x1)
y2 = g(y1)
y1 is loaded from disk while computing f and then serialized again while computing g. Is it possible to bypass this step somehow and pass the key of y1 (i.e. the MD5 hash) to g? If g already has this key, it loads y2 from disk. If not, it "requests" the full y1 in order to evaluate g.
Edit:
import cPickle as pickle
import inspect
import hashlib

class memoize(object):
    def __init__(self, func):
        self.func = func

    def __call__(self, *args, **kwargs):
        file_name = self._get_key(*args, **kwargs)
        try:
            # pickle protocol 2 is binary, so the file must be opened in "rb"
            with open(file_name, "rb") as f:
                out = pickle.load(f)
        except (IOError, EOFError):
            out = self.func(*args, **kwargs)
            with open(file_name, "wb") as f:
                pickle.dump(out, f, 2)
        return out

    def _arg_hash(self, *args, **kwargs):
        _str = pickle.dumps(args, 2) + pickle.dumps(kwargs, 2)
        return hashlib.md5(_str).hexdigest()

    def _src_hash(self):
        _src = inspect.getsource(self.func)
        return hashlib.md5(_src).hexdigest()

    def _get_key(self, *args, **kwargs):
        arg = self._arg_hash(*args, **kwargs)
        src = self._src_hash()
        return src + '_' + arg + '.pkl'
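For illustration, the key scheme above (source hash joined with the argument hash) can be reproduced as a small Python 3 sketch; make_key is a hypothetical stand-alone helper, not part of the class:

```python
import hashlib
import pickle

def make_key(func_src, *args, **kwargs):
    # Same scheme as _get_key: md5(source) + '_' + md5(pickled args) + '.pkl'
    arg = hashlib.md5(pickle.dumps(args, 2) + pickle.dumps(kwargs, 2)).hexdigest()
    src = hashlib.md5(func_src.encode()).hexdigest()
    return src + '_' + arg + '.pkl'

key = make_key("def f(x):\n    return x + 1\n", 10)
print(key)  # two 32-character md5 digests joined by '_', plus '.pkl'
```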
I think you could automate this, but I generally believe it is better to be explicit about "lazy" evaluation. So I'll show an approach that adds an extra argument to your memoized functions: lazy. Instead of files, pickle and md5, I'll simplify the helpers a bit:
# I use a dictionary as storage instead of files
storage = {}

# No md5, just hash
def calculate_md5(obj):
    print('calculating md5 of', obj)
    return hash(obj)

# create dictionary entry instead of pickling the data to a file
def create_file(md5, data):
    print('creating file for md5', md5)
    storage[md5] = data

# Load dictionary entry instead of unpickling a file
def load_file(md5):
    print('loading file with md5 of', md5)
    return storage[md5]
I use a custom class as the intermediate object:
class MemoizedObject(object):
    def __init__(self, md5):
        self.md5 = md5

    def get_real_data(self):
        print('load...')
        return load_file(self.md5)

    def __repr__(self):
        return '{self.__class__.__name__}(md5={self.md5})'.format(self=self)
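A quick sketch of how such a handle behaves; the storage entry is seeded by hand here, only to make the example self-contained:

```python
storage = {}

class MemoizedObject:
    def __init__(self, md5):
        self.md5 = md5
    def get_real_data(self):
        return storage[self.md5]
    def __repr__(self):
        return '{self.__class__.__name__}(md5={self.md5})'.format(self=self)

storage[11] = 11               # pretend f already stored its result under key 11
handle = MemoizedObject(11)    # the handle carries only the key, not the data
print(repr(handle))            # MemoizedObject(md5=11)
print(handle.get_real_data())  # 11
```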
Finally, here is the changed Memoize, assuming your functions take only one argument:
class Memoize(object):
    def __init__(self, func):
        self.func = func
        # The md5 to md5 storage is needed to find the result file
        # or result md5 for lazy evaluation.
        self.md5_to_md5_storage = {}

    def __call__(self, x, lazy=False):
        # If the argument is a memoized object there is no need to
        # calculate the hash, we can just look it up.
        if isinstance(x, MemoizedObject):
            key = x.md5
        else:
            key = calculate_md5(x)
        if lazy and key in self.md5_to_md5_storage:
            # Check if the key is present in the md5 to md5 storage, otherwise
            # we can't be lazy
            return MemoizedObject(self.md5_to_md5_storage[key])
        elif not lazy and key in self.md5_to_md5_storage:
            # Not lazy but we know the result
            result = load_file(self.md5_to_md5_storage[key])
        else:
            # Unknown argument: if it is only a handle, "request" the
            # full data before evaluating the function.
            if isinstance(x, MemoizedObject):
                x = x.get_real_data()
            result = self.func(x)
            result_md5 = calculate_md5(result)
            create_file(result_md5, result)
            self.md5_to_md5_storage[key] = result_md5
        return result
Now if you call your functions and specify lazy in the right place, you can avoid loading (unpickling) your file:
@Memoize
def f(x):
return x+1
@Memoize
def g(x):
return x+2
Normal (first) run:
>>> x1 = 10
>>> y1 = f(x1)
calculating md5 of 10
calculating md5 of 11
creating file for md5 11
>>> y2 = g(y1)
calculating md5 of 11
calculating md5 of 13
creating file for md5 13
Second run, without lazy:
>>> x1 = 10
>>> y1 = f(x1)
calculating md5 of 10
loading file with md5 of 11
>>> y2 = g(y1)
calculating md5 of 11
loading file with md5 of 13
With lazy=True:
>>> x1 = 10
>>> y1 = f(x1, lazy=True)
calculating md5 of 10
>>> y2 = g(y1)
loading file with md5 of 13
The last option only calculates the "md5" of the first argument and loads the file of the end result. That should be exactly what you wanted.
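To sanity-check the whole mechanism end to end, here is a condensed, self-contained Python 3 sketch of the same idea (a dict for storage and the built-in hash instead of MD5, as in the answer; the class names Handle and Memo are shortened stand-ins):

```python
storage = {}  # stands in for the pickle files on disk

class Handle:
    """Carries only the key of a stored result, not the data itself."""
    def __init__(self, key):
        self.key = key

class Memo:
    def __init__(self, func):
        self.func = func
        self.key_to_key = {}  # argument key -> result key

    def __call__(self, x, lazy=False):
        # A handle already knows its key; otherwise hash the argument.
        key = x.key if isinstance(x, Handle) else hash(x)
        if key in self.key_to_key:
            result_key = self.key_to_key[key]
            # Lazy: hand back just the key and skip loading the result.
            return Handle(result_key) if lazy else storage[result_key]
        # Unknown argument: we need the real value, even from a handle.
        real_x = storage[x.key] if isinstance(x, Handle) else x
        result = self.func(real_x)
        result_key = hash(result)
        storage[result_key] = result
        self.key_to_key[key] = result_key
        return result

f = Memo(lambda x: x + 1)
g = Memo(lambda x: x + 2)
g(f(10))                    # first run fills both caches
y2 = g(f(10, lazy=True))    # second run: f returns only a Handle
print(y2)                   # 13
```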