一个将字符串元组作为 return 整数的记忆函数?
A memoized function that takes a tuple of strings to return an integer?
假设我有这样的元组数组:
a = [('shape', 'rectangle'), ('fill', 'no'), ('size', 'huge')]
b = [('shape', 'rectangle'), ('fill', 'yes'), ('size', 'large')]
我正在尝试将这些数组转换为数值向量,每个维度代表一个特征。
所以我们的预期输出是这样的:
amod = [1, 0, 1] # or [1, 1, 1]
bmod = [1, 1, 2] # or [1, 2, 2]
所以创建的矢量取决于它之前看到的内容(即矩形仍然编码为 1
但新值 'large' 被编码为下一步 2
).
我想我可以结合使用 yield
和记忆功能来帮助我解决这个问题。这是我到目前为止尝试过的:
def memoize(f):
memo = {}
def helper(x):
if x not in memo:
memo[x] = f(x)
return memo[x]
return helper
@memoize
def verbal_to_value(tup):
u = 1
if tup[0] == 'shape':
yield u
u += 1
if tup[0] == 'fill':
yield u
u += 1
if tup[0] == 'size':
yield u
u += 1
但是我一直收到这个错误:
TypeError: 'NoneType' object is not callable
有没有一种方法可以创建这个函数,它可以记住它所看到的内容?如果它可以动态添加键,那么奖励积分,这样我就不必硬编码 'shape' 或 'fill'.
之类的东西
不是最好的方法,但可能会帮助您找出更好的解决方案
class Shape:
counter = {}
def to_tuple(self, tuples):
self.tuples = tuples
self._add()
l = []
for i,v in self.tuples:
l.append(self.counter[i][v])
return l
def _add(self):
for i,v in self.tuples:
if i in self.counter.keys():
if v not in self.counter[i]:
self.counter[i][v] = max(self.counter[i].values()) +1
else:
self.counter[i] = {v: 0}
a = [('shape', 'rectangle'), ('fill', 'no'), ('size', 'huge')]
b = [('shape', 'rectangle'), ('fill', 'yes'), ('size', 'large')]
s = Shape()
s.to_tuple(a)
s.to_tuple(b)
首先:这是我最喜欢的 memoize 实现
装饰器,主要是因为速度...
def memoize(f):
class memodict(dict):
__slots__ = ()
def __missing__(self, key):
self[key] = ret = f(key)
return ret
return memodict().__getitem__
除了一些极端情况外,它与您的效果相同:
def memoize(f):
memo = {}
def helper(x):
if x not in memo:
memo[x] = f(x)
#else:
# pass
return memo[x]
return helper
但速度稍快,因为 if x not in memo:
发生在
本机代码而不是 python。要理解它,你只需要
要知道一般情况下:解释adict[item]
python 调用 adict.__getitem__(key)
,如果 adict 不包含键,
__getitem__()
调用 adict.__missing__(key)
所以我们可以利用
python 魔术方法协议为我们的利益...
#This the first idea I had how I would implement your
#verbal_to_value() using memoization:
from collections import defaultdict
work=defaultdict(set)
@memoize
def verbal_to_value(kv):
k, v = kv
aset = work[k] #work creates a new set, if not already created.
aset.add(v) #add value if not already added
return len(aset)
包括 memoize 装饰器,那是 15 行代码...
#test suite:
def vectorize(alist):
return [verbal_to_value(kv) for kv in alist]
a = [('shape', 'rectangle'), ('fill', 'no'), ('size', 'huge')]
b = [('shape', 'rectangle'), ('fill', 'yes'), ('size', 'large')]
print (vectorize(a)) #shows [1,1,1]
print (vectorize(b)) #shows [1,2,2]
defaultdict 是一个功能强大的对象,具有几乎相同的逻辑
as memoize:各方面的标准词典,除了当
查找失败,它运行回调函数来创建丢失的
价值。在我们的例子中 set()
不幸的是,这个问题需要访问元组
被用作键,或字典状态本身。随着
结果我们不能只为 .default_factory
编写一个简单的函数
但是我们可以根据memoize/defaultdict模式写一个新的对象:
#This how I would implement your verbal_to_value without
#memoization, though the worker class is so similar to @memoize,
#that it's easy to see why memoize is a good pattern to work from:
class sloter(dict):
__slots__ = ()
def __missing__(self,key):
self[key] = ret = len(self) + 1
#this + 1 bothers me, why can't these vectors be 0 based? ;)
return ret
from collections import defaultdict
work2 = defaultdict(sloter)
def verbal_to_value2(kv):
k, v = kv
return work2[k][v]
#~10 lines of code?
#test suite2:
def vectorize2(alist):
return [verbal_to_value2(kv) for kv in alist]
print (vectorize2(a)) #shows [1,1,1]
print (vectorize2(b)) #shows [1,2,2]
你以前可能见过类似 sloter
的东西,因为它是
有时恰好用于这种情况。转换会员
名称到数字并返回。正因为如此,我们的优势在于
能够扭转这样的事情:
def unvectorize2(a_vector, pattern=('shape','fill','size')):
reverser = [{v:k2 for k2,v in work2[k].items()} for k in pattern]
for index, vect in enumerate(a_vector):
yield pattern[index], reverser[index][vect]
print (list(unvectorize2(vectorize2(a))))
print (list(unvectorize2(vectorize2(b))))
但我在你原来的 post 中看到了那些产量,他们已经得到了我
思考......如果有一个 memoize / defaultdict 之类的对象怎么办
可以使用生成器而不是函数,并且只知道
推进发电机而不是调用它。然后我意识到...
是的,生成器带有一个名为 __next__()
的可调用对象,它
意味着我们不需要一个新的 defaultdict 实现,只需要一个
仔细提取正确的成员函数...
def count(start=0): #same as: from itertools import count
while True:
yield start
start += 1
#so we could get the exact same behavior as above, (except faster)
#by saying:
sloter3=lambda :defaultdict(count(1).__next__)
#and then
work3 = defaultdict(sloter3)
#or just:
work3 = defaultdict(lambda :defaultdict(count(1).__next__))
#which yes, is a bit of a mindwarp if you've never needed to do that
#before.
#the outer defaultdict interprets the first item. Every time a new
#first item is received, the lambda is called, which creates a new
#count() generator (starting from 1), and passes it's .__next__ method
#to a new inner defaultdict.
def verbal_to_value3(kv):
k, v = kv
return work3[k][v]
#you *could* call that 8 lines of code, but we managed to use
#defaultdict twice, and didn't need to define it, so I wouldn't call
#it 'less complex' or anything.
#test suite3:
def vectorize3(alist):
return [verbal_to_value3(kv) for kv in alist]
print (vectorize3(a)) #shows [1,1,1]
print (vectorize3(b)) #shows [1,2,2]
#so yes, that can also work.
#and since the internal state in `work3` is stored in the exact same
#format, it be accessed the same way as `work2` to reconstruct input
#from output.
def unvectorize3(a_vector, pattern=('shape','fill','size')):
reverser = [{v:k2 for k2,v in work3[k].items()} for k in pattern]
for index, vect in enumerate(a_vector):
yield pattern[index], reverser[index][vect]
print (list(unvectorize3(vectorize3(a))))
print (list(unvectorize3(vectorize3(b))))
最终评论:
这些实现中的每一个都在全局存储状态
多变的。我觉得这是反审美的,但取决于你是什么
计划稍后处理该矢量,这可能是一个功能。正如我
证明了。
编辑:
另一天对此进行冥想,以及我可能需要它的各种情况,
我认为我会像这样封装此功能:
from collections import defaultdict
from itertools import count
class slotter4:
def __init__(self):
#keep track what order we expect to see keys
self.pattern = defaultdict(count(1).__next__)
#keep track of what values we've seen and what number we've assigned to mean them.
self.work = defaultdict(lambda :defaultdict(count(1).__next__))
def slot(self, kv, i=False):
"""used to be named verbal_to_value"""
k, v = kv
if i and i != self.pattern[k]:# keep track of order we saw initial keys
raise ValueError("Input fields out of order")
#in theory we could ignore this error, and just know
#that we're going to default to the field order we saw
#first. Or we could just not keep track, which might be
#required, if our code runs to slow, but then we cannot
#make pattern optional in .unvectorize()
return self.work[k][v]
def vectorize(self, alist):
return [self.slot(kv, i) for i, kv in enumerate(alist,1)]
#if we're not keeping track of field pattern, we could do this instead
#return [self.work[k][v] for k, v in alist]
def unvectorize(self, a_vector, pattern=None):
if pattern is None:
pattern = [k for k,v in sorted(self.pattern.items(), key=lambda a:a[1])]
reverser = [{v:k2 for k2,v in work3[k].items()} for k in pattern]
return [(pattern[index], reverser[index][vect])
for index, vect in enumerate(a_vector)]
#test suite4:
s = slotter4()
if __name__=='__main__':
Av = s.vectorize(a)
Bv = s.vectorize(b)
print (Av) #shows [1,1,1]
print (Bv) #shows [1,2,2]
print (s.unvectorize(Av))#shows a
print (s.unvectorize(Bv))#shows b
else:
#run the test silently, and only complain if something has broken
assert s.unvectorize(s.vectorize(a))==a
assert s.unvectorize(s.vectorize(b))==b
祝你好运!
假设我有这样的元组数组:
a = [('shape', 'rectangle'), ('fill', 'no'), ('size', 'huge')]
b = [('shape', 'rectangle'), ('fill', 'yes'), ('size', 'large')]
我正在尝试将这些数组转换为数值向量,每个维度代表一个特征。
所以我们的预期输出是这样的:
amod = [1, 0, 1] # or [1, 1, 1]
bmod = [1, 1, 2] # or [1, 2, 2]
所以创建的矢量取决于它之前看到的内容(即矩形仍然编码为 1
但新值 'large' 被编码为下一步 2
).
我想我可以结合使用 yield
和记忆功能来帮助我解决这个问题。这是我到目前为止尝试过的:
def memoize(f):
memo = {}
def helper(x):
if x not in memo:
memo[x] = f(x)
return memo[x]
return helper
@memoize
def verbal_to_value(tup):
u = 1
if tup[0] == 'shape':
yield u
u += 1
if tup[0] == 'fill':
yield u
u += 1
if tup[0] == 'size':
yield u
u += 1
但是我一直收到这个错误:
TypeError: 'NoneType' object is not callable
有没有一种方法可以创建这个函数,它可以记住它所看到的内容?如果它可以动态添加键,那么奖励积分,这样我就不必硬编码 'shape' 或 'fill'.
之类的东西不是最好的方法,但可能会帮助您找出更好的解决方案
class Shape:
counter = {}
def to_tuple(self, tuples):
self.tuples = tuples
self._add()
l = []
for i,v in self.tuples:
l.append(self.counter[i][v])
return l
def _add(self):
for i,v in self.tuples:
if i in self.counter.keys():
if v not in self.counter[i]:
self.counter[i][v] = max(self.counter[i].values()) +1
else:
self.counter[i] = {v: 0}
a = [('shape', 'rectangle'), ('fill', 'no'), ('size', 'huge')]
b = [('shape', 'rectangle'), ('fill', 'yes'), ('size', 'large')]
s = Shape()
s.to_tuple(a)
s.to_tuple(b)
首先:这是我最喜欢的 memoize 实现 装饰器,主要是因为速度...
def memoize(f):
class memodict(dict):
__slots__ = ()
def __missing__(self, key):
self[key] = ret = f(key)
return ret
return memodict().__getitem__
除了一些极端情况外,它与您的效果相同:
def memoize(f):
memo = {}
def helper(x):
if x not in memo:
memo[x] = f(x)
#else:
# pass
return memo[x]
return helper
但速度稍快,因为 if x not in memo:
发生在
本机代码而不是 python。要理解它,你只需要
要知道一般情况下:解释adict[item]
python 调用 adict.__getitem__(key)
,如果 adict 不包含键,
__getitem__()
调用 adict.__missing__(key)
所以我们可以利用
python 魔术方法协议为我们的利益...
#This the first idea I had how I would implement your
#verbal_to_value() using memoization:
from collections import defaultdict
work=defaultdict(set)
@memoize
def verbal_to_value(kv):
k, v = kv
aset = work[k] #work creates a new set, if not already created.
aset.add(v) #add value if not already added
return len(aset)
包括 memoize 装饰器,那是 15 行代码...
#test suite:
def vectorize(alist):
return [verbal_to_value(kv) for kv in alist]
a = [('shape', 'rectangle'), ('fill', 'no'), ('size', 'huge')]
b = [('shape', 'rectangle'), ('fill', 'yes'), ('size', 'large')]
print (vectorize(a)) #shows [1,1,1]
print (vectorize(b)) #shows [1,2,2]
defaultdict 是一个功能强大的对象,具有几乎相同的逻辑
as memoize:各方面的标准词典,除了当
查找失败,它运行回调函数来创建丢失的
价值。在我们的例子中 set()
不幸的是,这个问题需要访问元组 被用作键,或字典状态本身。随着 结果我们不能只为 .default_factory
编写一个简单的函数但是我们可以根据memoize/defaultdict模式写一个新的对象:
#This how I would implement your verbal_to_value without
#memoization, though the worker class is so similar to @memoize,
#that it's easy to see why memoize is a good pattern to work from:
class sloter(dict):
__slots__ = ()
def __missing__(self,key):
self[key] = ret = len(self) + 1
#this + 1 bothers me, why can't these vectors be 0 based? ;)
return ret
from collections import defaultdict
work2 = defaultdict(sloter)
def verbal_to_value2(kv):
k, v = kv
return work2[k][v]
#~10 lines of code?
#test suite2:
def vectorize2(alist):
return [verbal_to_value2(kv) for kv in alist]
print (vectorize2(a)) #shows [1,1,1]
print (vectorize2(b)) #shows [1,2,2]
你以前可能见过类似 sloter
的东西,因为它是
有时恰好用于这种情况。转换会员
名称到数字并返回。正因为如此,我们的优势在于
能够扭转这样的事情:
def unvectorize2(a_vector, pattern=('shape','fill','size')):
reverser = [{v:k2 for k2,v in work2[k].items()} for k in pattern]
for index, vect in enumerate(a_vector):
yield pattern[index], reverser[index][vect]
print (list(unvectorize2(vectorize2(a))))
print (list(unvectorize2(vectorize2(b))))
但我在你原来的 post 中看到了那些产量,他们已经得到了我
思考......如果有一个 memoize / defaultdict 之类的对象怎么办
可以使用生成器而不是函数,并且只知道
推进发电机而不是调用它。然后我意识到...
是的,生成器带有一个名为 __next__()
的可调用对象,它
意味着我们不需要一个新的 defaultdict 实现,只需要一个
仔细提取正确的成员函数...
def count(start=0): #same as: from itertools import count
while True:
yield start
start += 1
#so we could get the exact same behavior as above, (except faster)
#by saying:
sloter3=lambda :defaultdict(count(1).__next__)
#and then
work3 = defaultdict(sloter3)
#or just:
work3 = defaultdict(lambda :defaultdict(count(1).__next__))
#which yes, is a bit of a mindwarp if you've never needed to do that
#before.
#the outer defaultdict interprets the first item. Every time a new
#first item is received, the lambda is called, which creates a new
#count() generator (starting from 1), and passes it's .__next__ method
#to a new inner defaultdict.
def verbal_to_value3(kv):
k, v = kv
return work3[k][v]
#you *could* call that 8 lines of code, but we managed to use
#defaultdict twice, and didn't need to define it, so I wouldn't call
#it 'less complex' or anything.
#test suite3:
def vectorize3(alist):
return [verbal_to_value3(kv) for kv in alist]
print (vectorize3(a)) #shows [1,1,1]
print (vectorize3(b)) #shows [1,2,2]
#so yes, that can also work.
#and since the internal state in `work3` is stored in the exact same
#format, it be accessed the same way as `work2` to reconstruct input
#from output.
def unvectorize3(a_vector, pattern=('shape','fill','size')):
reverser = [{v:k2 for k2,v in work3[k].items()} for k in pattern]
for index, vect in enumerate(a_vector):
yield pattern[index], reverser[index][vect]
print (list(unvectorize3(vectorize3(a))))
print (list(unvectorize3(vectorize3(b))))
最终评论:
这些实现中的每一个都在全局存储状态 多变的。我觉得这是反审美的,但取决于你是什么 计划稍后处理该矢量,这可能是一个功能。正如我 证明了。
编辑: 另一天对此进行冥想,以及我可能需要它的各种情况, 我认为我会像这样封装此功能:
from collections import defaultdict
from itertools import count
class slotter4:
def __init__(self):
#keep track what order we expect to see keys
self.pattern = defaultdict(count(1).__next__)
#keep track of what values we've seen and what number we've assigned to mean them.
self.work = defaultdict(lambda :defaultdict(count(1).__next__))
def slot(self, kv, i=False):
"""used to be named verbal_to_value"""
k, v = kv
if i and i != self.pattern[k]:# keep track of order we saw initial keys
raise ValueError("Input fields out of order")
#in theory we could ignore this error, and just know
#that we're going to default to the field order we saw
#first. Or we could just not keep track, which might be
#required, if our code runs to slow, but then we cannot
#make pattern optional in .unvectorize()
return self.work[k][v]
def vectorize(self, alist):
return [self.slot(kv, i) for i, kv in enumerate(alist,1)]
#if we're not keeping track of field pattern, we could do this instead
#return [self.work[k][v] for k, v in alist]
def unvectorize(self, a_vector, pattern=None):
if pattern is None:
pattern = [k for k,v in sorted(self.pattern.items(), key=lambda a:a[1])]
reverser = [{v:k2 for k2,v in work3[k].items()} for k in pattern]
return [(pattern[index], reverser[index][vect])
for index, vect in enumerate(a_vector)]
#test suite4:
s = slotter4()
if __name__=='__main__':
Av = s.vectorize(a)
Bv = s.vectorize(b)
print (Av) #shows [1,1,1]
print (Bv) #shows [1,2,2]
print (s.unvectorize(Av))#shows a
print (s.unvectorize(Bv))#shows b
else:
#run the test silently, and only complain if something has broken
assert s.unvectorize(s.vectorize(a))==a
assert s.unvectorize(s.vectorize(b))==b
祝你好运!