How to inform user that cache is being used?
I am using the Python library diskcache and its decorator @cache.memoize to cache calls to my CouchDB database. It works fine. However, I would like to print to the user whether the data is being returned from the database or from the cache.
I don't even know how to approach this problem.
My code so far:
import couchdb
from diskcache import Cache

cache = Cache("couch_cache")

@cache.memoize()
def fetch_doc(url: str, database: str, doc_id: str) -> dict:
    server = couchdb.Server(url=url)
    db = server[database]
    return dict(db[doc_id])
This is one way to do it, but I don't really recommend it, because (1) it adds an extra operation where you manually check the cache yourself, and (2) it possibly duplicates what the library is already doing internally. I haven't done proper checks on any performance impact, since I don't have your production data/env and the various doc_id values, but as noted at the end of this answer, it could be slower because of the extra lookup operation.
But here it is anyway.
The diskcache.Cache object "supports a familiar Python mapping interface" (like dicts). You can then manually check for yourself whether a given key is already present in the cache, using the same key that is automatically generated from the arguments to the memoize-d function:
An additional __cache_key__ attribute can be used to generate the cache key used for the given arguments.

>>> key = fibonacci.__cache_key__(100)
>>> print(cache[key])
354224848179261915075
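As a minimal, self-contained sketch of that mapping interface (a separate script using the docs' fibonacci example rather than your CouchDB setup; the "demo_cache" directory name is just a placeholder):

from diskcache import Cache

cache = Cache("demo_cache")

@cache.memoize()
def fibonacci(n: int) -> int:
    return n if n < 2 else fibonacci(n - 1) + fibonacci(n - 2)

fibonacci(100)                        # populates the cache on the first call
key = fibonacci.__cache_key__(100)    # the same key memoize() uses internally
print(key in cache)                   # True -- dict-like membership test
print(cache[key])                     # 354224848179261915075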
So you can wrap your fetch_doc function in another function that checks whether the cache key based on the url, database, and doc_id arguments already exists, prints the result to the user, all before calling the actual memoize-d fetch_doc function:
import couchdb
from diskcache import Cache

cache = Cache("couch_cache")

@cache.memoize()
def fetch_doc(url: str, database: str, doc_id: str) -> dict:
    server = couchdb.Server(url=url)
    db = server[database]
    return dict(db[doc_id])

def fetch_doc_with_logging(url: str, database: str, doc_id: str):
    # Generate the key
    key = fetch_doc.__cache_key__(url, database, doc_id)

    # Print out whether getting from cache or not
    if key in cache:
        print(f'Getting {doc_id} from cache!')
    else:
        print(f'Getting {doc_id} from DB!')

    # Call the actual memoize-d function
    return fetch_doc(url, database, doc_id)
When testing it:
url = 'https://your.couchdb.instance'
database = 'test'
doc_id = 'c97bbe3127fb6b89779c86da7b000885'

cache.stats(enable=True, reset=True)

for _ in range(5):
    fetch_doc_with_logging(url, database, doc_id)

print(f'(hits, misses) = {cache.stats()}')

# Only for testing, so the 1st call will always miss and will get from DB
cache.clear()
It outputs:
$ python test.py
Getting c97bbe3127fb6b89779c86da7b000885 from DB!
Getting c97bbe3127fb6b89779c86da7b000885 from cache!
Getting c97bbe3127fb6b89779c86da7b000885 from cache!
Getting c97bbe3127fb6b89779c86da7b000885 from cache!
Getting c97bbe3127fb6b89779c86da7b000885 from cache!
(hits, misses) = (4, 1)
You can turn that wrapper function into a decorator:
def log_if_cache_or_not(memoized_func):
    def _wrap(*args):
        key = memoized_func.__cache_key__(*args)
        # fetch_doc's last positional argument is the doc_id
        doc_id = args[-1]
        if key in cache:
            print(f'Getting {doc_id} from cache!')
        else:
            print(f'Getting {doc_id} from DB!')
        return memoized_func(*args)
    return _wrap

@log_if_cache_or_not
@cache.memoize()
def fetch_doc(url: str, database: str, doc_id: str) -> dict:
    server = couchdb.Server(url=url)
    db = server[database]
    return dict(db[doc_id])

for _ in range(5):
    fetch_doc(url, database, doc_id)
Or, combined into one new decorator:
def memoize_with_logging(func):
    memoized_func = cache.memoize()(func)
    def _wrap(*args):
        key = memoized_func.__cache_key__(*args)
        # fetch_doc's last positional argument is the doc_id
        doc_id = args[-1]
        if key in cache:
            print(f'Getting {doc_id} from cache!')
        else:
            print(f'Getting {doc_id} from DB!')
        return memoized_func(*args)
    return _wrap

@memoize_with_logging
def fetch_doc(url: str, database: str, doc_id: str) -> dict:
    server = couchdb.Server(url=url)
    db = server[database]
    return dict(db[doc_id])

for _ in range(5):
    fetch_doc(url, database, doc_id)
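As a small optional refinement (a sketch of my own, not something the snippet above does, and reusing the cache object defined earlier): applying functools.wraps keeps the decorated function's name and docstring on the wrapper, which helps when debugging or introspecting fetch_doc:

import functools

def memoize_with_logging(func):
    memoized_func = cache.memoize()(func)

    @functools.wraps(func)  # preserve func's __name__, __doc__, etc. on the wrapper
    def _wrap(*args):
        key = memoized_func.__cache_key__(*args)
        # As above: the last positional argument is the doc_id
        doc_id = args[-1]
        if key in cache:
            print(f'Getting {doc_id} from cache!')
        else:
            print(f'Getting {doc_id} from DB!')
        return memoized_func(*args)

    return _wrap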
Some quick tests:
In [9]: %timeit for _ in range(100000): fetch_doc(url, database, doc_id)
13.7 s ± 112 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [10]: %timeit for _ in range(100000): fetch_doc_with_logging(url, database, doc_id)
21.2 s ± 637 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
(It would probably be a better test if the doc_id varied randomly across the calls.)
Again, as I mentioned at the start, caching and memoize-ing your function calls is supposed to speed the function up. This answer adds the extra operations of a cache lookup and of printing/logging whether you are getting the data from the DB or from the cache, and that could affect the performance of the function call. Test appropriately.
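If that extra printing and lookup are a concern in production, one option (a hedged sketch of my own, reusing the cache and the memoized fetch_doc defined above) is to route the message through the standard logging module and skip the extra work entirely unless debug logging is enabled:

import logging

logger = logging.getLogger(__name__)

def fetch_doc_with_logging(url: str, database: str, doc_id: str) -> dict:
    # Only pay for the extra cache lookup when debug logging is actually enabled
    if logger.isEnabledFor(logging.DEBUG):
        key = fetch_doc.__cache_key__(url, database, doc_id)
        source = 'cache' if key in cache else 'DB'
        logger.debug('Getting %s from %s', doc_id, source)
    return fetch_doc(url, database, doc_id)

# During development: logging.basicConfig(level=logging.DEBUG)
# In production, leave the default WARNING level and the extra work is skipped.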