How to avoid duplicate processing in a Python API server?

Suppose the function detect_primes is expensive to call, and I want to avoid calling it repeatedly with duplicate arguments. What should I do?

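Using a cache does not help, because the function can be called concurrently from different requests. When two requests both see that the cache has no value, both go on to execute the expensive function.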

from typing import Dict, List
from flask import Flask, request

app = Flask(__name__)

def detect_primes(nums: List[int]) -> Dict[int, bool]:
    """Detect whether each number in a list is prime (expensive)."""

@app.route('/detect', methods=['GET'])
def search():
    args = request.args
    nums = list(map(int, args.get('nums', '').split(',')))
    return detect_primes(nums)

For example, suppose one user requests 13, 14, 15 and another requests 15, 16. The answers are {"13": true, "14": false, "15": false} and {"15": false, "16": false}.

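I would like to avoid calling detect_primes with both [13, 14, 15] and [15, 16], since 15 would then be processed twice. Ideally, both requests should wait for a single call with [13, 14, 15, 16] (or for two calls with [13, 14, 15] and [16]) and then return their respective results.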

The choice of web framework is not important to me; you can assume Flask or FastAPI.

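Edit: I am not sure how this question is a duplicate of, or answered in, the linked posts. As stated above, a cache cannot be used (whether an in-memory Python cache, an external cache, or a database). I would be happy to be proven wrong by an answer.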

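However, I would suggest at least using a cache: keep a dictionary of already-computed values and consult it before calling detect_primes, so that any input number that has already been computed is looked up rather than recomputed. Accessing a dict element is fast, and so far the dict is not large. Try accessing the dictionary of computed values asynchronously, perhaps using Redis.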

Something like this:

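shared_dict = {}  # number -> already-computed result

async def search():
    args = request.args
    nums = list(map(int, args.get('nums', '').split(',')))
    computed_values = {}
    to_compute_values = []
    for num in nums:
        # with Redis, this lookup would be an awaited call instead
        if num in shared_dict:
            computed_values[num] = shared_dict[num]
        else:
            to_compute_values.append(num)
    new_values = detect_primes(to_compute_values)
    shared_dict.update(new_values)
    # join the two dicts (the | operator needs Python 3.9+)
    return new_values | computed_values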

According to FastAPI's documentation:

when you declare a path operation function with normal def instead of async def, it is run in an external threadpool that is then awaited, instead of being called directly (as it would block the server).

Thus, when you use def instead of async def, the server processes requests concurrently.
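If the route stays a normal def, the check-then-compute step on a shared dictionary would have to be made atomic, because two worker threads can otherwise both miss the cache at the same moment. The following is only a minimal sketch of that idea, not part of the original answer: the names cache, cache_lock, computed and detect_primes_locked are hypothetical, and is_prime is a plain trial-division stand-in for the real expensive check.

```python
import threading
from math import isqrt
from typing import Dict, List

cache: Dict[int, bool] = {}    # number -> result, shared by all threads
cache_lock = threading.Lock()  # guards the check-then-compute step
computed: List[int] = []       # demo only: numbers that were actually computed


def is_prime(n: int) -> bool:
    """Stand-in for the expensive check (trial division)."""
    computed.append(n)
    return n > 1 and all(n % i for i in range(2, isqrt(n) + 1))


def detect_primes_locked(nums: List[int]) -> Dict[int, bool]:
    # Only one thread at a time may inspect and fill the cache, so a number
    # that another request has already handled is never computed again.
    with cache_lock:
        for n in nums:
            if n not in cache:
                cache[n] = is_prime(n)
        return {n: cache[n] for n in nums}
```

Called from two threads with [13, 14, 15] and [15, 16], this computes 15 only once. The price is that the lock serializes all requests, and it only works inside a single process.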

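In your case, since you describe it as "ideally, both requests should wait ...", you could declare the search route with async def. A route declared this way that never awaits anything runs directly on the event loop, so requests are processed one at a time and the second request finds the values the first one stored.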

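from math import isqrt
from typing import Dict, List

from fastapi import FastAPI, Query

app = FastAPI()
d = {}  # shared cache: number -> result (one per process)

def is_prime(n: int) -> bool:
    # check whether 'n' is prime or not
    # (trial division stands in for the real, expensive check)
    if n < 2:
        return False
    return all(n % i for i in range(2, isqrt(n) + 1))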

def detect_primes(nums: List[int]) -> Dict[int, bool]:
    res = {}
    for n in nums:
        if n in d:
            res[n] = d.get(n)
            print(f'{n} found in dict')
        else:
            is_n_Prime = is_prime(n)
            res[n] = is_n_Prime
            d[n] = is_n_Prime
    return res

@app.get("/detect")
async def search(nums: List[int] = Query(...)):
    return detect_primes(nums)

However, if you need to use await inside the async def route (which would cause requests to be processed concurrently), you could, for example, use a Semaphore object to control access to the dictionary. If you plan on having multiple workers active at the same time (each worker has its own memory, so they do not share the dictionary), you should instead consider using database storage or a key-value store (cache) such as Redis. You may also want to try aioredlock, which allows "creating distributed locks between workers (processes)".
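For the single-process case where the route does await, here is a minimal sketch of one way to let overlapping requests wait for each other. Instead of a semaphore, it records each number that is being computed as an asyncio.Future, so a later request awaits that future rather than recomputing. The names results, in_flight, calls and detect_deduplicated are hypothetical, detect_primes is a trial-division stand-in, and asyncio.to_thread requires Python 3.9 or newer.

```python
import asyncio
from math import isqrt
from typing import Dict, List

results: Dict[int, bool] = {}              # finished results, shared by all requests
in_flight: Dict[int, asyncio.Future] = {}  # numbers some request is computing right now
calls: List[List[int]] = []                # demo only: arguments of each expensive call


def detect_primes(nums: List[int]) -> Dict[int, bool]:
    """Stand-in for the expensive function (plain trial division)."""
    calls.append(list(nums))
    return {n: n > 1 and all(n % i for i in range(2, isqrt(n) + 1)) for n in nums}


async def detect_deduplicated(nums: List[int]) -> Dict[int, bool]:
    loop = asyncio.get_running_loop()
    # Claim every number that is neither finished nor in flight. There is no
    # await in this loop, so no other request on the same event loop interleaves.
    mine = []
    for n in dict.fromkeys(nums):  # drop duplicates, keep order
        if n not in results and n not in in_flight:
            in_flight[n] = loop.create_future()
            mine.append(n)
    # Numbers already claimed by other requests: keep their futures to await later.
    theirs = [in_flight[n] for n in set(nums) - set(mine) if n in in_flight]
    if mine:
        try:
            # Run the blocking function in a worker thread.
            computed = await asyncio.to_thread(detect_primes, mine)
        except Exception as exc:
            for n in mine:  # wake waiters with the error instead of leaving them hanging
                in_flight.pop(n).set_exception(exc)
            raise
        for n in mine:
            results[n] = computed[n]
            in_flight.pop(n).set_result(computed[n])
    if theirs:
        await asyncio.gather(*theirs)
    return {n: results[n] for n in nums}


async def main():
    # Two overlapping "requests" arriving at the same time.
    return await asyncio.gather(
        detect_deduplicated([13, 14, 15]),
        detect_deduplicated([15, 16]),
    )


if __name__ == "__main__":
    print(asyncio.run(main()))
```

In a FastAPI route this would be used as `return await detect_deduplicated(nums)`. With the two requests above, the second one claims only 16 and waits for the first to finish 15, which matches the "two calls with [13, 14, 15] and [16]" behaviour asked for in the question. The bookkeeping lives in one process's memory, so with multiple workers the same idea would need a shared store and a distributed lock.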