Limit how frequently a function is called, returning a cached value in the meantime

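I have a service that handles a lot (~100K) of requests per second. Before each request, it checks (for example) whether it has started raining, and if it has, the behaviour changes: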

if(IsRaining())
    return "You probably shouldn't go anywhere today.";
//... otherwise proceed

IsRaining version 1 (slowest)

public bool IsRaining() => ExternalService.IsRaining();

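While trying to speed up my service, I found that the call to ExternalService.IsRaining was the performance bottleneck.

I decided I don't care if the status has only just changed to "raining", so I can cache the result for a short time. (There is one exception: if it stops raining, I want to know immediately.)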

I solved that with the following approach:

IsRaining version 2 (faster)

bool isRainingCache;
DateTime lastChecked; // defaults to DateTime.MinValue, so the first call always checks
public bool IsRaining()
{
    DateTime now = DateTime.UtcNow;
    // If the last time we checked, it was raining, make sure it still is. OR
    // If it hasn't been raining, only check again if we haven't checked in the past second.
    if (isRainingCache || (now - lastChecked) > TimeSpan.FromSeconds(1))
    {
        isRainingCache = ExternalService.IsRaining();
        lastChecked = now;
    }
    return isRainingCache;
}

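This made things faster and worked for a long time. Then my service got faster still, it started being called hundreds of thousands of times per second, and benchmarking told me that the calls to DateTime.Now accounted for 50% of all CPU time.

I know what you're thinking: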

Is calling DateTime.Now really your bottleneck?

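I'm pretty sure it is. I'm calling it hundreds of thousands of times per second. My real service is just a wrapper around a hash map lookup, so each call is meant to be very fast.

My next thought was that, rather than checking the elapsed time on every call, some timer could asynchronously expire the cached result after a while:

IsRaining version 3 (fastest?)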

bool? isRainingCache = null;
public bool IsRaining()
{
    // Only check for rain if the cache is empty, or it was raining last time we checked.
    if (isRainingCache == null || isRainingCache == true)
    {
        isRainingCache = ExternalService.IsRaining();
        // If it's not raining, force us to check again after 1 second
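        if (isRainingCache == false)
            Task.Run(() => Task.Delay(1000).ContinueWith(_ => { isRainingCache = null; }));
    }
    // A null (expired) cache is treated as "not raining" until the next check
    return isRainingCache == true;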
}

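The above (untested) would speed things up, but I feel it leaves me with several new problems:

  • It feels wrong to "fire and forget" a Task like this (especially once per second).
  • If my service is disposed or finalized, I'll leave queued tasks behind. I feel like I need to hold on to the task or a cancellation token.
  • I'm generally inexperienced with TPL, but I feel like it's not appropriate to use Timers or Threads here, which in my experience can lead to a myriad of other shutdown and cleanup issues.

If anyone has suggestions for a better approach, I'd be very grateful.

I have several cases like this, and I think it would be nice to abstract the solution into its own wrapper class, e.g.: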

// Calls the getter at most once per 1000 ms, returns a cached value otherwise.
public Throttled<bool> IsRaining = new Throttled<bool>(() => ExternalService.IsRaining(), 1000);

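The fact that calls to DateTime.Now are the application's bottleneck suggests there may be something wrong with the architecture. The likely mistake is that we update the cache inside a method that should only get the latest value and return it. If we separate updating the cache from the method that reads the latest value, we get something like this: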

const int UpdateCacheInterval = 300;

// we use keyword volatile as we access this variable from different threads
private volatile bool isRainingCache;

private Task UpdateCacheTask { get; set; }
// Use it to cancel the background task when required
private CancellationTokenSource CancellationTokenSource = new CancellationTokenSource();

private void InitializeCache()
{
   UpdateCacheTask = Task.Run(async () => 
   {
      while(!CancellationTokenSource.Token.IsCancellationRequested)
      {
         await Task.Delay(UpdateCacheInterval);
         isRainingCache = ExternalService.IsRaining();
      }
   }, CancellationTokenSource.Token);
}

public bool IsRaining()
{
    // Keep UpdateCacheInterval well under one second, so that the cached value
    // can never be a full second older than the last check
    return isRainingCache;
}

// To stop the task execution
public async Task Stop()
{
    CancellationTokenSource.Cancel();        
    await UpdateCacheTask; 
}

I'm generally inexperienced with TPL, but I feel like it's not appropriate to use Timers or Threads here, which in my experience can lead to a myriad of other shutdown and cleanup issues

Using timers and threads here is perfectly fine, because we need some background worker to update the cache.
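For comparison, here is a minimal sketch of the same idea built on System.Threading.Timer instead of a looping task. The class name BackgroundRainCache and the Func<bool> parameter (standing in for ExternalService.IsRaining) are hypothetical and only there for illustration. It is a sketch under those assumptions, not a drop-in replacement.

```csharp
using System;
using System.Threading;

// Hypothetical wrapper: refreshes a cached bool on a background timer.
public sealed class BackgroundRainCache : IDisposable
{
    private readonly Func<bool> _check;   // stands in for ExternalService.IsRaining
    private readonly Timer _timer;
    private volatile bool _isRaining;     // written by the timer thread, read by callers

    public BackgroundRainCache(Func<bool> check, int intervalMs)
    {
        _check = check;
        _isRaining = check();             // prime the cache so the first read is valid
        // First tick after intervalMs, then one tick every intervalMs.
        _timer = new Timer(_ => _isRaining = _check(), null, intervalMs, intervalMs);
    }

    // Hot path: a single volatile read, no clock access at all.
    public bool IsRaining => _isRaining;

    // Stops future ticks; a callback that is already running may still finish.
    public void Dispose() => _timer.Dispose();
}
```

Disposing the wrapper is the only cleanup needed. Two caveats: timer callbacks can overlap if the check takes longer than the interval, and this version does not bypass the cache while it is raining, so a "stopped raining" change is seen up to one interval late.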

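If you change your code to use Environment.TickCount, you should notice a speedup. It is probably the cheapest timer you can check.

@Fabjan's answer might be better, though, if you are really seeing this method hit 100,000 times per second.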

bool isRainingCache;
int lastChecked = Environment.TickCount - 1001;

public bool IsRaining()
{
    int now = Environment.TickCount;
    // If the last time we checked, it was raining, make sure it still is. OR
    // If it hasn't been raining, only check again if we haven't checked in the past second.
    if (isRainingCache || unchecked(now - lastChecked) > 1000)
    {
        isRainingCache = ExternalService.IsRaining();
        lastChecked = now;
    }
    return isRainingCache;
}
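Environment.TickCount is a signed 32-bit millisecond counter, so it eventually wraps from int.MaxValue to a negative value. The unchecked subtraction in the code above still gives the right elapsed time across that wrap. The small demo below illustrates the arithmetic. The names TickWraparoundDemo and Elapsed are made up for this example, and the tick values are simulated rather than read from the real counter.

```csharp
using System;

public static class TickWraparoundDemo
{
    // Elapsed milliseconds between two TickCount-style readings.
    // unchecked makes the subtraction wrap instead of throwing,
    // even if the project is compiled with overflow checking on.
    public static int Elapsed(int now, int last) => unchecked(now - last);

    public static void Main()
    {
        int last = int.MaxValue - 500;      // simulated reading 500 ms before the wrap
        int now = unchecked(last + 1500);   // simulated reading 1500 ms later, after the wrap

        Console.WriteLine(now < 0);             // True: the counter has wrapped negative
        Console.WriteLine(Elapsed(now, last));  // 1500: the difference is still correct
    }
}
```

This holds as long as the real gap between the two readings is under roughly 24.8 days (int.MaxValue milliseconds).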

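A simple rewrite to use Stopwatch instead of DateTime.Now reduces the overhead significantly (for this isolated part).

(Since another answer here suggests Environment.TickCount, I have added it for completeness. It has the lowest overhead of them all. Be aware that this value wraps around and becomes negative after 24-25 days, so any solution needs to account for that. Note that the answer by @Cory Nelson does so: it uses unchecked to make sure the subtraction works.)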

void Main()
{
    BenchmarkSwitcher.FromAssembly(GetType().Assembly).RunAll();
}

public class Benchmarks
{
    private DateTime _Last = DateTime.Now;
    private DateTime _Next = DateTime.Now.AddSeconds(1);
    private Stopwatch _Stopwatch = Stopwatch.StartNew();
    private int _NextTick = Environment.TickCount + 1000;

    [Benchmark]
    public void ReadDateTime()
    {
        bool areWeThereYet = DateTime.Now >= _Last.AddSeconds(1);
    }

    [Benchmark]
    public void ReadDateTimeAhead()
    {
        bool areWeThereYet = DateTime.Now >= _Next;
    }

    [Benchmark]
    public void ReadStopwatch()
    {
        bool areWeThereYet = _Stopwatch.ElapsedMilliseconds >= 1000;
    }

    [Benchmark]
    public void ReadEnvironmentTick()
    {
        bool areWeThereYet = Environment.TickCount > _NextTick;
    }
}

Output:

              Method |       Mean |     Error |    StdDev |
-------------------- |-----------:|----------:|----------:|
        ReadDateTime | 220.958 ns | 4.3334 ns | 4.8166 ns |
   ReadDateTimeAhead | 214.025 ns | 0.8364 ns | 0.7414 ns |
       ReadStopwatch |  25.365 ns | 0.1805 ns | 0.1689 ns |
 ReadEnvironmentTick |   1.832 ns | 0.0163 ns | 0.0153 ns |

So a simple change like this should reduce the overhead of this isolated part of the code:

bool isRainingCache;
Stopwatch stopwatch = Stopwatch.StartNew();
public bool IsRaining()
{
    // If the last time we checked, it was raining, make sure it still is. OR
    // If it hasn't been raining, only check again if we haven't checked in the past second.
    if (isRainingCache || stopwatch.ElapsedMilliseconds > 1000)
    {
        isRainingCache = ExternalService.IsRaining();
        stopwatch.Restart();
    }
    return isRainingCache;
}

Thanks for the different approaches. In case anyone is curious, I did end up abstracting this functionality into a reusable class, so I can write:

private static readonly Throttled<bool> ThrottledIsRaining =
    new Throttled<bool>(ExternalService.IsRaining, 1000);

public static bool IsRaining()
{
    bool cachedIsRaining = ThrottledIsRaining.Value;
    // This extra bit satisfies my special case - bypass the cache while it's raining
    if (!cachedIsRaining) return false;
    return ThrottledIsRaining.ForceGetUpdatedValue();
}

/// <summary>Similar to <see cref="Lazy{T}"/>. Wraps an expensive getter
/// for a value by caching the result and only invoking the supplied getter
/// to update the value if the specified cache expiry time has elapsed.</summary>
/// <typeparam name="T">The type of underlying value.</typeparam>
public class Throttled<T>
{
    #region Private Fields
    /// <summary>The time (in milliseconds) we must cache the value after
    /// it has been retrieved.</summary>
    private readonly int _cacheTime;

    /// <summary>Prevent multiple threads from updating the value simultaneously.</summary>
    private readonly object _updateLock = new object();

    /// <summary>The function used to retrieve the underlying value.</summary>
    private readonly Func<T> _getValue;

    /// <summary>The cached result from the last time the underlying value was retrieved.</summary>
    private T _cachedValue;

    /// <summary>The last time the value was retrieved</summary>
    private volatile int _lastRetrieved;
    #endregion Private Fields

    /// <summary>Get the underlying value, updating the result if the cache has expired.</summary>
    public T Value
    {
        get
        {
            int now = Environment.TickCount;
            // If the cached value has expired, update it
            if (unchecked(now - _lastRetrieved) > _cacheTime)
            {
                lock (_updateLock)
                {
                    // Upon acquiring the lock, ensure another thread didn't update it first.
                    if (unchecked(now - _lastRetrieved) > _cacheTime)
                        return ForceGetUpdatedValue();
                }
            }
            return _cachedValue;
        }
    }

    /// <summary>Construct a new throttled value getter.</summary>
    /// <param name="getValue">The function used to retrieve the underlying value.</param>
    /// <param name="cacheTime">The time (in milliseconds) we must cache the value after
    /// it has been retrieved</param>
    public Throttled(Func<T> getValue, int cacheTime)
    {
        _getValue = getValue;
        _cacheTime = cacheTime;
        _lastRetrieved = unchecked(Environment.TickCount - cacheTime);
    }

    /// <summary>Retrieve the current value, regardless of whether
    /// the current cached value has expired.</summary>
    public T ForceGetUpdatedValue()
    {
        _cachedValue = _getValue();
        _lastRetrieved = Environment.TickCount;
        return _cachedValue;
    }

    /// <summary>Allows instances of this class to be accessed like a normal
    /// <typeparamref name="T"/> identifier.</summary>
    public static explicit operator T(Throttled<T> t) => t.Value;
}

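I decided to go with @CoryNelson's answer to minimise the time spent on the expiry check. While an asynchronous expiry mechanism should be faster, I found that the complexity of maintaining an additional disposable resource, plus the worry about extra threads and cleanup issues, was not worth it.

I also took into account the issue @Servy raised, which can arise when multiple threads access the same throttled value. Adding a lock avoids needlessly updating the value multiple times within the expiry window.

Please let me know if you think I've missed anything. Thanks, everyone.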