ConcurrentDictionary performance

Question

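I'm struggling with this problem and would greatly appreciate any help. I'm working on an existing project. I added logic that counts combinations of values and makes sure we don't exceed a certain limit. For example, given these data table columns:
Name|Age|description
The code makes sure we don't exceed K combinations of (Name, Age). My data contains millions of such pairs. At some point the program simply crashes or gets stuck, although I don't see any memory or CPU problem. I implemented this limit using a ConcurrentDictionary with a (Name, Age) tuple as the key, and I'm using C# on .NET 6.
I can see that the time it takes to add an element to the data structure becomes very long at some point.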

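Edit: adding some code snippets. There is a lot of internal implementation, but I believe these are the main parts needed to understand the problem:

Here is the component responsible for limiting the keys: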

    protected override Result Process(Row row)
    {
        var valueToLimit = GetValueToLimit(row);
        var result = _values.TryAdd(valueToLimit);
        // some logic related to the case of crossing the limit
        return Result.Success;
    }

    protected abstract T GetValueToLimit(Row row);
}

My case implements the GetValueToLimit function:

protected override string[] GetValueToLimit(Row row)
{ // takes the relevant values from an input record, according to the requested columns. 
    return _columnIndices.Select(x => row.GetValue(x)).ToArray();
}

Finally, here are some parts of the concurrent HashSet implementation:

public class BoundedConcurrentHashSet<K> : ConcurrentHashSet<K>
{
 ..
    public override Result TryAdd(K element)
    {
        if (Dictionary.Count() < _maxCapacity)
        {
            return base.TryAdd(element);
        }
        else
        {
            return Contains(element) ? Result.AlreadyInHash : Result.ExceedsCapacity;
        }
    }

where ConcurrentHashSet is implemented with the C# ConcurrentDictionary:

public class ConcurrentHashSet<K>
{
    public ConcurrentHashSet(IEqualityComparer<K> equalityComparer)
    {
        Dictionary = new ConcurrentDictionary<K, object>(equalityComparer);
    }

    protected ConcurrentDictionary<K, object> Dictionary { get; }

    public int Count => Dictionary.Count;

    public IEnumerable<K> Elements => Dictionary.Keys;

    public virtual Result TryAdd(K element)
    {
        return Dictionary.TryAdd(element, null) ? Result.Added : Result.AlreadyInHash;
    }

    public bool Contains(K element)
    {
        return Dictionary.ContainsKey(element);
    }

Please share any ideas that might help.

Thanks

Answer 1

Here is your problem:

public override ConcurrentHashSetAddResult TryAdd(K element)
{
    if (Dictionary.Count() < _maxCapacity)
    {
        return base.TryAdd(element);
    }
    //...

...where Dictionary is the underlying ConcurrentDictionary<K, object> object.

Count() is a LINQ method that either enumerates the sequence from start to end, or returns the Count property when the sequence implements the ICollection<TSource> interface. ConcurrentDictionary<K, V> implements this interface, so the Count property is indeed used. Here is what the documentation of this property says:

This property has snapshot semantics and represents the number of items in the ConcurrentDictionary<TKey,TValue> at the moment when the property was accessed.

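The "snapshot semantics" is the important part. It means that in order to obtain the Count, the dictionary has to be temporarily completely locked. While one thread is reading the Count, all other threads have to wait. There is no concurrency at all.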

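An ApproximateCount property was proposed on GitHub at some point, but the proposal didn't get enough traction and is now closed. Such a property would let you implement the BoundedConcurrentHashSet functionality with much less overhead, but also with less accurate behavior: it would be possible to exceed the configured _maxCapacity.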
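To illustrate the idea of avoiding the Count property on the hot path, here is a minimal sketch that keeps its own counter with Interlocked instead. This is not the ApproximateCount proposal and not the asker's code: the names CountedConcurrentHashSet and AddResult are hypothetical, and the trade-off is different from the one described above. The limit is never exceeded, but a thread can occasionally get ExceedsCapacity while another thread is holding a reserved slot that it is about to release.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

// Hypothetical result type standing in for the question's Result values.
public enum AddResult { Added, AlreadyInHash, ExceedsCapacity }

// Sketch: a bounded concurrent set that never reads ConcurrentDictionary.Count.
public class CountedConcurrentHashSet<K> where K : notnull
{
    private readonly ConcurrentDictionary<K, byte> _dictionary;
    private readonly int _maxCapacity;
    private int _count; // number of reserved slots, updated only through Interlocked

    public CountedConcurrentHashSet(int maxCapacity, IEqualityComparer<K> comparer)
    {
        _maxCapacity = maxCapacity;
        _dictionary = new ConcurrentDictionary<K, byte>(comparer);
    }

    // Reads the private counter, not the dictionary, so no locks are taken.
    public int Count => Volatile.Read(ref _count);

    public AddResult TryAdd(K element)
    {
        // Reads on ConcurrentDictionary are lock-free, so this check is cheap.
        if (_dictionary.ContainsKey(element))
        {
            return AddResult.AlreadyInHash;
        }

        // Reserve a slot before inserting; give it back if the limit is crossed.
        if (Interlocked.Increment(ref _count) > _maxCapacity)
        {
            Interlocked.Decrement(ref _count);
            return AddResult.ExceedsCapacity;
        }

        if (_dictionary.TryAdd(element, 0))
        {
            return AddResult.Added;
        }

        // Another thread inserted the same key in the meantime; release the slot.
        Interlocked.Decrement(ref _count);
        return AddResult.AlreadyInHash;
    }
}
```

Whether this is actually faster than a plain lock for a given workload would need to be measured; it only removes the full-dictionary locking caused by reading Count on every add.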

My suggestion is to ditch the ConcurrentDictionary<K, object> and use a HashSet<T> as the underlying storage, protected with a lock.
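A minimal sketch of that suggestion could look like the following. The names LockedBoundedHashSet and BoundedAddResult are hypothetical and only stand in for the question's BoundedConcurrentHashSet and Result types; the comparer parameter is assumed to be the same IEqualityComparer<K> the question already passes to its set.

```csharp
using System.Collections.Generic;

// Hypothetical result type standing in for the question's Result values.
public enum BoundedAddResult { Added, AlreadyInHash, ExceedsCapacity }

// Sketch: a bounded set backed by a plain HashSet<K>, guarded by a single lock.
public class LockedBoundedHashSet<K>
{
    private readonly HashSet<K> _set;
    private readonly int _maxCapacity;
    private readonly object _gate = new object();

    public LockedBoundedHashSet(int maxCapacity, IEqualityComparer<K> comparer)
    {
        _maxCapacity = maxCapacity;
        _set = new HashSet<K>(comparer);
    }

    public int Count
    {
        get { lock (_gate) { return _set.Count; } }
    }

    public BoundedAddResult TryAdd(K element)
    {
        // The membership check, the capacity check and the insert happen
        // atomically, so the limit can never be exceeded.
        lock (_gate)
        {
            if (_set.Contains(element))
            {
                return BoundedAddResult.AlreadyInHash;
            }
            if (_set.Count >= _maxCapacity)
            {
                return BoundedAddResult.ExceedsCapacity;
            }
            _set.Add(element);
            return BoundedAddResult.Added;
        }
    }

    public bool Contains(K element)
    {
        lock (_gate) { return _set.Contains(element); }
    }
}
```

HashSet<T>.Count is a simple field read, so the lock is held only for a very short time on each call.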

Answer 2

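I've found that using plain collections with a lock around the places where you iterate and add is much faster than using the concurrent collections. This becomes more the case the more items you add to the collection.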

.net

c#

dictionary

tuples

concurrentdictionary