Is Contains thread safe in HashSet<T>

Looking at the code for Contains in the HashSet<T> class in the .NET source code, I cannot find any reason why Contains would not be thread safe.

I am loading a HashSet<T> with values ahead of time, and then checking Contains in a multi-threaded .AsParallel() loop.

Is there any reason why this would not be safe? I am reluctant to use ConcurrentDictionary when I don't actually need to store values.
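The scenario in the question can be sketched as follows. `PrefilledSetDemo` and `CountMatches` are hypothetical names. The point is that the set is fully built on one thread before the PLINQ query starts, and the parallel phase only ever calls `Contains`.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class PrefilledSetDemo
{
    // Counts how many candidates are members of a set that is fully
    // populated before any parallel work starts.
    public static int CountMatches(IEnumerable<int> members, IEnumerable<int> candidates)
    {
        // Single-threaded population phase: all writes happen here.
        var set = new HashSet<int>(members);

        // Parallel read-only phase: workers only call Contains.
        return candidates.AsParallel().Count(c => set.Contains(c));
    }
}
```

For example, `CountMatches(Enumerable.Range(0, 1000), Enumerable.Range(500, 1000))` counts the 500 values (500 to 999) that appear in both ranges.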

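Normally (normally) collections that are used only for reading are "unofficially" thread safe (there is no collection in .NET that I know of that modifies itself during reading). There are some caveats: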

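  • The items themselves could be non-thread-safe (but with a HashSet<T> this problem should be minimized, because you can't extract items from it). Still, GetHashCode() and Equals() must be thread safe. If, for example, they access lazy objects that are loaded on demand, they might not be thread safe, or perhaps they cache/memoize some data to speed up subsequent operations.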
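  • You must be sure that after the last write there is a Thread.MemoryBarrier() (done in the same thread as the write) or equivalent, otherwise a read on another thread could read incomplete data.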
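  • You must be sure that in each thread (other than the one where you did the writing) there is a Thread.MemoryBarrier() before the first read. Note that if the HashSet<T> was "prepared" (with the Thread.MemoryBarrier() at the end) before creating/starting the other threads, then the Thread.MemoryBarrier() isn't necessary, because the threads can't have a stale read of the memory (they didn't exist yet). Various operations cause an implicit Thread.MemoryBarrier(). For example, if the threads were created before the HashSet<T> was filled, entered a Wait() and were un-Waited after the HashSet<T> was filled (plus its Thread.MemoryBarrier()), exiting the Wait() causes an implicit Thread.MemoryBarrier().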
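A minimal sketch of the two publication patterns from the last bullet, assuming `Task.Run` for the reader threads and `ManualResetEventSlim` as the concrete "Wait()" (both are real .NET types; `SetPublicationDemo`, `FillThenStart` and `StartThenFill` are hypothetical names). Starting a task, and returning from `Wait()` after another thread calls `Set()`, are relied on here to provide the implicit barriers described above, so no explicit `Thread.MemoryBarrier()` call appears.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class SetPublicationDemo
{
    // Pattern 1: fill the set first, then create the reader. The reader did
    // not exist while the set was being written, so it can't hold stale data.
    public static bool FillThenStart(int probe)
    {
        var set = new HashSet<int> { 1, 2, 3 };
        Task<bool> reader = Task.Run(() => set.Contains(probe));
        return reader.Result;
    }

    // Pattern 2: the reader already exists and waits until the set is ready.
    // Set() on the writer side and Wait() returning on the reader side act
    // as the synchronization point between the writes and the first read.
    public static bool StartThenFill(int probe)
    {
        var set = new HashSet<int>();
        using var ready = new ManualResetEventSlim(false);

        Task<bool> reader = Task.Run(() =>
        {
            ready.Wait();              // returns only after Set() below
            return set.Contains(probe); // first read happens after the signal
        });

        set.Add(1);                    // all writes happen before Set()
        set.Add(2);
        set.Add(3);
        ready.Set();                   // publish: no writes after this point

        return reader.Result;          // blocks until the reader finishes
    }
}
```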

A simple example of a class that uses memoization/lazy loading/whatever you want to call it, and in that way can break thread safety:

public class MyClass
{
    private long value2;

    public int Value1 { get; set; }

    // Value2 is lazily loaded in a very primitive
    // way (note that Lazy<T> *can* be used thread-safely!)
    public long Value2
    {
        get
        {
            if (value2 == 0)
            {
                // value2 is a long. If .NET is running as a 32-bit process,
                // the assignment of a long (64 bits) isn't atomic :)
                value2 = LoadFromServer();

                // If thread1 checks, sees value2 == 0 and loads it,
                // then begins writing value2 = (value), but thread2
                // reads value2 after only the first 32 bits have been
                // written, thread2 will read "incomplete" data. If this
                // "incomplete" data is == 0, a second LoadFromServer()
                // will be done. If the operation is repeatable there
                // won't be any problem (other than wasted time). But if
                // the operation isn't repeatable, or if the incomplete
                // data that is read is != 0, there will be a problem
                // (for example an exception if the operation isn't
                // repeatable, different data if the operation isn't
                // deterministic, or incomplete data if the read was != 0)
            }

            return value2;
        }
    }

    private long LoadFromServer()
    {
        // This is a slow operation that justifies a lazy property
        return 1; 
    }

    public override int GetHashCode()
    {
        // The GetHashCode doesn't use Value2, because it
        // wants to be fast
        return Value1;
    }

    public override bool Equals(object obj)
    {
        MyClass obj2 = obj as MyClass;

        if (obj2 == null)
        {
            return false;
        }

        // Equals uses Value2, because it wants to be correct.
        // Note that HashSet<T> probably doesn't need to call
        // Equals on Add if there are no other objects with the
        // same GetHashCode (and surely, if the HashSet is empty
        // and you Add a single object, that object won't be
        // compared with anything, because there isn't anything
        // to compare it with! :-) )

        // Equals is clearly used by the Contains method
        // of the HashSet
        return Value1 == obj2.Value1 && Value2 == obj2.Value2;
    }
}
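The comment in Value2 notes that Lazy<T> can be used thread-safely. A sketch of that alternative, assuming `LazyThreadSafetyMode.ExecutionAndPublication` (a real option of `System.Lazy<T>`): `LoadFromServer()` runs at most once and every thread sees the fully written value, so the torn 64-bit read described above cannot happen. The class name `MyClassWithLazy` is hypothetical, and making `Value1` get-only is an extra choice of this sketch, not something the original example requires.

```csharp
using System;
using System.Threading;

// Hypothetical thread-safe variant of MyClass.
public class MyClassWithLazy
{
    private readonly Lazy<long> value2;

    public MyClassWithLazy(int value1)
    {
        Value1 = value1;
        // At most one thread runs LoadFromServer(); the others block and
        // then observe the same, completely written result.
        value2 = new Lazy<long>(LoadFromServer, LazyThreadSafetyMode.ExecutionAndPublication);
    }

    // Get-only: changing a value used by GetHashCode while the object is
    // inside a HashSet<T> would break lookups regardless of threading.
    public int Value1 { get; }

    public long Value2 => value2.Value;

    private long LoadFromServer()
    {
        // Stand-in for the slow operation in the original example.
        return 1;
    }

    public override int GetHashCode() => Value1;

    public override bool Equals(object obj)
    {
        return obj is MyClassWithLazy other
            && Value1 == other.Value1
            && Value2 == other.Value2;
    }
}
```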

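Given that you're loading the set with values ahead of time, you could use ImmutableHashSet<T> from the System.Collections.Immutable library. The immutable collections advertise themselves as thread safe, so we don't have to worry about the "unofficial" thread safety of HashSet<T>.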

using System.Collections.Immutable;

var builder = ImmutableHashSet.CreateBuilder<string>(); // The builder is not thread safe

builder.Add("value1");
builder.Add("value2");

ImmutableHashSet<string> set = builder.ToImmutable();

...

if (set.Contains("value1")) // Thread safe operation
{
    ...
}
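To illustrate why immutable collections can make that promise, here is a small sketch. `ImmutableSetDemo` and `Build` are hypothetical names; it assumes a recent .NET, where System.Collections.Immutable ships with the framework (on .NET Framework it comes from the System.Collections.Immutable NuGet package). `Add` on an `ImmutableHashSet<T>` returns a new set and leaves the original untouched, so a thread reading the original can never observe a partially applied write.

```csharp
using System.Collections.Immutable;

public static class ImmutableSetDemo
{
    public static (ImmutableHashSet<string> Original, ImmutableHashSet<string> Extended) Build()
    {
        ImmutableHashSet<string> original = ImmutableHashSet.Create("value1", "value2");

        // "Adding" produces a new set; 'original' is never modified, so
        // threads still reading it are unaffected.
        ImmutableHashSet<string> extended = original.Add("value3");

        return (original, extended);
    }
}
```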

From Microsoft: Thread-Safe Collections

The .NET Framework 4 introduces the System.Collections.Concurrent namespace, which includes several collection classes that are both thread-safe and scalable. Multiple threads can safely and efficiently add or remove items from these collections, without requiring additional synchronization in user code. When you write new code, use the concurrent collection classes whenever multiple threads will write to the collection concurrently. If you are only reading from a shared collection, then you can use the classes in the System.Collections.Generic namespace. We recommend that you do not use 1.0 collection classes unless you are required to target the .NET Framework 1.1 or earlier runtime.
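The quote draws the line at concurrent writes. If threads ever need to add to the set at the same time, HashSet<T> is no longer an option, and System.Collections.Concurrent has no `ConcurrentHashSet<T>`. One common workaround, sketched here, is a `ConcurrentDictionary<TKey, byte>` whose values are ignored. `ConcurrentSetDemo` and `AddInParallel` are hypothetical names.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class ConcurrentSetDemo
{
    // Uses only the keys of a ConcurrentDictionary as a thread-safe set;
    // the byte value is a dummy placeholder.
    public static ConcurrentDictionary<int, byte> AddInParallel(int count)
    {
        var set = new ConcurrentDictionary<int, byte>();
        Parallel.For(0, count, i =>
        {
            set.TryAdd(i % 10, 0); // duplicate keys are simply rejected
        });
        return set;
    }
}
```

Membership checks then use `ContainsKey`. For the read-only scenario in the question, the plain HashSet<T> is enough.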

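Since Contains doesn't modify the collection and is purely a read operation, and HashSet<T> lives in System.Collections.Generic, calling Contains concurrently is absolutely fine, as long as no thread is writing to the set at the same time.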