When is it safe to use 'unsafe string modifications' in C#?

Question

private const int RESULT_LENGTH = 10;

public static unsafe string Encode1(byte[] data)
{
    var result = new string('0', RESULT_LENGTH); // memory allocation

    fixed (char* c = result)
    {
        for (int i = 0; i < RESULT_LENGTH; i++)
        {
            c[i] = DetermineChar(data, i);
        }
    }

    return result;
}


public static string Encode2(byte[] data)
{
    var chars = new char[RESULT_LENGTH]; // memory allocation

    for (int i = 0; i < RESULT_LENGTH; i++)
    {
        chars[i] = DetermineChar(data, i);
    }

    return new string(chars); // again a memory allocation
}

private static char DetermineChar(byte[] data, int index)
{
    // dummy algorithm.
    return 'a';
}

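Both methods encode a byte array into a string according to some specific algorithm. The first creates a string and writes the individual chars through a pointer. The second creates a char array and finally uses that array to instantiate a string.

I know that strings are immutable and that multiple string declarations can point to the same allocated memory. Also, according to this article, you should not use unsafe string modifications unless it is absolutely necessary.

My question: when is it safe to use the 'unsafe string modifications' shown in the Encode1 sample code?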

PS. I'm aware of newer concepts such as Span and Memory, and the string.Create method. I'm just curious about this specific case.
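For readers less familiar with string.Create, here is a minimal sketch of how the same encoding might look with it. The names EncoderSketch and Encode3 are made up for illustration, and DetermineChar is the same dummy as in the question. It assumes .NET Core 2.1 or later, where string.Create is available.

```csharp
using System;

public static class EncoderSketch
{
    private const int ResultLength = 10;

    // Hypothetical third variant: one allocation, no unsafe block.
    public static string Encode3(byte[] data)
    {
        // string.Create allocates the string once and passes the callback a
        // writable Span<char> over its buffer before the string is returned.
        return string.Create(ResultLength, data, (span, state) =>
        {
            for (int i = 0; i < span.Length; i++)
            {
                span[i] = DetermineChar(state, i);
            }
        });
    }

    private static char DetermineChar(byte[] data, int index)
    {
        // dummy algorithm, as in the question.
        return 'a';
    }
}
```

The callback only sees the buffer before any other code has a reference to the string, which is the same ownership window Encode1 relies on.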

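Edit

Thank you for all your responses. Perhaps the word 'safe' in my question caused more confusion than it did good. I didn't mean it as the opposite of the unsafe keyword, but in the vernacular sense.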

Answer 1

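Ultimately, the only time this is "safe" (in the vernacular sense, not in the unsafe sense) is when you own the string and it has not yet been exposed to any external code that might expect it to be immutable. The only time it is common to see this is when you are constructing a new string and you can't just use the GetString methods on Encoding, for example because the source data is discontiguous and might span multiple Encoder steps.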
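To illustrate the distinction, here is a rough sketch of the two situations using only the safe APIs. DecodeSketch, FromContiguous and FromChunks are hypothetical names, and UTF-8 is an assumption picked for the example. For bytes-to-chars the stateful helper class is Decoder, obtained from Encoding.GetDecoder.

```csharp
using System;
using System.Text;

public static class DecodeSketch
{
    // Contiguous input: a single GetString call is enough.
    public static string FromContiguous(byte[] data)
    {
        return Encoding.UTF8.GetString(data);
    }

    // Discontiguous input: the Decoder keeps state between calls, so a
    // multi-byte sequence split across two chunks is still decoded correctly.
    public static string FromChunks(params byte[][] chunks)
    {
        Decoder decoder = Encoding.UTF8.GetDecoder();
        var sb = new StringBuilder();

        for (int i = 0; i < chunks.Length; i++)
        {
            byte[] chunk = chunks[i];
            bool flush = i == chunks.Length - 1; // flush only on the last chunk
            char[] buffer = new char[Encoding.UTF8.GetMaxCharCount(chunk.Length)];
            int written = decoder.GetChars(chunk, 0, chunk.Length, buffer, 0, flush);
            sb.Append(buffer, 0, written);
        }

        return sb.ToString();
    }
}
```

This version pays for intermediate buffers and a StringBuilder, which is exactly the overhead that tempts people to write into a preallocated string instead.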

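So basically, the scenario shown in Encode1, which allocates a new string of known length and then immediately overwrites the character data, is the only reasonable usage. Once the string is in the wild: leave it alone.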

However, if you can even remotely avoid it, I would. It absolutely makes sense in the context of Encode1, but...

One scenario to be especially careful about: interned strings (constants, literals, etc.); you do not own these.
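A small sketch of why literals are not yours to modify; it deliberately does not mutate anything. InternSketch and its method names are made up for illustration. Equal string literals within one assembly refer to the same instance, so a write through a pointer to one of them would show up everywhere that literal is used, whereas new string(...) gives you an instance nobody else holds yet.

```csharp
using System;

public static class InternSketch
{
    // Two equal literals in the same assembly share one string instance.
    public static bool LiteralsShareInstance()
    {
        string a = "0000000000";
        string b = "0000000000";
        return object.ReferenceEquals(a, b);
    }

    // A freshly constructed string is a separate instance, even though its
    // content equals the literal; this is the kind of string Encode1 owns.
    public static bool NewStringIsSeparate()
    {
        string literal = "0000000000";
        string owned = new string('0', 10);
        return literal == owned && !object.ReferenceEquals(literal, owned);
    }

    public static void Main()
    {
        Console.WriteLine(LiteralsShareInstance()); // True
        Console.WriteLine(NewStringIsSeparate());   // True
    }
}
```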

c#

string

pointers

unsafe