Is there a way to determine the length of a base64-encoded value without knowing the encoding of the decoded value?

Question

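I have a base64-encoded value, and I want to use it to find the length of the decoded value without knowing which text encoding the decoded value uses.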

For example, the base64-encoded value of "foo" is Zm9v. When I decode it like this:

var bytes = Convert.FromBase64String("Zm9v");

I get an array of three bytes. In this case I can easily determine that the length is 3, but suppose the value is "ü", which is "w7w=" in base64:

// length = 2
var bytes = Convert.FromBase64String("w7w=");

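The byte array has length 2, so the first approach fails. The other option I thought of is to use UTF-8 to get a string from the bytes and then take its length: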

var length = Encoding.UTF8.GetString(bytes).Length;

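I think this would work, since UTF-8 is commonly used, but I am not satisfied with this solution either. What should I do? Is a general solution impossible if the encoding of the value is not known in the first place?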

Answer 1

Instead of getting the length of the byte array, convert the bytes to a string and then get the length of the string:

var bytes = Convert.FromBase64String("w7w=");
int length = Encoding.UTF8.GetString(bytes).Length;

Answer 2

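Without the encoding, you cannot know the length of a string from a byte array. A 1000-byte BLOB could be a 500-character Unicode (UTF-16) string or a 1000-character ASCII string. You will never know without the encoding.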

Answer 3

There are two problems here, one easy and one (in the general case) impossible.

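The easy one is getting the number of bytes encoded by the base64 string. You can do this without actually decoding, by looking at the number of characters in the base64 string and how many = characters are at the end.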
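The counting rule can be sketched as follows. This is a minimal illustration rather than a definitive implementation: `Base64Length` and `DecodedByteCount` are hypothetical names, and the sketch assumes standard padded base64 with no whitespace or line breaks. Every 4 base64 characters carry 3 bytes, and each trailing = removes one byte from the last group.

```csharp
using System;

public static class Base64Length
{
    // Hypothetical helper: returns the decoded byte count without decoding.
    // Assumes standard base64 with '=' padding and no embedded whitespace.
    public static int DecodedByteCount(string base64)
    {
        if (base64 == null)
            throw new ArgumentNullException(nameof(base64));
        if (base64.Length % 4 != 0)
            throw new FormatException("Expected padded base64 whose length is a multiple of 4.");

        // Count the trailing '=' characters (0, 1, or 2 in valid input).
        int padding = 0;
        if (base64.EndsWith("==", StringComparison.Ordinal)) padding = 2;
        else if (base64.EndsWith("=", StringComparison.Ordinal)) padding = 1;

        // Each group of 4 characters encodes 3 bytes; padding shortens the last group.
        return base64.Length / 4 * 3 - padding;
    }
}
```

With the values from the question, `Base64Length.DecodedByteCount("Zm9v")` gives 3 and `Base64Length.DecodedByteCount("w7w=")` gives 2, matching the lengths of the arrays returned by `Convert.FromBase64String`. This is a byte count only; it says nothing about how many characters those bytes represent.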

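The generally impossible one is getting the number of characters encoded by an arbitrary sequence of bytes. I say generally impossible because the number of characters depends on the encoding, and correctly guessing the encoding is not always possible. This problem is sometimes known as the Notepad file encoding problem, and Raymond Chen explains it there much better than I can, though I'll excerpt: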

For example, consider this file:
D0 AE
Depending on which encoding you assume, you get very different results.

If you assume 8-bit ANSI (with code page 1252), then the file consists of the two characters U+00D0 U+00AE, or "Ð®". Sure this looks strange, but maybe it's part of the word VATNIÐ® which might be the name of an Icelandic hotel.

If you assume UTF-8, then the file consists of the single Cyrillic character U+042E, or "Ю".

If you assume Unicode big-endian, then the file consists of the Korean Hangul syllable U+D0AE, or "킮".

If you assume Unicode little-endian, then the file consists of the Korean Hangul syllable U+AED0, or "껐".
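The quoted example can be reproduced in C# to show how the character count changes with the assumed encoding. This is a rough sketch: `EncodingGuessDemo` and `CharCounts` are hypothetical names, and ISO-8859-1 is used as a stand-in for code page 1252 because the two map the bytes D0 and AE to the same characters and ISO-8859-1 is available without registering an extra encoding provider.

```csharp
using System;
using System.Text;

public static class EncodingGuessDemo
{
    // Hypothetical helper: decodes the same two bytes under four assumed
    // encodings and returns the resulting string lengths, in this order:
    // ISO-8859-1, UTF-8, UTF-16 big-endian, UTF-16 little-endian.
    public static int[] CharCounts()
    {
        byte[] bytes = { 0xD0, 0xAE };

        // Stand-in for code page 1252 (an assumption of this sketch);
        // both encodings decode these bytes to U+00D0 U+00AE.
        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        return new[]
        {
            latin1.GetString(bytes).Length,                    // "Ð®": two characters
            Encoding.UTF8.GetString(bytes).Length,             // "Ю": one character
            Encoding.BigEndianUnicode.GetString(bytes).Length, // U+D0AE: one character
            Encoding.Unicode.GetString(bytes).Length           // U+AED0: one character
        };
    }
}
```

The same two bytes yield a length of 2 under the single-byte encoding and 1 under the other three, so the byte array alone does not determine the character count.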

c#

base64

encoding

utf-8