Why is a Base64-encoded string larger than the original file?

My original PDF file is about 24 MB, but when I encode it as a Base64 string, the string is about 31 MB. I would like to know why that is.

It is easy to understand for an image file, since it may lose some compression, but does this also happen with PDFs or other file formats?

I'm just wondering why.

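Because Base64 has fewer significant bits per byte than a binary data format (typically six rather than eight). This is deliberate, so that the data can survive various textual transforms that raw binary data would not survive. Three input bytes (24 bits) therefore become four output characters, so the encoded text is about a third larger than the original.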

Wikipedia's Base64 page has a good chart showing this:

As a text table (sadly, the GitHub-flavored Markdown that SO uses doesn't support tables with varying numbers of columns):

+-----------------+-------------------------------+-------------------------------+-------------------------------+
|   Text content  |               M               |               a               |               n               |
+-----------------+-------------------------------+-------------------------------+-------------------------------+
|     ASCII       |           77 (0x4d)           |           97 (0x61)           |          110 (0x6e)           |
|  Bit pattern    | 0 | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 0 |
|     Index       |           19          |           22          |           5           |           46          |
| Base64-encoded  |           T           |           W           |           F           |           u           |
+-----------------+-----------------------+-----------------------+-----------------------+-----------------------+

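Note how each Base64 output character carries only six bits of the input, so the three bytes of "Man" end up as four characters (four bytes): "TWFu".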
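The regrouping in the table can be reproduced with a few lines of Python. This is only a minimal sketch: `encode_three_bytes` is a hypothetical helper written for illustration, it handles exactly one 3-byte chunk with no padding, and it assumes the standard Base64 alphabet from RFC 4648. The standard library's `base64.b64encode` is used as a cross-check.

```python
import base64

# Standard Base64 alphabet (RFC 4648): index 0-63 -> character.
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def encode_three_bytes(chunk: bytes) -> str:
    """Encode exactly three bytes as four Base64 characters (no padding)."""
    if len(chunk) != 3:
        raise ValueError("this sketch only handles 3-byte chunks")
    # Pack the three bytes into a single 24-bit integer.
    bits = int.from_bytes(chunk, "big")
    # Cut the 24 bits into four 6-bit indices, most significant first.
    indices = [(bits >> shift) & 0b111111 for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[i] for i in indices)


print(encode_three_bytes(b"Man"))           # TWFu
print(base64.b64encode(b"Man").decode())    # TWFu
```

The indices computed for "Man" are 19, 22, 5 and 46, matching the Index row of the table.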

It is easy to understand for an image file, since it may lose some compression

To be clear, Base64 encoding is lossless. When you decode it, you get back, byte for byte, exactly what you started with.
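Both points, the lossless round trip and the size growth, can be checked quickly in Python. The sketch below uses random bytes as a stand-in for the PDF, and `base64_length` is a hypothetical helper that assumes padded standard Base64 with no line breaks in the output.

```python
import base64
import os


def base64_length(n_bytes: int) -> int:
    """Length of padded standard Base64 output for n_bytes of input."""
    # Every group of up to 3 input bytes becomes 4 output characters.
    return 4 * ((n_bytes + 2) // 3)


original = os.urandom(1000)            # stand-in for the file's bytes
encoded = base64.b64encode(original)

# Decoding returns exactly the original bytes: nothing is lost.
assert base64.b64decode(encoded) == original
# The encoded size follows the 4-characters-per-3-bytes rule.
assert len(encoded) == base64_length(len(original))

print(len(original), len(encoded))     # 1000 1336
```

By the same rule, a 24 MB file would come out at roughly 32 MB, which is in the same range as the figure in the question; the exact number depends on the file's precise size and on how the megabytes are counted.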