Why is a Base64-encoded string larger than the original file?
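My original PDF file is about 24MB, but when I encode it as a Base64 string, the string is about 31MB. I'd like to know why.
It is easy to understand for an image file, since it may lose some compression, but does the same thing happen for PDF and other file formats?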
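Because Base64 has fewer meaningful bits per byte than a binary data format (six instead of the usual eight). This is deliberate, so that it can survive various textual transformations that binary data would not survive.
Wikipedia's page has a nice chart showing this:
As a text table (sadly, the GitHub-flavored Markdown that SO uses doesn't support tables with varying numbers of columns):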
+----------------+---------------+---------------+---------------+
| Text content   | M             | a             | n             |
+----------------+---------------+---------------+---------------+
| ASCII          | 77 (0x4d)     | 97 (0x61)     | 110 (0x6e)    |
| Bit pattern    |0|1|0|0|1|1|0|1|0|1|1|0|0|0|0|1|0|1|1|0|1|1|1|0|
| Index          | 19        | 22        | 5         | 46        |
| Base64-encoded | T         | W         | F         | u         |
+----------------+-----------+-----------+-----------+-----------+
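Note how each Base64 character carries only six bits of the original data, so the three bytes of "Man" end up being four bytes long ("TWFu"). Every 3 input bytes become 4 output bytes, a size increase of about 33%, which is in line with a file of roughly 24MB growing to roughly 31MB.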
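The regrouping in the table can be reproduced in a few lines of Python. This is only an illustrative sketch: `sextets` is a hypothetical helper written for this example, it handles exactly three input bytes with no padding, and the standard library's `base64.b64encode` is used only to cross-check the result.

```python
import base64

# The standard Base64 alphabet: index 0-25 = A-Z, 26-51 = a-z, 52-61 = 0-9, then + and /.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def sextets(three_bytes):
    """Split exactly 3 bytes (24 bits) into four 6-bit indices."""
    if len(three_bytes) != 3:
        raise ValueError("this sketch only handles exactly 3 bytes")
    n = int.from_bytes(three_bytes, "big")  # 24-bit integer
    return [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]


indices = sextets(b"Man")
encoded = "".join(ALPHABET[i] for i in indices)

print(indices)                    # [19, 22, 5, 46]
print(encoded)                    # TWFu
print(base64.b64encode(b"Man"))   # b'TWFu'
```

Three input bytes go in and four output characters come out, which is where the 4/3 ratio comes from.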
Regarding "It is easy to understand for image file since it may lose some compression": to be clear, Base64 encoding is lossless. When you decode it, you get back, byte for byte, exactly what you started with.
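A quick round trip shows both the size growth and the losslessness. This is a minimal sketch under stated assumptions: random bytes stand in for the PDF, `expected_base64_size` is a hypothetical helper name, and the size formula assumes padded output with no line breaks (some MIME-style encoders insert newlines, which adds a little more).

```python
import base64
import os


def expected_base64_size(n_bytes):
    """Padded Base64 length: 4 output characters per 3 input bytes, rounded up."""
    return 4 * ((n_bytes + 2) // 3)


payload = os.urandom(1000)          # stand-in for the file's bytes
encoded = base64.b64encode(payload)
decoded = base64.b64decode(encoded)

print(len(payload), len(encoded))   # 1000 1336
print(decoded == payload)           # True

# A 24 MiB input would encode to 32 MiB.
print(expected_base64_size(24 * 1024 * 1024) // (1024 * 1024))  # 32
```

The same ratio applies to any input, whether it is a PDF, an image, or anything else, because Base64 works on raw bytes and knows nothing about the file format.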