使用 Javascript 的 atob 解码 base64 不能正确解码 utf-8 字符串

Using Javascript's atob to decode base64 doesn't properly decode utf-8 strings

我正在使用 Javascript window.atob() 函数解码 base64 编码的字符串(特别是来自 GitHub API 的 base64 编码的内容)。问题是我正在取回 ASCII 编码的字符(例如 ⢠而不是 )。如何正确处理传入的 base64 编码流,以便将其解码为 utf-8?

Unicode 问题

虽然 JavaScript (ECMAScript) 已经成熟,但 Base64、ASCII 和 Unicode 编码的脆弱性引起了很多头痛(其中大部分在这个问题的历史中)。

考虑以下示例:

const ok = "a";
console.log(ok.codePointAt(0).toString(16)); //   61: occupies < 1 byte

const notOK = "✓"
console.log(notOK.codePointAt(0).toString(16)); // 2713: occupies > 1 byte

console.log(btoa(ok));    // YQ==
console.log(btoa(notOK)); // error

我们为什么会遇到这种情况?

Base64, by design, expects binary data as its input. In terms of JavaScript strings, this means strings in which each character occupies only one byte. So if you pass a string into btoa() containing characters that occupy more than one byte, you will get an error, because this is not considered binary data.

资料来源:MDN(2021 年)

最初的 MDN 文章还涵盖了 window.btoa.atob 的损坏性质,此后已在现代 ECMAScript 中得到修复。原始的、现已失效的 MDN 文章解释道:

The "Unicode Problem" Since DOMStrings are 16-bit-encoded strings, in most browsers calling window.btoa on a Unicode string will cause a Character Out Of Range exception if a character exceeds the range of a 8-bit byte (0x00~0xFF).


具有二进制互操作性的解决方案

(继续滚动寻找 ASCII base64 解决方案)

资料来源:MDN(2021 年)

MDN 推荐的解决方案是实际编码进出二进制字符串表示:

UTF8 ⇢ 二进制编码

// convert a Unicode string to a string in which
// each 16-bit unit occupies only one byte
function toBinary(string) {
  const codeUnits = new Uint16Array(string.length);
  for (let i = 0; i < codeUnits.length; i++) {
    codeUnits[i] = string.charCodeAt(i);
  }
  return btoa(String.fromCharCode(...new Uint8Array(codeUnits.buffer)));
}

// a string that contains characters occupying > 1 byte
let encoded = toBinary("✓ à la mode") // "EycgAOAAIABsAGEAIABtAG8AZABlAA=="

解码二进制 ⇢ UTF-8

function fromBinary(encoded) {
  const binary = atob(encoded);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return String.fromCharCode(...new Uint16Array(bytes.buffer));
}

// our previous Base64-encoded string
let decoded = fromBinary(encoded) // "✓ à la mode"

这个有点失败的地方是,您会注意到编码字符串 EycgAOAAIABsAGEAIABtAG8AZABlAA== 不再匹配之前解决方案的字符串 4pyTIMOgIGxhIG1vZGU=。这是因为它是二进制编码的字符串,而不是 UTF-8 编码的字符串。如果这对您来说无关紧要(即,您没有从另一个系统转换以 UTF-8 表示的字符串),那么您就可以开始了。但是,如果您想保留 UTF-8 功能,最好使用下面描述的解决方案。


具有 ASCII base64 互操作性的解决方案

这个问题的整个历史表明多年来我们不得不使用多少种不同的方法来解决损坏的编码系统。尽管最初的 MDN 文章已不存在,但这个解决方案仍然可以说是一个更好的解决方案,并且在解决“Unicode 问题”方面做得很好,同时维护可以解码的纯文本 base64 字符串,比如 base64decode.org

有两种可能的方法来解决这个问题:

  • the first one is to escape the whole string (with UTF-8, see encodeURIComponent) and then encode it;
  • the second one is to convert the UTF-16 DOMString to an UTF-8 array of characters and then encode it.

之前的解决方案说明:MDN文章最初建议使用unescapeescape来解决Character Out Of Range异常问题,但后来被弃用了.这里的一些其他答案建议使用 decodeURIComponentencodeURIComponent 来解决这个问题,这已被证明是不可靠且不可预测的。此答案的最新更新使用现代 JavaScript 函数来提高速度和使代码现代化。

如果您想节省一些时间,您还可以考虑使用库:

编码 UTF8 ⇢ base64

    function b64EncodeUnicode(str) {
        // first we use encodeURIComponent to get percent-encoded UTF-8,
        // then we convert the percent encodings into raw bytes which
        // can be fed into btoa.
        return btoa(encodeURIComponent(str).replace(/%([0-9A-F]{2})/g,
            function toSolidBytes(match, p1) {
                return String.fromCharCode('0x' + p1);
        }));
    }
    
    b64EncodeUnicode('✓ à la mode'); // "4pyTIMOgIGxhIG1vZGU="
    b64EncodeUnicode('\n'); // "Cg=="

解码 base64 ⇢ UTF8

    function b64DecodeUnicode(str) {
        // Going backwards: from bytestream, to percent-encoding, to original string.
        return decodeURIComponent(atob(str).split('').map(function(c) {
            return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2);
        }).join(''));
    }
    
    b64DecodeUnicode('4pyTIMOgIGxhIG1vZGU='); // "✓ à la mode"
    b64DecodeUnicode('Cg=='); // "\n"

(为什么我们需要这样做?('00' + c.charCodeAt(0).toString(16)).slice(-2) 为单个字符串添加 0,例如当 c == \n 时,c.charCodeAt(0).toString(16) returns a, 强制 a 表示为 0a).


TypeScript 支持

这是具有一些额外的 TypeScript 兼容性的相同解决方案(通过@MA-Maddin):

// Encoding UTF8 ⇢ base64

function b64EncodeUnicode(str) {
    return btoa(encodeURIComponent(str).replace(/%([0-9A-F]{2})/g, function(match, p1) {
        return String.fromCharCode(parseInt(p1, 16))
    }))
}

// Decoding base64 ⇢ UTF8

function b64DecodeUnicode(str) {
    return decodeURIComponent(Array.prototype.map.call(atob(str), function(c) {
        return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2)
    }).join(''))
}

第一个解决方案(已弃用)

这使用了 escapeunescape(现已弃用,但在所有现代浏览器中仍然有效):

function utf8_to_b64( str ) {
    return window.btoa(unescape(encodeURIComponent( str )));
}

function b64_to_utf8( str ) {
    return decodeURIComponent(escape(window.atob( str )));
}

// Usage:
utf8_to_b64('✓ à la mode'); // "4pyTIMOgIGxhIG1vZGU="
b64_to_utf8('4pyTIMOgIGxhIG1vZGU='); // "✓ à la mode"

最后一件事:我第一次遇到这个问题是在调用 GitHub API 时。为了让它在(移动)Safari 上正常工作,我实际上不得不从 base64 源 中删除所有白色 space,然后 我什至可以解码源。这在 2021 年是否仍然相关,我不知道:

function b64_to_utf8( str ) {
    str = str.replace(/\s/g, '');    
    return decodeURIComponent(escape(window.atob( str )));
}

小更正,unescape 和 escape 已弃用,所以:

function utf8_to_b64( str ) {
    return window.btoa(decodeURIComponent(encodeURIComponent(str)));
}

function b64_to_utf8( str ) {
     return decodeURIComponent(encodeURIComponent(window.atob(str)));
}


function b64_to_utf8( str ) {
    str = str.replace(/\s/g, '');    
    return decodeURIComponent(encodeURIComponent(window.atob(str)));
}

情况发生了变化。 escape/unescape 方法已被弃用。

您可以在对字符串进行 Base64 编码之前对其进行 URI 编码。请注意,这不会生成 Base64 编码的 UTF8,而是生成 Base64 编码的 URL 编码数据。双方必须同意相同的编码。

在此处查看工作示例:http://codepen.io/anon/pen/PZgbPW

// encode string
var base64 = window.btoa(encodeURIComponent('€ 你好 æøåÆØÅ'));
// decode string
var str = decodeURIComponent(window.atob(tmp));
// str is now === '€ 你好 æøåÆØÅ'

对于 OP 的问题,js-base64 等第三方库应该可以解决问题。

这里有一些 future-proof 可能缺少 escape/unescape() 的浏览器代码。请注意,IE 9 及更早版本不支持 atob/btoa(),因此您需要为它们使用自定义 base64 函数。

// Polyfill for escape/unescape
if( !window.unescape ){
    window.unescape = function( s ){
        return s.replace( /%([0-9A-F]{2})/g, function( m, p ) {
            return String.fromCharCode( '0x' + p );
        } );
    };
}
if( !window.escape ){
    window.escape = function( s ){
        var chr, hex, i = 0, l = s.length, out = '';
        for( ; i < l; i ++ ){
            chr = s.charAt( i );
            if( chr.search( /[A-Za-z0-9\@\*\_\+\-\.\/]/ ) > -1 ){
                out += chr; continue; }
            hex = s.charCodeAt( i ).toString( 16 );
            out += '%' + ( hex.length % 2 != 0 ? '0' : '' ) + hex;
        }
        return out;
    };
}

// Base64 encoding of UTF-8 strings
var utf8ToB64 = function( s ){
    return btoa( unescape( encodeURIComponent( s ) ) );
};
var b64ToUtf8 = function( s ){
    return decodeURIComponent( escape( atob( s ) ) );
};

可以在此处找到更全面的 UTF-8 编码和解码示例:http://jsfiddle.net/47zwb41o/

如果您更喜欢将字符串视为字节,则可以使用以下函数

function u_atob(ascii) {
    return Uint8Array.from(atob(ascii), c => c.charCodeAt(0));
}

function u_btoa(buffer) {
    var binary = [];
    var bytes = new Uint8Array(buffer);
    for (var i = 0, il = bytes.byteLength; i < il; i++) {
        binary.push(String.fromCharCode(bytes[i]));
    }
    return btoa(binary.join(''));
}


// example, it works also with astral plane characters such as ''
var encodedString = new TextEncoder().encode('✓');
var base64String = u_btoa(encodedString);
console.log('✓' === new TextDecoder().decode(u_atob(base64String)))

包括上面的解决方案,如果仍然面临问题,请尝试如下,考虑 TS 不支持转义的情况。

blob = new Blob(["\ufeff", csv_content]); // this will make symbols to appears in excel 

对于 csv_content 你可以像下面这样尝试。

function b64DecodeUnicode(str: any) {        
        return decodeURIComponent(atob(str).split('').map((c: any) => {
            return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2);
        }).join(''));
    }

这是 Mozilla Development Resources

中描述的 2018 年更新解决方案

从 UNICODE 编码为 B64

function b64EncodeUnicode(str) {
    // first we use encodeURIComponent to get percent-encoded UTF-8,
    // then we convert the percent encodings into raw bytes which
    // can be fed into btoa.
    return btoa(encodeURIComponent(str).replace(/%([0-9A-F]{2})/g,
        function toSolidBytes(match, p1) {
            return String.fromCharCode('0x' + p1);
    }));
}

b64EncodeUnicode('✓ à la mode'); // "4pyTIMOgIGxhIG1vZGU="
b64EncodeUnicode('\n'); // "Cg=="

从 B64 解码为 UNICODE

function b64DecodeUnicode(str) {
    // Going backwards: from bytestream, to percent-encoding, to original string.
    return decodeURIComponent(atob(str).split('').map(function(c) {
        return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2);
    }).join(''));
}

b64DecodeUnicode('4pyTIMOgIGxhIG1vZGU='); // "✓ à la mode"
b64DecodeUnicode('Cg=='); // "\n"

我假设有人可能想要一种能够生成广泛使用的 base64 URI 的解决方案。请访问 data:text/plain;charset=utf-8;base64,4pi44pi54pi64pi74pi84pi+4pi/ 观看演示(复制数据 uri,打开新选项卡,将数据 URI 粘贴到地址栏,然后按 enter 进入页面)。尽管这个 URI 是 base64 编码的,浏览器仍然能够识别高代码点并正确解码它们。缩小后的编码器+解码器为1058字节(+Gzip→589字节)

!function(e){"use strict";function h(b){var a=b.charCodeAt(0);if(55296<=a&&56319>=a)if(b=b.charCodeAt(1),b===b&&56320<=b&&57343>=b){if(a=1024*(a-55296)+b-56320+65536,65535<a)return d(240|a>>>18,128|a>>>12&63,128|a>>>6&63,128|a&63)}else return d(239,191,189);return 127>=a?inputString:2047>=a?d(192|a>>>6,128|a&63):d(224|a>>>12,128|a>>>6&63,128|a&63)}function k(b){var a=b.charCodeAt(0)<<24,f=l(~a),c=0,e=b.length,g="";if(5>f&&e>=f){a=a<<f>>>24+f;for(c=1;c<f;++c)a=a<<6|b.charCodeAt(c)&63;65535>=a?g+=d(a):1114111>=a?(a-=65536,g+=d((a>>10)+55296,(a&1023)+56320)):c=0}for(;c<e;++c)g+="\ufffd";return g}var m=Math.log,n=Math.LN2,l=Math.clz32||function(b){return 31-m(b>>>0)/n|0},d=String.fromCharCode,p=atob,q=btoa;e.btoaUTF8=function(b,a){return q((a?"\u00ef\u00bb\u00bf":"")+b.replace(/[\x80-\uD7ff\uDC00-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]?/g,h))};e.atobUTF8=function(b,a){a||"\u00ef\u00bb\u00bf"!==b.substring(0,3)||(b=b.substring(3));return p(b).replace(/[\xc0-\xff][\x80-\xbf]*/g,k)}}(""+void 0==typeof global?""+void 0==typeof self?this:self:global)

下面是用于生成它的源代码。

var fromCharCode = String.fromCharCode;
var btoaUTF8 = (function(btoa, replacer){"use strict";
    return function(inputString, BOMit){
        return btoa((BOMit ? "\xEF\xBB\xBF" : "") + inputString.replace(
            /[\x80-\uD7ff\uDC00-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]?/g, replacer
        ));
    }
})(btoa, function(nonAsciiChars){"use strict";
    // make the UTF string into a binary UTF-8 encoded string
    var point = nonAsciiChars.charCodeAt(0);
    if (point >= 0xD800 && point <= 0xDBFF) {
        var nextcode = nonAsciiChars.charCodeAt(1);
        if (nextcode !== nextcode) // NaN because string is 1 code point long
            return fromCharCode(0xef/*11101111*/, 0xbf/*10111111*/, 0xbd/*10111101*/);
        // https://mathiasbynens.be/notes/javascript-encoding#surrogate-formulae
        if (nextcode >= 0xDC00 && nextcode <= 0xDFFF) {
            point = (point - 0xD800) * 0x400 + nextcode - 0xDC00 + 0x10000;
            if (point > 0xffff)
                return fromCharCode(
                    (0x1e/*0b11110*/<<3) | (point>>>18),
                    (0x2/*0b10*/<<6) | ((point>>>12)&0x3f/*0b00111111*/),
                    (0x2/*0b10*/<<6) | ((point>>>6)&0x3f/*0b00111111*/),
                    (0x2/*0b10*/<<6) | (point&0x3f/*0b00111111*/)
                );
        } else return fromCharCode(0xef, 0xbf, 0xbd);
    }
    if (point <= 0x007f) return nonAsciiChars;
    else if (point <= 0x07ff) {
        return fromCharCode((0x6<<5)|(point>>>6), (0x2<<6)|(point&0x3f));
    } else return fromCharCode(
        (0xe/*0b1110*/<<4) | (point>>>12),
        (0x2/*0b10*/<<6) | ((point>>>6)&0x3f/*0b00111111*/),
        (0x2/*0b10*/<<6) | (point&0x3f/*0b00111111*/)
    );
});

然后,要解码 base64 数据,HTTP 获取数据作为数据 URI 或使用下面的函数。

var clz32 = Math.clz32 || (function(log, LN2){"use strict";
    return function(x) {return 31 - log(x >>> 0) / LN2 | 0};
})(Math.log, Math.LN2);
var fromCharCode = String.fromCharCode;
var atobUTF8 = (function(atob, replacer){"use strict";
    return function(inputString, keepBOM){
        inputString = atob(inputString);
        if (!keepBOM && inputString.substring(0,3) === "\xEF\xBB\xBF")
            inputString = inputString.substring(3); // eradicate UTF-8 BOM
        // 0xc0 => 0b11000000; 0xff => 0b11111111; 0xc0-0xff => 0b11xxxxxx
        // 0x80 => 0b10000000; 0xbf => 0b10111111; 0x80-0xbf => 0b10xxxxxx
        return inputString.replace(/[\xc0-\xff][\x80-\xbf]*/g, replacer);
    }
})(atob, function(encoded){"use strict";
    var codePoint = encoded.charCodeAt(0) << 24;
    var leadingOnes = clz32(~codePoint);
    var endPos = 0, stringLen = encoded.length;
    var result = "";
    if (leadingOnes < 5 && stringLen >= leadingOnes) {
        codePoint = (codePoint<<leadingOnes)>>>(24+leadingOnes);
        for (endPos = 1; endPos < leadingOnes; ++endPos)
            codePoint = (codePoint<<6) | (encoded.charCodeAt(endPos)&0x3f/*0b00111111*/);
        if (codePoint <= 0xFFFF) { // BMP code point
          result += fromCharCode(codePoint);
        } else if (codePoint <= 0x10FFFF) {
          // https://mathiasbynens.be/notes/javascript-encoding#surrogate-formulae
          codePoint -= 0x10000;
          result += fromCharCode(
            (codePoint >> 10) + 0xD800,  // highSurrogate
            (codePoint & 0x3ff) + 0xDC00 // lowSurrogate
          );
        } else endPos = 0; // to fill it in with INVALIDs
    }
    for (; endPos < stringLen; ++endPos) result += "\ufffd"; // replacement character
    return result;
});

更标准的好处是这个编码器和这个解码器的适用范围更广,因为它们可以作为正确显示的有效URL。观察

(function(window){
    "use strict";
    var sourceEle = document.getElementById("source");
    var urlBarEle = document.getElementById("urlBar");
    var mainFrameEle = document.getElementById("mainframe");
    var gotoButton = document.getElementById("gotoButton");
    var parseInt = window.parseInt;
    var fromCodePoint = String.fromCodePoint;
    var parse = JSON.parse;
    
    function unescape(str){
        return str.replace(/\u[\da-f]{0,4}|\x[\da-f]{0,2}|\u{[^}]*}|\[bfnrtv"'\]|\0[0-7]{1,3}|\\d{1,3}/g, function(match){
          try{
            if (match.startsWith("\u{"))
              return fromCodePoint(parseInt(match.slice(2,-1),16));
            if (match.startsWith("\u") || match.startsWith("\x"))
              return fromCodePoint(parseInt(match.substring(2),16));
            if (match.startsWith("\0") && match.length > 2)
              return fromCodePoint(parseInt(match.substring(2),8));
            if (/^\\d/.test(match)) return fromCodePoint(+match.slice(1));
          }catch(e){return "\ufffd".repeat(match.length)}
          return parse('"' + match + '"');
        });
    }
    
    function whenChange(){
      try{ urlBarEle.value = "data:text/plain;charset=UTF-8;base64," + btoaUTF8(unescape(sourceEle.value), true);
      } finally{ gotoURL(); }
    }
    sourceEle.addEventListener("change",whenChange,{passive:1});
    sourceEle.addEventListener("input",whenChange,{passive:1});
    
    // IFrame Setup:
    function gotoURL(){mainFrameEle.src = urlBarEle.value}
    gotoButton.addEventListener("click", gotoURL, {passive: 1});
    function urlChanged(){urlBarEle.value = mainFrameEle.src}
    mainFrameEle.addEventListener("load", urlChanged, {passive: 1});
    urlBarEle.addEventListener("keypress", function(evt){
      if (evt.key === "enter") evt.preventDefault(), urlChanged();
    }, {passive: 1});
    
        
    var fromCharCode = String.fromCharCode;
    var btoaUTF8 = (function(btoa, replacer){
      "use strict";
        return function(inputString, BOMit){
         return btoa((BOMit?"\xEF\xBB\xBF":"") + inputString.replace(
          /[\x80-\uD7ff\uDC00-\uFFFF]|[\uD800-\uDBFF][\uDC00-\uDFFF]?/g, replacer
      ));
     }
    })(btoa, function(nonAsciiChars){
  "use strict";
     // make the UTF string into a binary UTF-8 encoded string
     var point = nonAsciiChars.charCodeAt(0);
     if (point >= 0xD800 && point <= 0xDBFF) {
      var nextcode = nonAsciiChars.charCodeAt(1);
      if (nextcode !== nextcode) { // NaN because string is 1code point long
       return fromCharCode(0xef/*11101111*/, 0xbf/*10111111*/, 0xbd/*10111101*/);
      }
      // https://mathiasbynens.be/notes/javascript-encoding#surrogate-formulae
      if (nextcode >= 0xDC00 && nextcode <= 0xDFFF) {
       point = (point - 0xD800) * 0x400 + nextcode - 0xDC00 + 0x10000;
       if (point > 0xffff) {
        return fromCharCode(
         (0x1e/*0b11110*/<<3) | (point>>>18),
         (0x2/*0b10*/<<6) | ((point>>>12)&0x3f/*0b00111111*/),
         (0x2/*0b10*/<<6) | ((point>>>6)&0x3f/*0b00111111*/),
         (0x2/*0b10*/<<6) | (point&0x3f/*0b00111111*/)
        );
       }
      } else {
       return fromCharCode(0xef, 0xbf, 0xbd);
      }
     }
     if (point <= 0x007f) { return inputString; }
     else if (point <= 0x07ff) {
      return fromCharCode((0x6<<5)|(point>>>6), (0x2<<6)|(point&0x3f/*00111111*/));
     } else {
      return fromCharCode(
       (0xe/*0b1110*/<<4) | (point>>>12),
       (0x2/*0b10*/<<6) | ((point>>>6)&0x3f/*0b00111111*/),
       (0x2/*0b10*/<<6) | (point&0x3f/*0b00111111*/)
      );
     }
    });
    setTimeout(whenChange, 0);
})(window);
img:active{opacity:0.8}
<center>
<textarea id="source" style="width:66.7vw">Hello \u1234 W656ld!
Enter text into the top box. Then the URL will update automatically.
</textarea><br />
<div style="width:66.7vw;display:inline-block;height:calc(25vw + 1em + 6px);border:2px solid;text-align:left;line-height:1em">
<input id="urlBar" style="width:calc(100% - 1em - 13px)" /><img id="gotoButton" src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABsAAAAeCAMAAADqx5XUAAAAclBMVEX///9NczZ8e32ko6fDxsU/fBoSQgdFtwA5pAHVxt+7vLzq5ex23y4SXABLiiTm0+/c2N6DhoQ6WSxSyweVlZVvdG/Uz9aF5kYlbwElkwAggACxs7Jl3hX07/cQbQCar5SU9lRntEWGum+C9zIDHwCGnH5IvZAOAAABmUlEQVQoz7WS25acIBBFkRLkIgKKtOCttbv//xdDmTGZzHv2S63ltuBQQP4rdRiRUP8UK4wh6nVddQwj/NtDQTvac8577zTQb72zj65/876qqt7wykU6/1U6vFEgjE1mt/5LRqrpu7oVsn0sjZejMfxR3W/yLikqAFcUx93YxLmZGOtElmEu6Ufd9xV3ZDTGcEvGLbMk0mHHlUSvS5svCwS+hVL8loQQyfpI1Ay8RF/xlNxcsTchGjGDIuBG3Ik7TMyNxn8m0TSnBAK6Z8UZfp3IbAonmJvmsEACum6aNv7B0CnvpezDcNhw9XWsuAr7qnRg6dABmeM4dTgn/DZdXWs3LMspZ1KDMt1kcPJ6S1icWNp2qaEmjq6myx7jbQK3VKItLJaW5FR+cuYlRhYNKzGa9vF4vM5roLW3OSVjkmiGJrPhUq301/16pVKZRGFYWjTP50spTxBN5Z4EKnSonruk+n4tUokv1aJSEl/MLZU90S3L6/U6o0J142iQVp3HcZxKSo8LfkNRCtJaKYFSRX7iaoAAUDty8wvWYR6HJEepdwAAAABJRU5ErkJggg==" style="width:calc(1em + 4px);line-height:1em;vertical-align:-40%;cursor:pointer" />
<iframe id="mainframe" style="width:66.7vw;height:25vw" frameBorder="0"></iframe>
</div>
</center>

上面的代码片段除了非常规范之外,速度也非常快。上面的代码片段在性能上尽可能直接,而不是数据必须在各种形式之间多次转换的间接链(例如在 Riccardo Galli 的响应中)。它在编码时仅使用一个简单的快速 String.prototype.replace 调用来处理数据,在解码时仅使用一个来解码数据。另一个优点是(特别是对于大字符串),String.prototype.replace 允许浏览器自动处理调整字符串大小的底层内存管理,从而显着提高性能,尤其是在 Chrome 和 Firefox 这样的常青浏览器中优化 String.prototype.replace。最后,锦上添花的是,对于拉丁脚本 exclūsīvō 用户来说,不包含 0x7f 以上任何代码点的字符串处理起来特别快,因为该字符串保持未被替换算法修改。

我已经在 https://github.com/anonyco/BestBase64EncoderDecoder/

为这个解决方案创建了一个 github 存储库

适合我的完整文章:https://developer.mozilla.org/en-US/docs/Web/JavaScript/Base64_encoding_and_decoding

我们从Unicode/UTF-8编码的部分是

function utf8_to_b64( str ) {
   return window.btoa(unescape(encodeURIComponent( str )));
}

function b64_to_utf8( str ) {
   return decodeURIComponent(escape(window.atob( str )));
}

// Usage:
utf8_to_b64('✓ à la mode'); // "4pyTIMOgIGxhIG1vZGU="
b64_to_utf8('4pyTIMOgIGxhIG1vZGU='); // "✓ à la mode"

这是现在最常用的方法之一。

将 base64 解码为 UTF8 字符串

以下是@brandonscript 当前投票最多的答案

function b64DecodeUnicode(str) {
    // Going backwards: from bytestream, to percent-encoding, to original string.
    return decodeURIComponent(atob(str).split('').map(function(c) {
        return '%' + ('00' + c.charCodeAt(0).toString(16)).slice(-2);
    }).join(''));
}

以上代码可以运行,但速度很慢。如果您的输入是一个非常大的 base64 字符串,例如 base64 html 文档有 30,000 个字符。这将需要大量的计算。

这是我的答案,使用 built-in TextDecoder,对于大输入,比上面的代码快近 10 倍。

function decodeBase64(base64) {
    const text = atob(base64);
    const length = text.length;
    const bytes = new Uint8Array(length);
    for (let i = 0; i < length; i++) {
        bytes[i] = text.charCodeAt(i);
    }
    const decoder = new TextDecoder(); // default is utf-8
    return decoder.decode(bytes);
}