Does NFKD decompose characters by both compatibility and canonical equivalence?
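After several attempts to understand it, I have to admit that I don't understand how String.prototype.normalize() works. The method accepts one of four values as its argument: NFC, NFD, NFKC, or NFKD.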
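First, I don't understand the difference between NFD and NFKD. The specification is very vague about this, so... In some resources I read that NFD decomposes characters by canonical equivalence. For example: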
"â" (U+00E2) -> "a" (U+0061) + " ̂" (U+0302)
and NFKD decomposes characters by compatibility. For example:
"ﬁ" (U+FB01) -> "f" (U+0066) + "i" (U+0069)
But that is not the whole story. NFKD does not decompose characters by compatibility only. It also handles the first example perfectly well:
let s = `\u00E2`; //"â"
console.log(s.normalize('NFD').length); //2
console.log(s.normalize('NFKD').length); //2
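Does this mean that NFKD decomposes characters by both compatibility and canonical equivalence, while NFD decomposes characters by canonical equivalence only?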
let s = `\uFB01`; //"ﬁ"
console.log(s.normalize('NFD').length); //1
The type of full decomposition chosen depends on which Unicode
Normalization Form is involved. For NFC or NFD, one does a full
canonical decomposition, which makes use of only canonical
Decomposition_Mapping values. For NFKC or NFKD, one does a full
compatibility decomposition, which makes use of canonical and
compatibility Decomposition_Mapping values.
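The quoted rule can be checked by listing the code points that each form produces. The following is only a minimal sketch: the `codePoints` helper and the `sample` string are hypothetical names introduced for illustration and do not come from the original post.

```javascript
// Hypothetical helper: list the code points of a string as U+XXXX labels.
function codePoints(str) {
  return Array.from(str, (ch) =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0'));
}

// Assumed sample input: "â" (U+00E2, has a canonical mapping)
// followed by the "ﬁ" ligature (U+FB01, has a compatibility mapping).
const sample = '\u00E2\uFB01';

for (const form of ['NFC', 'NFD', 'NFKC', 'NFKD']) {
  console.log(form, codePoints(sample.normalize(form)).join(' '));
}
// NFC  U+00E2 U+FB01
// NFD  U+0061 U+0302 U+FB01
// NFKC U+00E2 U+0066 U+0069
// NFKD U+0061 U+0302 U+0066 U+0069
```

Only the K forms touch the ligature, while both NFD and NFKD split "â" into its base letter and combining mark.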
That rule is why NFC/NFD and NFKC/NFKD behave as follows:
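let s1 = '\uFB00'; // "ﬀ" (ligature, a single code point)
let s2 = '\u0066\u0066'; // "ff" (two separate letters)
console.log(s1.normalize('NFD').length); // 1: NFD ignores compatibility mappings and uses canonical ones only
console.log(s1.normalize('NFKD') === s2); // true: NFKD applies the compatibility mapping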
let t1 = `\u00F4`; // "ô" (precomposed)
let t2 = `\u006F\u0302`; // "ô" ("o" + combining circumflex)
console.log(t1.normalize('NFKD').length); // 2: NFKD also applies canonical decomposition
console.log(t2.normalize('NFKC').length); // 1: NFKC also applies canonical composition
This makes perfect sense, because...
All canonically equivalent sequences are also compatible, but not vice versa.
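One way to see that one-directional relationship in code is to compare normalized forms. This is a sketch under an assumption: the helpers `canonicallyEquivalent` and `compatibilityEquivalent` are hypothetical names, and they treat two strings as equivalent when their NFD (respectively NFKD) forms are identical.

```javascript
// Hypothetical helpers (not part of any built-in API):
// canonical equivalence is checked by comparing NFD forms,
// compatibility equivalence by comparing NFKD forms.
const canonicallyEquivalent = (a, b) => a.normalize('NFD') === b.normalize('NFD');
const compatibilityEquivalent = (a, b) => a.normalize('NFKD') === b.normalize('NFKD');

// "ô" precomposed vs. "o" + combining circumflex: equivalent in both senses.
console.log(canonicallyEquivalent('\u00F4', '\u006F\u0302'));   // true
console.log(compatibilityEquivalent('\u00F4', '\u006F\u0302')); // true

// "ﬀ" ligature vs. plain "ff": compatibility equivalent only.
console.log(canonicallyEquivalent('\uFB00', 'ff'));   // false
console.log(compatibilityEquivalent('\uFB00', 'ff')); // true
```

The ligature pair is the "not vice versa" case: it passes the compatibility check but fails the canonical one.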