Can `toLowerCase` change a JavaScript string's length?
Is it possible that `string.length !== string.toLowerCase().length`?
From the answers to "Change to .length with toUpperCase?", I know this is possible with `toUpperCase`, but I don't know whether it is with `toLowerCase`.
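Yes, it is possible. Not all case mappings are one-to-one; they can also be one-to-many. As you saw when going from lowercase to uppercase, the same applies when going from uppercase to lowercase. This can be seen with the character LATIN CAPITAL LETTER I WITH DOT ABOVE (U+0130):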
const char = "\u0130"; // İ
const charLower = char.toLowerCase();
console.log(char, char.length); // İ 1
console.log(charLower, charLower.length); // i̇ 2
console.log(char.length !== charLower.length); // true
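To see which characters behave this way in a given runtime, one option is a brute-force scan over all code points. The following is a minimal sketch, not part of the original answer: `findLengthChangingLowercase` is a hypothetical helper name, and the exact list it prints depends on the Unicode version the engine ships with, so no full output is shown. U+0130 from the example above should appear in it.

```javascript
// Hypothetical helper: collect every code point whose lowercase form has a
// different .length (counted in UTF-16 code units) than the original.
function findLengthChangingLowercase() {
  const hits = [];
  for (let cp = 0; cp <= 0x10ffff; cp++) {
    const ch = String.fromCodePoint(cp);
    const lower = ch.toLowerCase();
    if (lower.length !== ch.length) {
      hits.push({
        codePoint: "U+" + cp.toString(16).toUpperCase().padStart(4, "0"),
        from: ch.length,
        to: lower.length,
      });
    }
  }
  return hits;
}

const lengthChanging = findLengthChangingLowercase();
console.log(lengthChanging);
```

Swapping `toLowerCase` for `toUpperCase` in the loop gives the corresponding list for the uppercase direction.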
A note in section 22.1.3.26 of the specification also highlights this behavior:
The case mapping of some code points may produce multiple code points.
In this case the result String may not be the same length as the
source String
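One practical consequence of that note, sketched here as an illustration rather than something the original answer covers: an index found in the lowercased copy of a string does not necessarily point at the same place in the original. The sample text and variable names below are made up for the demonstration.

```javascript
const text = "\u0130stanbul is big"; // "İstanbul is big"
const needle = "is big";

// Naive case-insensitive search: find the index in the lowercased copy,
// then use that index to slice the ORIGINAL string.
const lowered = text.toLowerCase(); // İ becomes i + U+0307, one unit longer
const index = lowered.indexOf(needle.toLowerCase());
const naiveMatch = text.slice(index, index + needle.length);

console.log(text.length, lowered.length); // 15 16
console.log(index); // 10 (the match starts at 9 in the original)
console.log(naiveMatch); // s big
```

Because the lowercased copy is one code unit longer, every index after the `İ` is shifted by one, and the slice taken from the original misses the first character of the match.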
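A list of special case mappings (that is, case mappings that are not necessarily one-to-one) can be found in the Unicode Character Database (UCD), in the SpecialCasing.txt file. As listed there, some characters only grow in length under specific conditions, some of which depend on context and locale: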
// From the UCD for SpecialCasings, another example can be found:
// 012E; 012F 0307; 012E; 012E; lt More_Above; # LATIN CAPITAL LETTER I WITH OGONEK
// The above means '012E' maps to the lower case of '012F 0307' if:
// - the locale is Lithuanian (lt)
// - the suffix contains a character of combining class 230 (Above)
// \u0300 is a character with such a combining class value (list found here: https://www.compart.com/en/unicode/combining/230)
const grapheme = '\u012E\u0300'; // Į̀ (Į + ̀ )
console.log(grapheme, grapheme.length); // Į̀ 2
const lowerStd = grapheme.toLowerCase();
console.log(lowerStd, lowerStd.length); // į̀ 2 (still fine)
const lowerLocale = grapheme.toLocaleLowerCase('lt');
console.log(lowerLocale, lowerLocale.length); // į̇̀ 3 (now 3 when using lt as the locale)
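The same file also lists Lithuanian mappings that apply without any context condition. The sketch below compares the two methods for one of them, U+00CC; `lowerCaseLengths` is a hypothetical helper, and the `localized` value depends on the locale data available to the runtime (with full Lithuanian tailoring it is expected to be 3), so it is not asserted here.

```javascript
// Hypothetical helper: report the .length of a string before and after
// locale-insensitive and locale-sensitive lowercasing.
function lowerCaseLengths(str, locale) {
  return {
    original: str.length,
    standard: str.toLowerCase().length,
    localized: str.toLocaleLowerCase(locale).length,
  };
}

// From SpecialCasing.txt:
// 00CC; 0069 0307 0300; 00CC; 00CC; lt; # LATIN CAPITAL LETTER I WITH GRAVE
// No More_Above condition here: the mapping applies whenever the locale is lt.
const lengths = lowerCaseLengths("\u00CC", "lt");
console.log(lengths); // original and standard are both 1; localized depends on locale data
```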