Line endings (also known as newlines) in JS strings
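It is well known that Unix-like systems use the LF character as the line terminator, whereas Windows uses CR+LF.
However, when I tested this code from a local HTML file on my Windows PC, JS seemed to treat all newlines as separated by LF. Is that a correct assumption?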
var string = `
foo



bar
`;
// There should be only one blank line between foo and bar.
// \n - Works
// string = string.replace(/^(\s*\n){2,}/gm, '\n');
// \r\n - Doesn't work
string = string.replace(/^(\s*\r\n){2,}/gm, '\r\n');
alert(string);
// That is, it seems that JS treats all newlines as separated with
// `LF` instead of `CR+LF`?
You can use this regular expression: /^\s*[\r\n]/gm
Code example:
let string = `
foo



bar
`;
string = string.replace(/^\s*[\r\n]/gm, '\r\n');
console.log(string);
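As a rough sketch of the same idea, the variant below collapses runs of blank lines while keeping whichever line ending the input already uses. The helper name collapseBlankLines is made up for illustration, and the pattern assumes line endings are either LF or CR+LF (a lone CR is not treated as a line break).

```javascript
// Hypothetical helper: reduce two or more consecutive blank lines to one.
// (\r?\n) captures the line ending that closes the last non-blank line;
// the group after it matches blank or whitespace-only lines.
function collapseBlankLines(text) {
  return text.replace(/(\r?\n)(?:[ \t]*\r?\n){2,}/g, '$1$1');
}

const lfText = 'foo\n\n\n\nbar\n';
const crlfText = 'foo\r\n\r\n\r\n\r\nbar\r\n';

console.log(JSON.stringify(collapseBlankLines(lfText)));   // "foo\n\nbar\n"
console.log(JSON.stringify(collapseBlankLines(crlfText))); // "foo\r\n\r\nbar\r\n"
```

Because the replacement reuses the captured line ending, text with CR+LF stays CR+LF and text with LF stays LF.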
I think I found the explanation.
You are using an ES6 template literal to build the multi-line string.
According to the ECMAScript specs, a
.. template literal component is interpreted as a sequence of Unicode
code points. The Template Value (TV) of a literal component is
described in terms of code unit values (SV, 11.8.4) contributed by the
various parts of the template literal component. As part of this
process, some Unicode code points within the template component are
interpreted as having a mathematical value (MV, 11.8.3). In
determining a TV, escape sequences are replaced by the UTF-16 code
unit(s) of the Unicode code point represented by the escape sequence.
The Template Raw Value (TRV) is similar to a Template Value with the
difference that in TRVs escape sequences are interpreted literally.
Below that, it is defined that:
The TRV of LineTerminatorSequence::<LF> is the code unit 0x000A (LINE
FEED).
The TRV of LineTerminatorSequence::<CR> is the code unit 0x000A (LINE FEED).
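My interpretation is that when you use a template literal, a line break always ends up as a single line feed, regardless of the OS-specific newline definition.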
Finally, in JavaScript's regular expressions,
\n matches a line feed (U+000A).
which describes the observed behavior.
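One way to check this without depending on how your editor saves the file is to build source text that really contains CR+LF inside a template literal and evaluate it. This is only a sketch: the variable names are illustrative, and it relies on new Function, which a strict Content Security Policy may block in a browser.

```javascript
// Source text with a real CR+LF between the backticks, as a file saved
// with Windows line endings would contain.
const sourceWithCrLf = "return `a\r\nb`;";
const value = new Function(sourceWithCrLf)();

const codes = Array.from(value, (ch) => ch.charCodeAt(0));
console.log(codes);               // [ 97, 10, 98 ]
console.log(/\r\n/.test(value));  // false
console.log(/\n/.test(value));    // true
```

The CR never reaches the string value, which is why a pattern containing \r\n finds nothing to match.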
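However, if you define a string literal such as '\r\n', or read text containing OS-specific newlines from a file stream, you have to deal with it yourself.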
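A common way to deal with it is to normalize every line ending to LF before doing any further processing. The sketch below assumes that LF is the form you want to work with; normalizeNewlines is a made-up helper name, and the sample string only stands in for text read from a file.

```javascript
// Hypothetical helper: convert CR+LF and lone CR to LF.
function normalizeNewlines(text) {
  return text.replace(/\r\n?/g, '\n');
}

// Stand-in for text that was read from a file with Windows line endings.
const fromWindowsFile = 'foo\r\n\r\n\r\nbar\r\n';
const normalized = normalizeNewlines(fromWindowsFile);

console.log(JSON.stringify(normalized)); // "foo\n\n\nbar\n"
```

After this step, patterns written with \n behave the same on text from any platform.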
Here are some tests that demonstrate the behavior of template literals:
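// Template literal: the line break becomes a single LF.
`a
b`.split('')
  .forEach(function (char) {
    console.log(char.charCodeAt(0)); // 97, 10, 98
  });

// String.raw: the raw value also uses LF for the line break.
(String.raw`a
b`).split('')
  .forEach(function (char) {
    console.log(char.charCodeAt(0)); // 97, 10, 98
  });

// String literal with explicit escapes: both CR and LF are kept.
'a\r\nb'.split('')
  .forEach(function (char) {
    console.log(char.charCodeAt(0)); // 97, 13, 10, 98
  });

// Backslash line continuation: the line break is dropped entirely.
"a\
b".split('')
  .forEach(function (char) {
    console.log(char.charCodeAt(0)); // 97, 98
  });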
Interpreting the results:
char(97) = a, char(98) = b
char(10) = \n, char(13) = \r
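If reading raw character codes gets tedious, a small tally of line-ending types can tell you what a given string actually contains. This is just a sketch, and countLineEndings is a made-up helper name.

```javascript
// Hypothetical helper: count CR+LF, lone LF, and lone CR in a string.
function countLineEndings(text) {
  const counts = { crlf: 0, lf: 0, cr: 0 };
  // \r\n is listed first so a CR+LF pair is counted once, not as CR plus LF.
  for (const match of text.match(/\r\n|\r|\n/g) || []) {
    if (match === '\r\n') counts.crlf += 1;
    else if (match === '\n') counts.lf += 1;
    else counts.cr += 1;
  }
  return counts;
}

console.log(countLineEndings('a\r\nb\nc\rd')); // { crlf: 1, lf: 1, cr: 1 }
console.log(countLineEndings(`a
b`));                                          // { crlf: 0, lf: 1, cr: 0 }
```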