Order of string replacement function invocations when used with RTL languages

Question

When calling String.replace with a replacement function, we can retrieve the offsets of the matched substrings.

var a = [];
"hello world".replace(/l/g, function (m, i) { a.push(i); });
// a = [2, 3, 9]

In the example above, we get the list of offsets of the matched l characters.

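Can I count on implementations to always invoke the replacement function in ascending order of occurrence, even with languages that are written right to left?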

That is: can I be sure that the result above will always be [2,3,9] and not [3,9,2] or any other permutation of those offsets?

This is a follow-up to that question, which Tomalak answered with:

Absolutely, yes. Matches are handled from left to right in the source string because left-to-right is how regular expression engines work their way through a string.
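To make the left-to-right claim concrete, here is a minimal sketch that collects the same offsets with an explicit exec loop, which only ever advances lastIndex forward. The helper name offsetsViaExec is hypothetical, and the loop is an approximation of how a global replace walks the string, not the specification's algorithm verbatim.

```javascript
// Hypothetical helper: collect match offsets by driving the regex manually.
function offsetsViaExec(text, pattern) {
  // Fresh global copy so that lastIndex starts at 0.
  const re = new RegExp(pattern.source, "g");
  const offsets = [];
  let m;
  while ((m = re.exec(text)) !== null) {
    offsets.push(m.index);
    // Guard against an infinite loop on empty matches.
    if (m[0] === "") re.lastIndex++;
  }
  return offsets;
}

const execOffsets = offsetsViaExec("hello world", /l/g);
// execOffsets is [2, 3, 9], the same order the replace callback reports
```

Because each search starts at the position where the previous match ended, the offsets can only increase.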

However, regarding the case of RTL languages, he also said:

That's a good question [...] RTL text definitely changes how JavaScript regular expressions behave.

I tested with the following RTL snippet in Chrome:

var a = [];
"بلوچی مکرانی".replace(/ی/g, function (m, i) { a.push(i); });
// a = [4, 11]

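I don't speak that language, but looking at the string, I see the ی character as the first character of the string and as the first character after the white space. However, since the text is written right to left, those positions are actually the last character before the white space and the last character in the string, which translates to [4,11].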

So this seems to work as expected in Chrome. The question is: can I trust that the result will be the same on all compliant JavaScript implementations?

Answer 1

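I searched ECMA-262 5.1 Edition (June 2011) for the keywords "format control", "right to left", and "RTL". None of them is mentioned, except where the specification says that format-control characters are allowed in string literals and regular expression literals.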

From Section 7.1:

It is useful to allow format-control characters in source text to facilitate editing and display. All format control characters may be used within comments, and within string literals and regular expression literals.

Annex E:

7.1: Unicode format control characters are no longer stripped from ECMAScript source text before processing. In Edition 5, if such a character appears in a StringLiteral or RegularExpressionLiteral the character will be incorporated into the literal where in Edition 3 the character would not be incorporated into the literal

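From this, I conclude that JavaScript does not operate any differently on right-to-left characters. It only knows about the UTF-16 code units stored in the string, and it works on them in logical order.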
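A small sketch of what "logical order" means here, using the string from the question. The helper name replaceOffsets is hypothetical. The target character is taken from the last code unit of the string, which is the one displayed leftmost in an RTL rendering.

```javascript
// Hypothetical helper: offsets in the order the replace callback receives them.
function replaceOffsets(text, pattern) {
  const offsets = [];
  text.replace(pattern, (match, offset) => {
    offsets.push(offset);
    return match; // leave the text unchanged
  });
  return offsets;
}

const rtl = "بلوچی مکرانی";
// Last code unit in logical (storage) order, drawn leftmost on screen.
const yeh = rtl[rtl.length - 1];

const rtlOffsets = replaceOffsets(rtl, new RegExp(yeh, "g"));

// Independent check: scan the code units from index 0 upward.
const scanned = [];
for (let i = 0; i < rtl.length; i++) {
  if (rtl[i] === yeh) scanned.push(i);
}
// Both arrays hold [4, 11]: indices follow storage order, not display order.
```

The display direction is applied only when the text is rendered; the indices seen by the regular expression are positions in the stored sequence of code units.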
