正则表达式组中问号和 x 的含义

Question

我正在尝试学习 Atom 的语法 highlighting/grammar 规则，它大量使用 JS 正则表达式，并且在 python grammar file.

中遇到了一个不熟悉的模式

模式以 (?x) 开头，这对我来说是一个不熟悉的正则表达式。我在一个online regex tester里查了一下，好像说是无效的。我最初的想法是它代表一个可选的左括号，但我认为应该在这里转义括号。

这是否仅在 Atom 的 coffeescript 语法中有意义，还是我忽略了正则表达式的含义？

（此模式也出现在 textmate language 文件中，我相信 Atom 的文件来自该文件）。

Answer 1

JavaScript 正则表达式引擎不支持 VERBOSE 修饰符 x，既不是内联的，也不是常规的。

请参阅 rexegg.com 处的 Free-Spacing: x (except JavaScript)：

By default, any space in a regex string specifies a character to be matched. In languages where you can write regex strings on multiple lines, the line breaks also specify literal characters to be matched. Because you cannot insert spaces to separate groups that carry different meanings (as you do between phrases and pragraphs when you write in English), a regex can become hard to read...

Luckily, many engines support a free-spacing mode that allows you to aerate your regex. For instance, you can add spaces between the tokens.

You may also see it called whitespace mode, comment mode or verbose mode.

这里是how it can look like in Python:

import re
regex = r"""(?x)
\d+                # Digits
\D+                # Non-digits up to...
$                  # The end of string
"""
print(re.search(regex, "My value: 56%").group(0)) # => 56%

Answer 2

如果正则表达式在 Python 中得到处理，它将使用 'verbose' 标志进行编译。

来自the Python re docs：

(?aiLmsux)

(One or more letters from the set 'a', 'i', 'L', 'm', 's', 'u', 'x'.) The group matches the empty string; the letters set the corresponding flags: re.A (ASCII-only matching), re.I (ignore case), re.L (locale dependent), re.M (multi-line), re.S (dot matches all), and re.X (verbose), for the entire regular expression. (The flags are described in Module Contents.) This is useful if you wish to include the flags as part of the regular expression, instead of passing a flag argument to the re.compile() function.

正则表达式组中问号和 x 的含义

Meaning of question mark and x in a regex group

regex

textmate

syntax-highlighting

atom-editor

(?aiLmsux)