How to get the captured part of a regular expression in Nim

Question

I want to extract "some_token" from the text "some text :some_token".

The code below returns the full match ' :some_token' instead of only the captured part 'some_token', which is marked by the group ([a-z0-9_-]+).

import re

let expr = re(r"\s:([a-z0-9_-]+)$", flags = {re_study, re_ignore_case})
for match in "some text :some_token".find_bounds(expr):
  echo "'" & match & "'"

How can I change it so that it returns only the captured part?

P.S.

Also, what is the difference between the re and nre modules?

Answer 1

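The submitted code does not compile (find_bounds returns a tuple[first, last: int], not something that can be iterated with for). Even so, in that example find_bounds would give the index range of the whole pattern, not of the captured substring.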
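If you would rather keep using findBounds, std/re also has an overload that takes a var openArray[tuple[first, last: int]] and fills in one (first, last) pair per capture group, while the return value is still the range of the whole match. A minimal sketch, assuming the pattern from the question written as a raw string literal; captureViaBounds is a hypothetical helper name:

```nim
import std/re

# Pattern from the question; the raw string literal r"..." keeps "\s" as a regex escape.
let tokenExpr = re(r"\s:([a-z0-9_-]+)$", flags = {reStudy, reIgnoreCase})

# captureViaBounds is a hypothetical helper for this sketch, not part of std/re.
proc captureViaBounds(text: string): string =
  ## Returns the first capture group, or "" if the pattern does not match.
  # One slot per capture group; each receives that group's inclusive (first, last) indices.
  var groupBounds: array[1, tuple[first, last: int]]
  # The return value is the (first, last) range of the *whole* match, or (-1, 0).
  let whole = text.findBounds(tokenExpr, groupBounds)
  if whole.first < 0:
    return ""
  # Slice the input using the capture group's bounds.
  result = text[groupBounds[0].first .. groupBounds[0].last]

echo captureViaBounds("some text :some_token")  # some_token
```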

The following (https://play.nim-lang.org/#ix=2yvs) gives you the captured string:

import re

let expr = re(r"\s:([a-z0-9_-]+)$", flags = {re_study, re_ignore_case})  # raw string keeps "\s" intact
var matches: array[1, string]  # one slot per capture group
if "some text :some_token".find(expr, matches) >= 0:  # find returns the match start, or -1
  echo matches  # -> ["some_token"]

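Note that in the code above, matches must have exactly as many slots as there are capture groups (a seq will not work unless you give it the right length). This is a known issue with re: https://github.com/nim-lang/Nim/issues/9472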
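To illustrate the seq caveat, a minimal sketch that pre-sizes a seq to one slot per capture group before passing it to find (a seq converts to the var openArray[string] parameter). The single group count is specific to this pattern, and captureWithSeq is a hypothetical helper:

```nim
import std/re

let tokenExpr = re(r"\s:([a-z0-9_-]+)$", flags = {reStudy, reIgnoreCase})

# captureWithSeq is a hypothetical helper for this sketch.
proc captureWithSeq(text: string): seq[string] =
  ## Returns the captured substrings, or an empty seq when nothing matches.
  # Pre-size to the number of capture groups (one here); an empty seq
  # would leave find() no slots to fill.
  result = newSeq[string](1)
  if text.find(tokenExpr, result) < 0:
    result = @[]

echo captureWithSeq("some text :some_token")  # @["some_token"]
```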

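Regarding why both re and nre exist, summarizing from this discussion:

nre has a more ergonomic API than re (which is closer to the C API)
nre used to have fewer issues than re, but the gap has narrowed recently (see also the open regex issues)
nre might eventually be moved out of the stdlib into a nimble package, but since that did not happen for v1, it will probably not happen before v2
Note that there is a pure-Nim implementation, regex (nim-regex), which also has an ergonomic API.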
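For comparison, a rough sketch of the same extraction with nre, assuming std/nre is available (like re, it relies on PCRE). nre's find returns an Option[RegexMatch], and its captures are numbered from 0 for the first group. nre takes no flag set, so the sketch uses PCRE's inline (?i) option for case-insensitivity; captureWithNre is a hypothetical helper:

```nim
import std/[nre, options]

# captureWithNre is a hypothetical helper for this sketch.
proc captureWithNre(text: string): Option[string] =
  ## Returns the first capture group, or none(string) if there is no match.
  # re"..." is a generalized raw string literal, so "\s" needs no escaping.
  # "(?i)" is PCRE's inline case-insensitive option.
  let m = text.find(re"(?i)\s:([a-z0-9_-]+)$")
  if m.isSome:
    # In nre, captures[0] is the first group (the whole match is m.get.match).
    result = some(m.get.captures[0])
  else:
    result = none(string)

echo captureWithNre("some text :some_token").get  # some_token
```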
