How to return only a single regex match group in Snowflake?

Question

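I have a regular expression that contains multiple match groups.

How can I specify which match group to return in Snowflake?

I'm using REGEXP_SUBSTR, but I'm happy to use a better alternative.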

Answer 1

The documentation describes a parameter called occurrence, which lets you specify which occurrence of the match to return.

For example:

select regexp_substr('bird is the word','\w+',1,1); -- returns "bird"
select regexp_substr('bird is the word','\w+',1,4); -- returns "word"

Answer 2

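TL;DR: You can't do exactly that, but you can use the 'e' option together with non-capturing groups, (?:re).

To clarify, it seems Neil is asking for something like the following to return word: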

select regexp_substr('bird is the word','(bird) (is) (the) (word)',1,4)

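Unfortunately, I don't think Snowflake fully supports this today. REGEXP_SUBSTR has an 'e' (extract) parameter that lets you extract only a group, but it always extracts the first group. The reason is that the occurrence parameter currently means the occurrence of the whole regex in the string. For example: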

select regexp_substr('bird is cows are','([a-z]*) (is|are)',1,2,'e');
=> cows

You can get what you want by not using capturing groups for anything that comes before the group you want, e.g.

select regexp_substr('bird is the word','bird (is) (the) (word)',1,1,'e');
-> is
select regexp_substr('bird is the word','bird is the (word)',1,1,'e');
-> word

However, that won't work if you need grouping to express alternatives, e.g.

select regexp_substr('cow is the word','(bird|cow) is the (word)',1,1,'e');
-> cow
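This is where the non-capturing group from the TL;DR comes in. A minimal sketch, assuming Snowflake accepts the (?:re) syntax as described above: writing the alternation as (?:bird|cow) keeps the alternatives but stops that group from counting as a capture, so (word) becomes the first capturing group and is what 'e' extracts.

```sql
-- (?:bird|cow) groups the alternatives without capturing them,
-- so the first (and only) capturing group is (word).
select regexp_substr('cow is the word', '(?:bird|cow) is the (word)', 1, 1, 'e');
-- expected: word
```

The same trick should work for any number of leading groups: make every group before the one you want non-capturing.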

That said, I think an option to extract a specific group number would be valuable, and I'll raise it with Snowflake development :)
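For what it's worth, the current REGEXP_SUBSTR documentation lists an optional sixth argument, group_num, that selects which capture group to extract. A sketch, assuming your account is on a release that supports this argument (it was not available when this answer was first written):

```sql
-- Arguments: subject, pattern, position, occurrence, regex_parameters, group_num.
-- group_num = 4 asks for the fourth capturing group of the first match.
select regexp_substr('bird is the word', '(bird) (is) (the) (word)', 1, 1, 'e', 4);
-- expected: word
```

Here occurrence still picks which match of the whole pattern to use, and group_num picks the group within that match.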


snowflake-cloud-data-platform