How can I extract all text between single full stops (i.e. ignore `...`) with a re2 regex?

Question


I'm using the `REGEXP_EXTRACT_ALL` function in BigQuery, which uses the re2 syntax (https://github.com/google/re2/wiki/Syntax).

From the following example:

This is... a.. sentence. It is just an example.

I would like to extract `This is... a.. sentence.` and `It is just an example.`

I'm particularly interested in whether this can be done with SQL functions in BigQuery rather than by introducing another tool.

Answer 1

Consider the following workaround:

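-- Outer step: drop the '#' markers left around multi-dot runs ('#...#' -> '...')
select text, regexp_replace(sentence, r'(#)(\.+)(#)', r'\2') sentence
from `project.dataset.table`, 
-- Inner steps: wrap every run of dots as '#<dots>#', turn each lone '#.#' into the
-- delimiter '####', trim '#' from both ends, then split into one row per sentence
unnest(split(trim(regexp_replace(regexp_replace(text, r'(\.+)', r'#\1#'), r'(\#\.\#)', r'####'), '####'), '####')) sentence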
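The nested calls are easier to follow when each stage is run on its own. The following is a minimal sketch of the workaround's stages, written with the backreferences `#\1#` and `\2` so the dots survive each step. It assumes the sample sentence is supplied through a one-row CTE; the names `sample`, `stage1`, `stage2`, `wrapped` and `delimited` are made up for illustration and stand in for `project.dataset.table`:

```sql
-- Hypothetical one-row CTE standing in for `project.dataset.table`
WITH sample AS (
  SELECT 'This is... a.. sentence. It is just an example.' AS text
),
stage1 AS (
  -- Stage 1: wrap every run of dots in '#' markers ('...' -> '#...#', '.' -> '#.#')
  SELECT REGEXP_REPLACE(text, r'(\.+)', r'#\1#') AS wrapped
  FROM sample
),
stage2 AS (
  -- Stage 2: a lone full stop is now exactly '#.#'; replace it with the '####' delimiter
  SELECT wrapped, REGEXP_REPLACE(wrapped, r'(\#\.\#)', r'####') AS delimited
  FROM stage1
)
-- Stage 3: TRIM strips '#' characters from both ends, SPLIT cuts on '####', and the
-- outer REGEXP_REPLACE turns the remaining '#...#' markers back into '...'
SELECT
  wrapped,
  delimited,
  REGEXP_REPLACE(sentence, r'(#)(\.+)(#)', r'\2') AS sentence
FROM stage2,
  UNNEST(SPLIT(TRIM(delimited, '####'), '####')) AS sentence;
```

On the sample sentence, `wrapped` should be `This is#...# a#..# sentence#.# It is just an example#.#` and `delimited` should be `This is#...# a#..# sentence#### It is just an example####`, giving the rows `This is... a.. sentence` and `It is just an example` (the latter with a leading space). The second argument of `TRIM` is a set of characters, so `'####'` behaves like `'#'`. The final full stop of each sentence is consumed as the delimiter, and the approach assumes the input contains no `#` characters of its own.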

Applied to the sample data in your question, the workaround returns one row per sentence alongside the original `text`.
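If a single `REGEXP_EXTRACT_ALL` call is preferred, one option that avoids lookarounds (re2 does not support them) is to describe a sentence as one or more items that are either a non-dot character or a run of two or more dots, closed by a single dot. This is a minimal sketch, not taken from the answer above, assuming sentences are separated by whitespace and each one ends in a single full stop:

```sql
-- A "sentence" = one or more items that are either a non-dot character or a run of
-- 2+ dots, terminated by a single '.'; the leading \s* skips whitespace between
-- sentences and sits outside the capturing group, so it is not returned.
SELECT sentence
FROM UNNEST(REGEXP_EXTRACT_ALL(
  'This is... a.. sentence. It is just an example.',
  r'\s*((?:[^.]|\.\.+)+\.)'
)) AS sentence;
```

Because the pattern has exactly one capturing group, `REGEXP_EXTRACT_ALL` returns only the captured text. On the sample sentence this should yield `This is... a.. sentence.` and `It is just an example.`, keeping the closing full stops. Against a real table, the string literal would be replaced by the column, e.g. `FROM project.dataset.table, UNNEST(REGEXP_EXTRACT_ALL(text, r'...')) AS sentence`. Abbreviations such as `e.g.` still count as sentence ends under this rule.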

