Extracting digits after certain characters that appear more than once in a Hive string field

Question

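I am trying to extract all the numbers that appear after 'dd ->'.

I have figured out how to extract the first number after 'dd ->' with regexp_extract(string, 'dd\s->\s([0-9]+)'), and how to strip every non-digit character with regexp_replace(string, '[^0-9]+', ''), but I have not found a way to get only the numbers that follow each 'dd ->'.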

String: (dd -> 2192, bar -> 1), (dd -> 2670, bar -> 1), (dd -> 2487, bar -> 3),(dd -> 2346, bar -> 3) kk=67457 ghyt=1628 nn=8.67.1

Expected output: 2192 2670 2487 2346

Thanks!

Answer 1

Use

dd ->( [0-9]+)|.

Replace with $1. See the regex proof: https://regex101.com/r/ZIEjzK/1

Explanation

--------------------------------------------------------------------------------
  dd ->                    'dd ->'
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
                             ' '
--------------------------------------------------------------------------------
    [0-9]+                   any character of: '0' to '9' (1 or more
                             times (matching the most amount possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
 |                        OR
--------------------------------------------------------------------------------
  .                        any character except \n

Trim the leading space if needed.
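Putting the pattern, the replacement, and the trim together, a minimal HiveQL sketch might look like the following. It assumes the replacement is the first capture group, written `$1` as in Java regex, which Hive's regexp_replace is built on. The inline literal stands in for the real column, and the alias `dd_values` is a made-up name for illustration.

```sql
-- Sketch only: replace the inline literal with your own string column.
-- Pattern: keep " <digits>" right after 'dd ->' (group 1) and match every
-- other character with '.', for which group 1 is empty.
-- trim() then removes the leading space left by the first match.
SELECT trim(
         regexp_replace(
           '(dd -> 2192, bar -> 1), (dd -> 2670, bar -> 1), (dd -> 2487, bar -> 3),(dd -> 2346, bar -> 3) kk=67457 ghyt=1628 nn=8.67.1',
           'dd ->( [0-9]+)|.',
           '$1'
         )
       ) AS dd_values;
```

Because each kept match starts with a space, the numbers come out space-separated, which matches the expected output in the question (2192 2670 2487 2346).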


regex

hiveql

regexp-replace