Extracting the digits after certain characters that appear more than once in a string field in Hive
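I am trying to extract all the digits that appear after 'dd ->'.
I have figured out how to extract the first occurrence of digits after 'dd ->': regexp_extract(string, 'dd\s->\s([0-9]+)')
and how to remove every character that is not a digit:
regexp_replace(string, '[^0-9]+', '')
but I have not been able to combine the two into a solution.
String: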
(dd -> 2192, bar -> 1), (dd -> 2670, bar -> 1), (dd -> 2487, bar -> 3),(dd -> 2346, bar -> 3) kk=67457 ghyt=1628 nn=8.67.1
Expected output:
2192 2670 2487 2346
Thanks!
Use
dd ->( [0-9]+)|.
Replace each match with the contents of the capture group.
See the regex proof: https://regex101.com/r/ZIEjzK/1
Explanation
--------------------------------------------------------------------------------
  dd ->                    'dd ->'
--------------------------------------------------------------------------------
  (                        group and capture (group 1):
--------------------------------------------------------------------------------
                           ' '
--------------------------------------------------------------------------------
    [0-9]+                 any character of: '0' to '9' (1 or more
                           times (matching the most amount
                           possible))
--------------------------------------------------------------------------------
  )                        end of group 1
--------------------------------------------------------------------------------
  |                        OR
--------------------------------------------------------------------------------
  .                        any character except \n

Trim the leading space if necessary: the capture group includes the space before each number, so the result begins with a space.
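
Put together, the replacement and the trim could look like the query below. This is only a sketch under a few assumptions: the inline subquery, the column name s and the alias dd_values are placeholders invented for illustration, and it assumes that Hive's regexp_replace accepts the Java-style $1 reference to the first capture group.

```sql
-- Sketch only: the inline subquery stands in for the real table and column.
-- Every 'dd -> <digits>' match is replaced by ' <digits>' (assumed $1 syntax);
-- every other single character is matched by '.' and replaced by nothing.
SELECT trim(regexp_replace(s, 'dd ->( [0-9]+)|.', '$1')) AS dd_values
FROM (
  SELECT '(dd -> 2192, bar -> 1), (dd -> 2670, bar -> 1), (dd -> 2487, bar -> 3),(dd -> 2346, bar -> 3) kk=67457 ghyt=1628 nn=8.67.1' AS s
) t;
```

If the assumptions hold, the sample string should come back as 2192 2670 2487 2346, with trim removing the space that the capture group leaves in front of the first number. The pattern contains no backslashes, so it should not be affected by the backslash escaping that Hive string literals otherwise require.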