Oracle REGEXP_SUBSTR Parse Dollar Amounts
I'm trying to parse a dollar amount out of a string.
Example strings:
- *SOC 1369.00 - NCS 1239.46 = PT LIAB 129.54
- *SOC 1369.00 - NCS 1239.46 = PT LIAB 140
- *SOC = 1178.00
- *SOC 1622.00 - NCS 209.74 = PT LIAB 1412.26 RECIPIENT AGE
- *LINE #1 SOC 0.00 - NCS 22.77 = LIAB -22.77
- SOC MET AND CLEARED, SOC 2062-NCS 498.56=PT LIABLE 1563.44
- *SOC 1622.00 - NCS 209.74 = PT LIAB 1412 RECIPIENT AGE 1234
I want to pull the patient liability, which is the dollar amount following the text "LIAB", "PT LIAB", or "LIABLE". The dollar amount can be negative and may or may not have a decimal part.
My solution:
REPLACE(REPLACE(REGEXP_SUBSTR(REMARKS,'LIAB+[LE]?+ (-?+\d+[.]?)+\d'),'LIAB ',''),'LIABLE ','')
This seems a bit clunky, and I suspect there is a simpler solution. Any guidance would be appreciated!
I'm using Toad for Oracle 12.8.
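You can almost get there with REGEXP_REPLACE() and a backreference:
REGEXP_REPLACE(REMARKS,'.*(PT LIAB|LIAB|LIABLE) (-?\d+[.]?\d+).*', '\2')
... but that passes a value that doesn't match the pattern through untouched (so your third example would still give you *SOC = 1178.00). You can use a case expression and REGEXP_LIKE() to avoid that: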
with t (remarks) as (
select '*SOC 1369.00 - NCS 1239.46 = PT LIAB 129.54' from dual
union all select '*SOC 1369.00 - NCS 1239.46 = PT LIAB 140' from dual
union all select '*SOC = 1178.00' from dual
union all select '*SOC 1622.00 - NCS 209.74 = PT LIAB 1412.26 RECIPIENT AGE' from dual
union all select '*LINE #1 SOC 0.00 - NCS 22.77 = LIAB -22.77' from dual
union all select 'SOC MET AND CLEARED, SOC 2062-NCS 498.56=PT LIABLE 1563.44' from dual
)
SELECT REMARKS,
CASE WHEN REGEXP_LIKE(REMARKS, '(PT LIAB|LIAB|LIABLE) (-?\d+([.]\d+)?)')
THEN REGEXP_REPLACE(REMARKS,'.*(PT LIAB|LIAB|LIABLE) (-?\d+([.]\d+)?).*', '\2')
END as liability
from t;
REMARKS                                                    LIABILITY
---------------------------------------------------------- ----------
*SOC 1369.00 - NCS 1239.46 = PT LIAB 129.54                129.54
*SOC 1369.00 - NCS 1239.46 = PT LIAB 140                   140
*SOC = 1178.00
*SOC 1622.00 - NCS 209.74 = PT LIAB 1412.26 RECIPIENT AGE  1412.26
*LINE #1 SOC 0.00 - NCS 22.77 = LIAB -22.77                -22.77
SOC MET AND CLEARED, SOC 2062-NCS 498.56=PT LIABLE 1563.44 1563.44
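But that doesn't seem much better, and evaluating the regular expression twice makes it more expensive. (It could probably be simplified as well.) You could also use REGEXP_REPLACE() instead of the two plain REPLACE() calls: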
REGEXP_REPLACE(
REGEXP_SUBSTR(REMARKS, '(PT LIAB|LIAB|LIABLE) (-?\d+([.]\d+)?)'),
'(PT LIAB|LIAB|LIABLE) ')
But that again makes it more expensive.
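If the database is Oracle 11g or later (an assumption; Toad 12.8 is only the client version), a single REGEXP_SUBSTR() call may be enough, because its sixth argument selects one capture group of the match. The following is a minimal sketch rather than a tested drop-in replacement; the pattern 'LIAB(LE)? ...' is a shortened form chosen for illustration, relying on "PT LIAB" always ending in "LIAB".

```sql
-- Sketch: return only capture group 2 (the amount) of the first match.
-- Arguments: source, pattern, start position, occurrence, match parameter, subexpression.
with t (remarks) as (
  select '*SOC 1369.00 - NCS 1239.46 = PT LIAB 129.54' from dual
  union all select '*SOC = 1178.00' from dual
  union all select '*LINE #1 SOC 0.00 - NCS 22.77 = LIAB -22.77' from dual
  union all select 'SOC MET AND CLEARED, SOC 2062-NCS 498.56=PT LIABLE 1563.44' from dual
)
select remarks,
       -- group 1 = optional "LE", group 2 = signed amount, group 3 = fractional part
       regexp_substr(remarks, 'LIAB(LE)? (-?\d+(\.\d+)?)', 1, 1, null, 2) as liability
from t;
```

Because REGEXP_SUBSTR() returns NULL when the pattern does not match, the row without a liability needs no CASE expression, and the pattern is evaluated only once per row.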
Give this a try:
SQL> with tbl(rownbr, remarks) as (
select 1, '*SOC 1369.00 - NCS 1239.46 = PT LIAB 129.54' from dual union
select 2, '*SOC 1369.00 - NCS 1239.46 = PT LIAB 140' from dual union
select 3, '*SOC = 1178.00' from dual union
select 4, '*SOC 1622.00 - NCS 209.74 = PT LIAB 1412.26 RECIPIENT AGE' from dual union
select 5, '*LINE #1 SOC 0.00 - NCS 22.77 = LIAB -22.77' from dual union
select 6, 'SOC MET AND CLEARED, SOC 2062-NCS 498.56=PT LIABLE 1563.44' from dual union
select 7, '*SOC 1622.00 - NCS 209.74 = PT LIAB 1412 RECIPIENT AGE 1234' from dual
)
select rownbr,
case
when remarks = regexp_replace(remarks, '.*((LIAB|LIABLE) ([-.0-9]+)).*$', '\3') then
'0' -- regexp_replace returns the original string if the pattern is not found.
else
regexp_replace(remarks, '.*((LIAB|LIABLE) ([-.0-9]+)).*$', '\3') -- \3 is the captured amount
end patient_liability
from tbl;
    ROWNBR PATIENT_LIABILITY
---------- -------------------------
         1 129.54
         2 140
         3 0
         4 1412.26
         5 -22.77
         6 1563.44
         7 1412

7 rows selected.
SQL>
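If the liability is needed as a number rather than a string, the default of 0 can be expressed with NVL() around TO_NUMBER(). This is only a sketch under two assumptions: the database supports the subexpression argument of REGEXP_SUBSTR() (Oracle 11g or later), and the session's NLS_NUMERIC_CHARACTERS uses '.' as the decimal separator. The stricter pattern used here is an illustrative alternative to [-.0-9]+, which would also accept strings such as "-.-" that TO_NUMBER() cannot convert.

```sql
-- Sketch: extract the amount as a NUMBER, defaulting to 0 when no liability is present.
with tbl (rownbr, remarks) as (
  select 3, '*SOC = 1178.00' from dual union all
  select 5, '*LINE #1 SOC 0.00 - NCS 22.77 = LIAB -22.77' from dual union all
  select 7, '*SOC 1622.00 - NCS 209.74 = PT LIAB 1412 RECIPIENT AGE 1234' from dual
)
select rownbr,
       nvl(
         -- capture group 2 is the signed amount; the result is NULL when nothing matches
         to_number(regexp_substr(remarks, 'LIAB(LE)? (-?\d+(\.\d+)?)', 1, 1, null, 2)),
         0
       ) as patient_liability
from tbl;
```

Whether a missing liability should be reported as 0 or left as NULL depends on how the result is consumed; dropping the NVL() keeps the two cases distinguishable.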