Regex: do not select if more than one group matches (multiple XOR)

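The data is stored in an Oracle 12c database, one row per ICD-10-CM code, with a patient ID (foreign key), like this (note that there may be many other codes; the ones below are only those relevant to this question):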

ID   ICD10CODE
 1   S72.91XB
 1   S72.92XB
 2   S72.211A
 3   S72.414A
 3   S72.415A
 4   S32.509A
 5   S32.301A
 5   S32.821A
 6   S32.421A
 6   S32.422A
 7   S32.421A
 8   S32.421A
 8   S32.509A

The task at hand is to select the distinct patients that match only one of the following bullet points (written in standard regex syntax):

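Each patient may have any permutation or combination (including repeats) of the codes listed within a single bullet point, but the permutations or combinations across rows should be evaluated per patient only. My approach was to apply LISTAGG with GROUP BY ID:
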
ID  LISTAGG(ICD10CODE, ',')
 1  S72.91XB,S72.92XB
 2  S72.211A
 3  S72.414A,S72.415A
 4  S32.509A
 5  S32.301A,S32.821A
 6  S32.421A,S32.422A
 7  S32.421A
 8  S32.421A,S32.509A

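I then filter with this regular expression:

(S32\.(([1-3]|[5-8])|(4\w((1|4)|(2|5)|(3)|([5-9]))))\w+)|(S72\.(([0-8]\w((1|4)|(2|5)|(3)|([5-9])))|(9((1|4)|(2|5)|(3)|([5-9]))))\w+)

which is almost a literal representation of the bullet points above. My expression is adapted from an idea I found elsewhere, in which ((RB\s+)+|(JJ\s+)+) appears to select either "RB" or "JJ", but not both.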

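I cannot get it to work. The result should contain only IDs 2, 4, 5, and 7. However, the expression I developed matches all IDs.

What is the solution to this problem?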


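[Edit] More information:

All of the S codes above relate to lower-extremity bone injuries: S32 is a fracture of the pelvis (hip bone), and S72 is a fracture of the femur (thigh bone). Note that we have two femurs and two acetabula (the pelvic sockets to which the femurs attach). S32.4 codes denote the acetabulum (the remaining S32.[1235678]\w{3} series denotes other parts of the pelvis). The right and left femurs and acetabula are indicated in the 6th character by 1|4 and 2|5 respectively, except when the code starts with S72.9, in which case these digits appear in the 5th character.

Patients to be included in the study population should have exactly one broken bone. That is, one of the two femurs, one acetabulum, or the pelvis, but not a combination of them. Combinations of fractures within a single bone do not matter. For example, a single right femur can be broken in 10 different places in different ways (knee area, mid-shaft, head, etc., each generating a different S72.\w[1|4]\w{2} code), and the patient should still be selected.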

Option 1:

You can do it with a single regular expression:

SELECT t.id,
       t.icd10codes
FROM   ( SELECT id,
                LISTAGG(icd10code, ',') WITHIN GROUP (ORDER BY icd10code)
                  AS icd10codes
         FROM   table_name
         GROUP BY id
       ) t
WHERE  REGEXP_LIKE(
         t.icd10codes,
             '^(S32\.[1235678]\w\w\w(,|$))+$'
         || '|^(S32\.4\w[1346789]\w(,|$))+$'
         || '|^(S32\.4\w[2356789]\w(,|$))+$'
         || '|^(S72\.[0-8]\w[1346789]\w(,|$))+$'
         || '|^(S72\.[0-8]\w[2356789]\w(,|$))+$'
         || '|^(S72\.9[1346789]\w\w(,|$))+$'
         || '|^(S72\.9[2356789]\w\w(,|$))+$'
       )

Which, for your sample data:

CREATE TABLE table_name (ID, ICD10CODE) AS
SELECT 1, 'S72.91XB' FROM DUAL UNION ALL
SELECT 1, 'S72.92XB' FROM DUAL UNION ALL
SELECT 2, 'S72.211A' FROM DUAL UNION ALL
SELECT 3, 'S72.414A' FROM DUAL UNION ALL
SELECT 3, 'S72.415A' FROM DUAL UNION ALL
SELECT 4, 'S32.509A' FROM DUAL UNION ALL
SELECT 5, 'S32.301A' FROM DUAL UNION ALL
SELECT 5, 'S32.821A' FROM DUAL UNION ALL
SELECT 6, 'S32.421A' FROM DUAL UNION ALL
SELECT 6, 'S32.422A' FROM DUAL UNION ALL
SELECT 7, 'S32.421A' FROM DUAL UNION ALL
SELECT 8, 'S32.421A' FROM DUAL UNION ALL
SELECT 8, 'S32.509A' FROM DUAL;

Outputs:

ID ICD10CODES
2 S72.211A
4 S32.509A
5 S32.301A,S32.821A
7 S32.421A
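
A likely reason the expression in the question matched every ID is that it is unanchored: REGEXP_LIKE succeeds as soon as any substring of the aggregated list matches, so one qualifying code is enough no matter what else the patient has. The sketch below reproduces this outside Oracle with Python's re module. Python is used only for illustration, on the assumption that the constructs involved (character classes, \w, +, ^, $) behave the same way on this sample data. The names ROWS, GROUPS, aggregate and select_ids are hypothetical.

```python
import re
from collections import defaultdict

# Sample rows from the question: (patient ID, ICD-10-CM code).
ROWS = [
    (1, "S72.91XB"), (1, "S72.92XB"), (2, "S72.211A"),
    (3, "S72.414A"), (3, "S72.415A"), (4, "S32.509A"),
    (5, "S32.301A"), (5, "S32.821A"), (6, "S32.421A"),
    (6, "S32.422A"), (7, "S32.421A"), (8, "S32.421A"),
    (8, "S32.509A"),
]

# The seven per-bone patterns used in Option 1.
GROUPS = [
    r"S32\.[1235678]\w\w\w",
    r"S32\.4\w[1346789]\w",
    r"S32\.4\w[2356789]\w",
    r"S72\.[0-8]\w[1346789]\w",
    r"S72\.[0-8]\w[2356789]\w",
    r"S72\.9[1346789]\w\w",
    r"S72\.9[2356789]\w\w",
]

# Each alternative must consume the whole list, as in Option 1.
ANCHORED = re.compile("|".join(rf"^({g}(,|$))+$" for g in GROUPS))

# The unanchored expression from the question, copied verbatim.
UNANCHORED = re.compile(
    r"(S32\.(([1-3]|[5-8])|(4\w((1|4)|(2|5)|(3)|([5-9]))))\w+)"
    r"|(S72\.(([0-8]\w((1|4)|(2|5)|(3)|([5-9])))|(9((1|4)|(2|5)|(3)|([5-9]))))\w+)"
)

def aggregate(rows):
    """Mimic LISTAGG(code, ',') WITHIN GROUP (ORDER BY code) per patient."""
    codes = defaultdict(list)
    for pid, code in rows:
        codes[pid].append(code)
    return {pid: ",".join(sorted(c)) for pid, c in codes.items()}

def select_ids(pattern, rows=ROWS):
    """Return patient IDs whose aggregated list contains a match for pattern."""
    return sorted(pid for pid, s in aggregate(rows).items() if pattern.search(s))

print(select_ids(UNANCHORED))  # [1, 2, 3, 4, 5, 6, 7, 8]
print(select_ids(ANCHORED))    # [2, 4, 5, 7]
```

Because each alternative in the anchored pattern starts with ^ and ends with $, a list can only match when every code in it belongs to the same group.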

Option 2:

You can put the regular expressions into a table:

CREATE TABLE matches (id, match) AS
SELECT 1, 'S32\.[1235678]\w\w\w'    FROM DUAL UNION ALL
SELECT 2, 'S32\.4\w[1346789]\w'     FROM DUAL UNION ALL
SELECT 3, 'S32\.4\w[2356789]\w'     FROM DUAL UNION ALL
SELECT 4, 'S72\.[0-8]\w[1346789]\w' FROM DUAL UNION ALL
SELECT 5, 'S72\.[0-8]\w[2356789]\w' FROM DUAL UNION ALL
SELECT 6, 'S72\.9[1346789]\w\w'     FROM DUAL UNION ALL
SELECT 7, 'S72\.9[2356789]\w\w'     FROM DUAL;

Then you can use the query:

SELECT t.id,
       m.id AS match_id,
       LISTAGG(t.icd10code, ',') WITHIN GROUP (ORDER BY t.icd10code)
         AS icd10codes
FROM   table_name t
       LEFT OUTER JOIN matches m
       PARTITION BY (m.id)
       ON (REGEXP_LIKE(t.icd10code, '^' || m.match || '$'))
GROUP BY
       t.id,
       m.id
HAVING
       COUNT(m.match) = COUNT(t.id);

Option 3:

Similar to the first option, but because the matches are stored in a table you can determine which match was used:

SELECT t.id,
       m.id AS match_id,
       t.icd10codes
FROM   ( SELECT id,
                LISTAGG(icd10code, ',') WITHIN GROUP (ORDER BY icd10code)
                  AS icd10codes
         FROM   table_name
         GROUP BY id
       ) t
       INNER JOIN matches m
       ON (REGEXP_LIKE(t.icd10codes, '^(' || m.match || '(,|$))+$' ))

Options 2 and 3 both output:

ID MATCH_ID ICD10CODES
4 1 S32.509A
5 1 S32.301A,S32.821A
7 2 S32.421A
2 4 S72.211A

Option 4:

You could also get rid of the (slow) regular expressions and use LIKE if you store the matches as:

CREATE TABLE matches (id, match) AS
SELECT 1, 'S32.1___' FROM DUAL UNION ALL
SELECT 1, 'S32.2___' FROM DUAL UNION ALL
SELECT 1, 'S32.3___' FROM DUAL UNION ALL
SELECT 1, 'S32.5___' FROM DUAL UNION ALL
SELECT 1, 'S32.6___' FROM DUAL UNION ALL
SELECT 1, 'S32.7___' FROM DUAL UNION ALL
SELECT 1, 'S32.8___' FROM DUAL UNION ALL
SELECT 2, 'S32.4_1_' FROM DUAL UNION ALL
SELECT 2, 'S32.4_3_' FROM DUAL UNION ALL
SELECT 2, 'S32.4_4_' FROM DUAL UNION ALL
SELECT 2, 'S32.4_6_' FROM DUAL UNION ALL
SELECT 2, 'S32.4_7_' FROM DUAL UNION ALL
SELECT 2, 'S32.4_8_' FROM DUAL UNION ALL
SELECT 2, 'S32.4_9_' FROM DUAL UNION ALL
SELECT 3, 'S32.4_2_' FROM DUAL UNION ALL
SELECT 3, 'S32.4_3_' FROM DUAL UNION ALL
SELECT 3, 'S32.4_5_' FROM DUAL UNION ALL
SELECT 3, 'S32.4_6_' FROM DUAL UNION ALL
SELECT 3, 'S32.4_7_' FROM DUAL UNION ALL
SELECT 3, 'S32.4_8_' FROM DUAL UNION ALL
SELECT 3, 'S32.4_9_' FROM DUAL UNION ALL
SELECT 4, 'S72.' || (LEVEL - 1) || '_1_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 4, 'S72.' || (LEVEL - 1) || '_3_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 4, 'S72.' || (LEVEL - 1) || '_4_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 4, 'S72.' || (LEVEL - 1) || '_6_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 4, 'S72.' || (LEVEL - 1) || '_7_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 4, 'S72.' || (LEVEL - 1) || '_8_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 4, 'S72.' || (LEVEL - 1) || '_9_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 5, 'S72.' || (LEVEL - 1) || '_2_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 5, 'S72.' || (LEVEL - 1) || '_3_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 5, 'S72.' || (LEVEL - 1) || '_5_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 5, 'S72.' || (LEVEL - 1) || '_6_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 5, 'S72.' || (LEVEL - 1) || '_7_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 5, 'S72.' || (LEVEL - 1) || '_8_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 5, 'S72.' || (LEVEL - 1) || '_9_' FROM DUAL CONNECT BY LEVEL <= 9 UNION ALL
SELECT 6, 'S72.91__' FROM DUAL UNION ALL
SELECT 6, 'S72.93__' FROM DUAL UNION ALL
SELECT 6, 'S72.94__' FROM DUAL UNION ALL
SELECT 6, 'S72.96__' FROM DUAL UNION ALL
SELECT 6, 'S72.97__' FROM DUAL UNION ALL
SELECT 6, 'S72.98__' FROM DUAL UNION ALL
SELECT 6, 'S72.99__' FROM DUAL UNION ALL
SELECT 7, 'S72.92__' FROM DUAL UNION ALL
SELECT 7, 'S72.93__' FROM DUAL UNION ALL
SELECT 7, 'S72.95__' FROM DUAL UNION ALL
SELECT 7, 'S72.96__' FROM DUAL UNION ALL
SELECT 7, 'S72.97__' FROM DUAL UNION ALL
SELECT 7, 'S72.98__' FROM DUAL UNION ALL
SELECT 7, 'S72.99__' FROM DUAL;

Then use the query:

SELECT t.id,
       m.id AS match_id,
       LISTAGG(t.icd10code, ',') WITHIN GROUP (ORDER BY t.icd10code)
         AS icd10codes
FROM   table_name t
       LEFT OUTER JOIN matches m
       PARTITION BY (m.id)
       ON (t.icd10code LIKE m.match)
GROUP BY
       t.id,
       m.id
HAVING
       COUNT(m.match) = COUNT(t.id);
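
To make the LIKE patterns easier to reason about, here is a minimal Python sketch of how the _ wildcard (exactly one character) behaves, and of what the CONNECT BY LEVEL <= 9 rows expand to. It assumes no ESCAPE clause is in play and ignores any collation subtleties; like_to_regex and like are hypothetical helper names, not Oracle functions.

```python
import re

def like_to_regex(pattern):
    """Translate a SQL LIKE pattern into a compiled regex (no ESCAPE support)."""
    parts = []
    for ch in pattern:
        if ch == "_":
            parts.append(".")    # _ matches exactly one character
        elif ch == "%":
            parts.append(".*")   # % matches any run of characters
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)

def like(value, pattern):
    """Return True when value matches the LIKE pattern in full."""
    return like_to_regex(pattern).fullmatch(value) is not None

# Mirrors: SELECT 'S72.' || (LEVEL - 1) || '_1_' FROM DUAL CONNECT BY LEVEL <= 9
generated = [f"S72.{level - 1}_1_" for level in range(1, 10)]

print(generated[0], generated[-1])                    # S72.0_1_ S72.8_1_
print(like("S32.421A", "S32.4_1_"))                   # True
print(like("S32.422A", "S32.4_1_"))                   # False
print(any(like("S72.211A", p) for p in generated))    # True
print(any(like("S72.91XB", p) for p in generated))    # False
```

Each regular expression with a character class becomes several LIKE rows, one per digit in the class, which is why the matches table is so much longer in this option.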


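OK, I have added your broken-bone codes to the S32 and S72 series.
That is really all the work that needed to be done.

Feel free to change ,? to (,|$), but do not change anything else.
Let me know whether the broken-bone codes are correct.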

  • S32\.\w[25]\w\w, S32.1\w\w\w, S32.2\w\w\w, S32.3\w\w\w, S32.5\w\w\w, S32.6\w\w\w, S32.7\w\w\w, S32.8\w\w\w
  • S32\.\w[25]\w\w, S32.4\w1\w, S32.4\w3\w, S32.4\w4\w, S32.4\w6\w, S32.4\w7\w, S32.4\w8\w, S32.4\w9\w
  • S32\.\w[25]\w\w, S32.4\w2\w, S32.4\w3\w, S32.4\w5\w, S32.4\w6\w, S32.4\w7\w, S32.4\w8\w, S32.4\w9\w
  • S72\.\w[14]\w\w, S72.[0-8]\w1\w, S72.[0-8]\w3\w, S72.[0-8]\w4\w, S72.[0-8]\w6\w, S72.[0-8]\w7\w, S72.[0-8]\w8\w, S72.[0-8]\w9\w
  • S72\.\w[14]\w\w, S72.[0-8]\w2\w, S72.[0-8]\w3\w, S72.[0-8]\w5\w, S72.[0-8]\w6\w, S72.[0-8]\w7\w, S72.[0-8]\w8\w, S72.[0-8]\w9\w
  • S72\.[14]\w\w\w, S72.91\w\w, S72.93\w\w, S72.94\w\w, S72.96\w\w, S72.97\w\w, S72.98\w\w, S72.99\w\w
  • S72\.[14]\w\w\w, S72.92\w\w, S72.93\w\w, S72.95\w\w, S72.96\w\w, S72.97\w\w, S72.98\w\w, S72.99\w\w

The new regular expression is:

^((S32\.(\w[25]|[1-35-8]\w)\w\w,?)+|(S32\.(\w[25]\w|4\w[1346-9])\w,?)+|(S32\.(\w[25]\w|4\w[235-9])\w,?)+|(S72\.(\w[14]\w|[0-8]\w[1346-9])\w,?)+|(S72\.(\w[14]\w|[0-8]\w[235-9])\w,?)+|(S72\.([14]\w|9[1346-9])\w\w,?)+|(S72\.([14]\w|9[235-9])\w\w,?)+)$

https://regex101.com/r/OAHdCO/1

 ^ 
 (
    ( S32 \. ( \w [25] | [1-35-8] \w ) \w\w ,? )+
  | ( S32 \. ( \w [25] \w | 4 \w [1346-9] ) \w ,? )+
  | ( S32 \. ( \w [25] \w | 4 \w [235-9] ) \w ,? )+
  | ( S72 \. ( \w [14] \w | [0-8] \w [1346-9] ) \w ,? )+
  | ( S72 \. ( \w [14] \w | [0-8] \w [235-9] ) \w ,? )+
  | ( S72 \. ( [14] \w | 9 [1346-9] ) \w\w ,? )+
  | ( S72 \. ( [14] \w | 9 [235-9] ) \w\w ,? )+
 )
 $