Return all regex matches as new rows

I have a reviews table as follows:

r_id  my_comment
1     Boxes with the TID 823 cannot exceed 40 kg
2     Parcel with the marking tid 63157 must not make the weight go over 31 k.g
3     Envelopes with TID 104124 and TID 92341 cant excel above 94.477kg
4     TID38204 cannot go over 45.4 kg and TID 8242602 cannot go over 92kg
5     Box with the TID 94514 cannot go over 52kg but also cannot go over 51KG

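I am trying to match 2 things: the TID and the weight (kg). As you can see, there are 3 things to keep in mind.

If a comment has only 1 weight and 1 TID, I can extract both. If it has more than one of either, I cannot, so I want to split multiple matches into separate rows.

This is my desired output: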

r_id  tid      weight  my_comment
1     823      40      Boxes with the TID 823 cannot exceed 40 kg
2     63157    31      Parcel with the marking tid 63157 must not make the weight go over 31 k.g
3     104124   94.477  Envelopes with TID 104124 and TID 92341 cant excel above 94.477kg
3     92341            Envelopes with TID 104124 and TID 92341 cant excel above 94.477kg
4     38204    45.4    TID38204 cannot go over 45.4 kg and TID 8242602 cannot go over 92kg
4     8242602  92      TID38204 cannot go over 45.4 kg and TID 8242602 cannot go over 92kg
5     94514    52      Box with the TID 94514 cannot go over 52kg but also cannot go over 51KG
5              51      Box with the TID 94514 cannot go over 52kg but also cannot go over 51KG

SQL to create the table and dummy data:

CREATE TABLE reviews(
  r_id number(3) NOT NULL,
  my_comment VARCHAR(255) NOT NULL
);

INSERT INTO reviews (r_id, my_comment) VALUES (1, 'Boxes with the TID 823 cannot exceed 40 kg');
INSERT INTO reviews (r_id, my_comment) VALUES (2, 'Parcel with the marking tid 63157 must not make the weight go over 31 k.g');
INSERT INTO reviews (r_id, my_comment) VALUES (3, 'Envelopes with TID 104124 and TID 92341 cant excel above 94.477kg');
INSERT INTO reviews (r_id, my_comment) VALUES (4, 'TID38204 cannot go over 45.4 kg and TID 8242602 cannot go over 92kg');
INSERT INTO reviews (r_id, my_comment) VALUES (5, 'Box with the TID 94514 cannot go over 52kg but also cannot go over 51KG');

My attempt: I am able to extract the tid and weight, but only the first instance, and I cannot split the matches into rows.

SELECT
    r_id,
    REGEXP_SUBSTR (
        REGEXP_SUBSTR (my_comment, '(tid).*?[0-9]+', 1, 1, 'i'),
        '[0-9]+'
    ) AS "tid",
    REGEXP_SUBSTR (
        REGEXP_SUBSTR (my_comment, '(cannot exceed|go over| excel above).*?[0-9]+ ?(kg|k.g)', 1, 1, 'i'),
        '[0-9]+'
    ) AS "weight"
FROM reviews;

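Your query, modified:

  • I didn't do much to what you already wrote, as you seem to be happy with the extracted tid and weight
    • the change I made is to regexp_substr's occurrence parameter (it was 1; now it is column_value)
  • to split the data into rows, I added a cross join which "loops" through my_comment as many times as the greater of the number of occurrences of tid and of kg (in any form)
    • for example, if there are 2 tids and 1 kg, it "loops" 2 times
    • it is also used to avoid the duplicates you would get by using just the connect by level clause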

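You did tag the question as Oracle 10g; I no longer have it, but I know it doesn't support the regexp_count function. If that is really your version (you never answered Koen's question), this won't work and you'll have to count the tid/weight occurrences some other way. I hope you aren't using 10g, though.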
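If regexp_count really is unavailable, one possible workaround is to replace every match with a marker character and count the markers. The following is only a sketch and has not been run on 10g: it assumes CHR (1) never appears in my_comment, and the names marked_tid, marked_kg, tid_count and kg_count are made up for illustration.

```sql
-- Sketch: count regex matches without REGEXP_COUNT.
-- Assumption: CHR (1) never occurs in my_comment, so it can serve as a marker.
SELECT r_id,
       -- length with one marker per match, minus length with the markers removed
       LENGTH (marked_tid) - NVL (LENGTH (REPLACE (marked_tid, CHR (1))), 0) AS tid_count,
       LENGTH (marked_kg)  - NVL (LENGTH (REPLACE (marked_kg,  CHR (1))), 0) AS kg_count
  FROM (SELECT r_id,
               -- occurrence = 0 replaces all matches; 'i' makes the match case-insensitive
               REGEXP_REPLACE (my_comment, 'tid',      CHR (1), 1, 0, 'i') AS marked_tid,
               REGEXP_REPLACE (my_comment, '(kg|k.g)', CHR (1), 1, 0, 'i') AS marked_kg
          FROM reviews);
```

GREATEST (tid_count, kg_count) could then stand in for the two regexp_count calls in the CONNECT BY condition.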

I ran this code in SQL*Plus. BREAK is there only to nicely distinguish the r_id and my_comment values; it serves no other purpose.

SQL> break on r_id on my_comment
SQL> SELECT r_id,
  2         my_comment,
  3         REGEXP_SUBSTR (REGEXP_SUBSTR (my_comment,
  4                                       '(tid).*?[0-9]+',
  5                                       1,
  6                                       COLUMN_VALUE,
  7                                       'i'),
  8                        '[0-9]+') AS "tid",
  9         REGEXP_SUBSTR (
 10            REGEXP_SUBSTR (
 11               my_comment,
 12               '(cannot exceed|go over| excel above).*?[0-9]+ ?(kg|k.g)',
 13               1,
 14               COLUMN_VALUE,
 15               'i'),
 16            '[0-9]+') AS "weight"
 17    FROM reviews
 18         CROSS JOIN
 19         TABLE (
 20            CAST (
 21               MULTISET (
 22                      SELECT LEVEL
 23                        FROM DUAL
 24                  CONNECT BY LEVEL <= GREATEST (REGEXP_COUNT (my_comment, 'tid'     , 1, 'i'),
 25                                                REGEXP_COUNT (my_comment, '(kg|k.g)', 1, 'i')))
 26                  AS SYS.odcinumberlist));

This results in:

 R_ID MY_COMMENT                                                                tid     weight
----- ------------------------------------------------------------------------- ------- -------
    1 Boxes with the TID 823 cannot exceed 40 kg                                823     40
    2 Parcel with the marking tid 63157 must not make the weight go over 31 k.g 63157   31
    3 Envelopes with TID 104124 and TID 92341 cant excel above 94.477kg         104124  94
                                                                                92341
    4 TID38204 cannot go over 45.4 kg and TID 8242602 cannot go over 92kg       38204   45
                                                                                8242602 92
    5 Box with the TID 94514 cannot go over 52kg but also cannot go over 51KG   94514   52
                                                                                        51

8 rows selected.

SQL>
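
Note that the weight column shows 94 and 45 where the desired output in the question has 94.477 and 45.4, because the inner pattern '[0-9]+' stops at the decimal point. A possible tweak is to let the inner pattern accept an optional decimal part. The sketch below compares the two patterns on a literal fragment like the one the outer REGEXP_SUBSTR extracts for r_id 3; the column aliases are illustrative only.

```sql
-- Compare the original inner pattern with one that also accepts a decimal part.
-- Expected: integer_only = 94, with_decimals = 94.477
SELECT REGEXP_SUBSTR (' excel above 94.477kg', '[0-9]+')            AS integer_only,
       REGEXP_SUBSTR (' excel above 94.477kg', '[0-9]+(\.[0-9]+)?') AS with_decimals
  FROM DUAL;
```

Using the second pattern as the inner '[0-9]+' of the "weight" expression should keep the decimals; the "tid" expression can stay as it is.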