Return all regex matches as new rows
I have a reviews table like this:
r_id | my_comment |
---|---|
1 | Boxes with the TID 823 cannot exceed 40 kg |
2 | Parcel with the marking tid 63157 must not make the weight go over 31 k.g |
3 | Envelopes with TID 104124 and TID 92341 cant excel above 94.477kg |
4 | TID38204 cannot go over 45.4 kg and TID 8242602 cannot go over 92kg |
5 | Box with the TID 94514 cannot go over 52kg but also cannot go over 51KG |
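I'm trying to match two things: the TID and the weight (kg). As you can see, there are three things to keep in mind:
- The weight is always in kilograms and is case-insensitive. The unit is written in one of two ways, kg or k.g, and the value appears in one of two layouts, <weight> <kg or k.g> or <weight><kg or k.g> (one with a space, one without).
- The TID is case-insensitive and can be written in two ways, TID<id> or TID <id> (one without a space, one with).
- Some comments have multiple TIDs and weights. I assume the first occurrence of a TID goes with the first occurrence of a weight, and the second TID with the second weight. My sample data has at most 2 TID/weight instances, but I'd like the solution to handle any number of instances dynamically.
So if a comment has only 1 weight and 1 TID, I can extract the TID and the weight. If it has more than one, I can't, so I want to split the multiple matches into separate rows.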
This is the output I want:
r_id | tid | weight | my_comment |
---|---|---|---|
1 | 823 | 40 | Boxes with the TID 823 cannot exceed 40 kg |
2 | 63157 | 31 | Parcel with the marking tid 63157 must not make the weight go over 31 k.g |
3 | 104124 | 94.477 | Envelopes with TID 104124 and TID 92341 can't excel above 94.477kg |
3 | 92341 | | Envelopes with TID 104124 and TID 92341 can't excel above 94.477kg |
4 | 38204 | 45.4 | TID38204 cannot go over 45.4 kg and TID 8242602 cannot go over 92kg |
4 | 8242602 | 92 | TID38204 cannot go over 45.4 kg and TID 8242602 cannot go over 92kg |
5 | 94514 | 52 | Box with the TID 94514 cannot go over 52kg but also cannot go over 51KG |
5 | | 51 | Box with the TID 94514 cannot go over 52kg but also cannot go over 51KG |
SQL to create the table and dummy data:
CREATE TABLE reviews(
    r_id number(3) NOT NULL,
    my_comment VARCHAR(255) NOT NULL
);
INSERT INTO reviews (r_id, my_comment) VALUES (1, 'Boxes with the TID 823 cannot exceed 40 kg');
INSERT INTO reviews (r_id, my_comment) VALUES (2, 'Parcel with the marking tid 63157 must not make the weight go over 31 k.g');
INSERT INTO reviews (r_id, my_comment) VALUES (3, 'Envelopes with TID 104124 and TID 92341 cant excel above 94.477kg');
INSERT INTO reviews (r_id, my_comment) VALUES (4, 'TID38204 cannot go over 45.4 kg and TID 8242602 cannot go over 92kg');
INSERT INTO reviews (r_id, my_comment) VALUES (5, 'Box with the TID 94514 cannot go over 52kg but also cannot go over 51KG');
My attempt:
SELECT
    r_id,
    REGEXP_SUBSTR (
        REGEXP_SUBSTR (my_comment, '(tid).*?[0-9]+', 1, 1, 'i'),
        '[0-9]+'
    ) as "tid",
    REGEXP_SUBSTR (
        REGEXP_SUBSTR (my_comment, '(cannot exceed|go over| excel above).*?[0-9]+ ?(kg|k.g)', 1, 1, 'i'),
        '[0-9]+'
    ) as "weight"
FROM reviews;
I am able to extract the tid and weight, but only the first instance and not able to split it into rows.
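Your query, modified:
- I didn't do much with what you'd already written, because you seem to be satisfied with the tid and weight values it extracts
  - what I changed is the occurrence parameter of regexp_substr (it was 1, now it is column_value)
- to get the data split into rows, I added a cross join which "loops" through my_comment as many times as the greater of the number of occurrences of tid and of kg (in any form)
  - for example, if there are 2 tid and 1 kg, it "loops" 2 times
  - it is also there to avoid the duplicates you'd get with a connect by level clause alone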
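To make the role of the occurrence parameter concrete, here is a minimal sketch on a literal string rather than on the reviews table. The literal, the simplified pattern 'tid ?[0-9]+' and the hard-coded list (1, 2) are stand-ins I made up for illustration; in the real query the list comes from the CONNECT BY row generator.

```sql
-- Hypothetical literal standing in for one my_comment value.
-- SYS.odcinumberlist (1, 2) plays the role of the row generator:
-- it yields one row per occurrence number, exposed as COLUMN_VALUE.
SELECT COLUMN_VALUE AS occurrence,
       REGEXP_SUBSTR ('TID 104124 and TID 92341',
                      'tid ?[0-9]+',   -- "tid", optional space, digits
                      1,               -- start searching at position 1
                      COLUMN_VALUE,    -- return the n-th match
                      'i') AS tid_match
  FROM TABLE (SYS.odcinumberlist (1, 2))
 ORDER BY occurrence;

-- OCCURRENCE TID_MATCH
--          1 TID 104124
--          2 TID 92341
```

Each generated number turns into one output row, which is what splits a comment with several TIDs into several rows.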
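You did tag the question as Oracle 10; I no longer have that version, but I know it doesn't support the regexp_count function. If that really is your version (you never answered Koen's question), this query won't work and you'll have to count the occurrences of tid and weight some other way. I hope you aren't on 10g, though.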
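If REGEXP_COUNT really isn't available, one possible workaround is to count matches with REGEXP_REPLACE, which 10g does have. This is only a sketch that I have not run on a 10g database: it appends one marker character (the '#' is an arbitrary choice of mine) after every match through the \1 backreference, so the growth in length equals the number of matches. The column aliases are made up for the example.

```sql
-- Count regex matches without REGEXP_COUNT:
-- every match is replaced by itself plus one '#', so the string grows
-- by exactly one character per match.
SELECT r_id,
       LENGTH (REGEXP_REPLACE (my_comment, '(tid)', '\1#', 1, 0, 'i'))
          - LENGTH (my_comment) AS tid_count,
       LENGTH (REGEXP_REPLACE (my_comment, '(kg|k.g)', '\1#', 1, 0, 'i'))
          - LENGTH (my_comment) AS kg_count
  FROM reviews;
```

The two expressions could then take the place of the two REGEXP_COUNT calls inside GREATEST.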
I ran this code in SQL*Plus. BREAK is there only to separate the r_id and my_comment values nicely in the output; it has no other purpose.
break on r_id on my_comment

SELECT r_id,
       my_comment,
       -- COLUMN_VALUE (1, 2, ...) selects the n-th "tid ... digits" match
       REGEXP_SUBSTR (REGEXP_SUBSTR (my_comment,
                                     '(tid).*?[0-9]+',
                                     1,
                                     COLUMN_VALUE,
                                     'i'),
                      '[0-9]+') AS "tid",
       -- same idea for the n-th weight
       REGEXP_SUBSTR (
          REGEXP_SUBSTR (
             my_comment,
             '(cannot exceed|go over| excel above).*?[0-9]+ ?(kg|k.g)',
             1,
             COLUMN_VALUE,
             'i'),
          '[0-9]+') AS "weight"
  FROM reviews
       CROSS JOIN
       -- row generator: one row per occurrence, up to the larger of the two counts
       TABLE (
          CAST (
             MULTISET (
                    SELECT LEVEL
                      FROM DUAL
                CONNECT BY LEVEL <= GREATEST (REGEXP_COUNT (my_comment, 'tid', 1, 'i'),
                                              REGEXP_COUNT (my_comment, '(kg|k.g)', 1, 'i')))
             AS SYS.odcinumberlist));
This results in:
 R_ID MY_COMMENT                                                                tid     weight
----- ------------------------------------------------------------------------- ------- -------
    1 Boxes with the TID 823 cannot exceed 40 kg                                823     40
    2 Parcel with the marking tid 63157 must not make the weight go over 31 k.g 63157   31
    3 Envelopes with TID 104124 and TID 92341 cant excel above 94.477kg         104124  94
                                                                                92341
    4 TID38204 cannot go over 45.4 kg and TID 8242602 cannot go over 92kg       38204   45
                                                                                8242602 92
    5 Box with the TID 94514 cannot go over 52kg but also cannot go over 51KG   94514   52
                                                                                        51
8 rows selected.
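Note that this output shows 94 and 45 where the desired output has 94.477 and 45.4: the outer '[0-9]+' stops at the decimal point. A possible adjustment, sketched here on a literal, is to allow an optional fractional part and to match the unit as k\.?g, so that the dot is a literal dot and not "any character". These patterns are my own variation, not the ones from the question, and they drop the "cannot exceed / go over" prefix.

```sql
-- The inner call finds "<number>[ ]kg" or "<number>[ ]k.g";
-- the outer call keeps only the number, including any decimals.
SELECT REGEXP_SUBSTR (
          REGEXP_SUBSTR (
             'Envelopes with TID 104124 and TID 92341 cant excel above 94.477kg',
             '[0-9]+(\.[0-9]+)? ?k\.?g',
             1,
             1,
             'i'),
          '[0-9]+(\.[0-9]+)?') AS weight
  FROM DUAL;

-- WEIGHT
-- ------
-- 94.477
```

In the full query, the literal 1 for the occurrence would again be COLUMN_VALUE.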