SQL Regex - Replace with substring from another field
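I have a database table (Oracle 11g) of questionnaire feedback, including multiple-choice, multiple-answer questions. The Options column holds every value the user could select, and the Answers column holds the numeric codes of the values they actually selected.
ID_NO  OPTIONS                            ANSWERS
1001   Apple Pie|Banana-Split|Cream Tea   1|2
1002   Apple Pie|Banana-Split|Cream Tea   2|3
1003   Apple Pie|Banana-Split|Cream Tea   1|2|3
I need a query that decodes the answers and returns their text versions as a single string.
ID_NO  ANSWERS  ANSWER_DECODE
1001   1|2      Apple Pie|Banana-Split
1002   2|3      Banana-Split|Cream Tea
1003   1|2|3    Apple Pie|Banana-Split|Cream Tea
I've experimented with regular expressions to replace values and to get substrings, but I can't work out a way to combine the two properly.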
WITH feedback AS (
  SELECT 1001 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '1|2' answers FROM DUAL UNION
  SELECT 1002 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '2|3' answers FROM DUAL UNION
  SELECT 1003 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '1|2|3' answers FROM DUAL )
SELECT
  id_no,
  options,
  REGEXP_SUBSTR(options||'|', '(.)+?\|', 1, 2) second_option,
  answers,
  REGEXP_REPLACE(answers, '(\d)+', ' ') answer_numbers,
  REGEXP_REPLACE(answers, '(\d)+', REGEXP_SUBSTR(options||'|', '(.)+?\|', 1, To_Number('2'))) "???"
FROM feedback
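I don't want to define or decode the answers manually in the SQL; there are many surveys with different questions (and different numbers of options), so I'm hoping for a solution that works dynamically for all of them.
I've tried splitting the options and the answers by LEVEL into separate rows and re-joining them where the codes match, but this runs extremely slowly on the actual dataset (a 5-option question with 600 rows of responses).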
WITH feedback AS (
  SELECT 1001 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '1|2' answers FROM DUAL UNION
  SELECT 1002 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '2|3' answers FROM DUAL UNION
  SELECT 1003 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '1|2|3' answers FROM DUAL )
SELECT
  answer_rows.id_no,
  ListAgg(option_rows.answer) WITHIN GROUP(ORDER BY option_rows.lvl)
FROM
  (SELECT DISTINCT
     LEVEL lvl,
     REGEXP_SUBSTR(options||'|', '(.)+?\|', 1, LEVEL) answer
   FROM
     (SELECT DISTINCT
        options,
        REGEXP_COUNT(options||'|', '(.)+?\|') num_choices
      FROM
        feedback)
   CONNECT BY LEVEL <= num_choices
  ) option_rows
LEFT OUTER JOIN
  (SELECT DISTINCT
     id_no,
     to_number(REGEXP_SUBSTR(answers, '(\d)+', 1, LEVEL)) answer
   FROM
     (SELECT DISTINCT
        id_no,
        answers,
        To_Number(REGEXP_SUBSTR(answers, '(\d)+$')) max_answer
      FROM
        feedback)
   WHERE
     to_number(REGEXP_SUBSTR(answers, '(\d)+', 1, LEVEL)) IS NOT NULL
   CONNECT BY LEVEL <= max_answer
  ) answer_rows
ON option_rows.lvl = answer_rows.answer
GROUP BY
  answer_rows.id_no
ORDER BY
  answer_rows.id_no
If there is no solution using regex alone, is there a more efficient way than LEVEL to split the values? Or is there another approach that would work?
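I've written a close solution in MySQL (I don't have Oracle installed right now), and I've noted in the comments what needs to change for the query to work in Oracle.
Also, the ugliest part of my code will look much better in Oracle, because it has a better INSTR function.
The idea is to CROSS JOIN with a list of numbers (1 to 10, to support up to 10 options per survey) and break the OPTIONS field down into separate rows. (In Oracle you can do this with the number list and the INSTR function; see the comments.)
From there you filter out the rows that were not selected and group everything back together.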
-- I've used GROUP_CONCAT in MySQL; in Oracle use LISTAGG (11gR2 and later) or the undocumented WM_CONCAT
select ID_NO, ANSWERS, group_concat(broken_down_options order by num separator '|') `OPTIONS`
from (
  select your_table.ID_NO, your_table.ANSWERS, nums.num,
    -- Luckily, you're using ORACLE so you can use an INSTR function that has the "occurrence" parameter
    -- INSTR(string, substring, [position, [occurrence]])
    -- use the nums.num field as input for the occurrence parameter
    -- and just put '1' under "position"
    case when nums.num = 1
           then substr(your_table.`OPTIONS`, 1, instr(your_table.`OPTIONS`, '|') - 1)
         when nums.num = 2
           then substr(substr(your_table.`OPTIONS`, instr(your_table.`OPTIONS`, '|') + 1), 1, instr(substr(your_table.`OPTIONS`, instr(your_table.`OPTIONS`, '|') + 1), '|') - 1)
         -- as written, every num >= 3 returns the last option, so this CASE handles three options only
         else substr(your_table.`OPTIONS`, length(your_table.`OPTIONS`) - instr(reverse(your_table.`OPTIONS`), '|') + 2) end broken_down_options
  from (select 1 num union all
        select 2 num union all
        select 3 num union all
        select 4 num union all
        select 5 num union all
        select 6 num union all
        select 7 num union all
        select 8 num union all
        select 9 num union all
        select 10 num
       ) nums
  CROSS JOIN
       (select 1001 ID_NO, 'Apple Pie|Banana-Split|Cream Tea' `OPTIONS`, '1|2' ANSWERS union
        select 1002 ID_NO, 'Apple Pie|Banana-Split|Cream Tea' `OPTIONS`, '2|3' ANSWERS union
        select 1003 ID_NO, 'Apple Pie|Banana-Split|Cream Tea' `OPTIONS`, '1|2|3' ANSWERS
       ) your_table
  -- for example: 2|3 matches 2 and 3 but not 1
  -- (both sides are wrapped in '|' so that 1 does not also match an answer of 10)
  where concat('|', your_table.ANSWERS, '|') like concat('%|', nums.num, '|%')
) some_query
group by ID_NO, ANSWERS
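The comments above describe using Oracle's INSTR with its occurrence parameter instead of the nested SUBSTR calls. A minimal Oracle sketch of that idea follows; it is not the answerer's tested code. The `nums` and `padded` CTE names are made up for illustration, and LISTAGG assumes 11gR2 or later. Padding both strings with a leading and trailing `|` means the n-th option always sits between the n-th and (n+1)-th pipe, and an answer of 1 cannot match 10.

```sql
WITH feedback AS (
  SELECT 1001 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '1|2' answers FROM DUAL UNION ALL
  SELECT 1002, 'Apple Pie|Banana-Split|Cream Tea', '2|3'   FROM DUAL UNION ALL
  SELECT 1003, 'Apple Pie|Banana-Split|Cream Tea', '1|2|3' FROM DUAL
),
-- hypothetical helper: the numbers 1..10 (up to 10 options per survey)
nums AS (
  SELECT LEVEL num FROM DUAL CONNECT BY LEVEL <= 10
),
-- hypothetical helper: both lists wrapped in '|' so every item is delimited on both sides
padded AS (
  SELECT id_no, answers,
         '|' || options || '|' AS opts,
         '|' || answers || '|' AS ans
  FROM feedback
)
SELECT p.id_no,
       p.answers,
       LISTAGG(
         -- text between the num-th and (num+1)-th pipe
         SUBSTR(p.opts,
                INSTR(p.opts, '|', 1, n.num) + 1,
                INSTR(p.opts, '|', 1, n.num + 1) - INSTR(p.opts, '|', 1, n.num) - 1),
         '|') WITHIN GROUP (ORDER BY n.num) AS answer_decode
FROM padded p
JOIN nums n
  ON INSTR(p.ans, '|' || n.num || '|') > 0   -- keep only the selected option numbers
GROUP BY p.id_no, p.answers
ORDER BY p.id_no;
```

For the three sample rows this is intended to reproduce the ANSWER_DECODE column from the question. An answer code larger than the number of options produces a NULL substring, which LISTAGG skips.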
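Create a stored procedure that performs the following steps:
- Declare an array of the size you need.
- Get the options data from the first row. Use a regular expression or LEVEL to extract the values between the pipes and store them in the array. Note: this is a one-off step, so you don't need to repeat it for every row.
- Then, in a loop over the rows, select answers and use the array values to look up the text for each answer.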
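The steps above can be sketched as an anonymous PL/SQL block. This is only an illustration under stated assumptions: `feedback` is taken to be a real table with the columns from the question, every row is assumed to share the same options (as the answer implies by reading only the first row), and the variable names are invented here.

```sql
-- In SQL*Plus, run SET SERVEROUTPUT ON first to see the output.
DECLARE
  -- associative array: option number -> option text
  TYPE option_list IS TABLE OF VARCHAR2(4000) INDEX BY PLS_INTEGER;
  l_options     option_list;
  l_all_options VARCHAR2(4000);
  l_decoded     VARCHAR2(4000);
  l_code        VARCHAR2(10);
BEGIN
  -- One-off step: split the options of the first row into the array
  -- (assumes all rows of this survey share the same options)
  SELECT options INTO l_all_options FROM feedback WHERE ROWNUM = 1;
  FOR i IN 1 .. REGEXP_COUNT(l_all_options, '[^|]+') LOOP
    l_options(i) := REGEXP_SUBSTR(l_all_options, '[^|]+', 1, i);
  END LOOP;

  -- For each row, look up every answer code in the array
  FOR r IN (SELECT id_no, answers FROM feedback ORDER BY id_no) LOOP
    l_decoded := NULL;
    FOR j IN 1 .. REGEXP_COUNT(r.answers, '[^|]+') LOOP
      l_code := REGEXP_SUBSTR(r.answers, '[^|]+', 1, j);
      IF l_decoded IS NOT NULL THEN
        l_decoded := l_decoded || '|';
      END IF;
      -- raises NO_DATA_FOUND if the code has no matching option
      l_decoded := l_decoded || l_options(TO_NUMBER(l_code));
    END LOOP;
    DBMS_OUTPUT.PUT_LINE(r.id_no || ' ' || r.answers || ' ' || l_decoded);
  END LOOP;
END;
/
```

If different surveys carry different options, the array would need to be rebuilt whenever the options string changes rather than once.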
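It's slow because you're expanding every row too many times; the connect-by clause you're using looks across all the rows, so you end up with a huge amount of data that then has to be sorted, which is presumably why you ended up with the DISTINCT in there.
You can add two PRIOR clauses to the connect-by: the first to preserve the ID_NO, the second to avoid looping. Any non-deterministic function will do for that; I've picked dbms_random.value, but you could use sys_guid or something else if you prefer. You also don't need as many subqueries; you can do it with two, or as CTEs, which I think is a bit clearer: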
WITH feedback AS (
  SELECT 1001 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '1|2' answers FROM DUAL UNION
  SELECT 1002 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '2|3' answers FROM DUAL UNION
  SELECT 1003 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '1|2|3' answers FROM DUAL
),
option_rows AS (
  SELECT
    id_no,
    LEVEL answer,
    REGEXP_SUBSTR(options, '[^|]+', 1, LEVEL) answer_text
  FROM feedback
  CONNECT BY LEVEL <= REGEXP_COUNT(options, '[^|]+')
  AND id_no = PRIOR id_no
  AND PRIOR dbms_random.value IS NOT NULL
),
answer_rows AS (
  SELECT
    id_no,
    REGEXP_SUBSTR(answers, '[^|]+', 1, LEVEL) answer
  FROM feedback
  CONNECT BY LEVEL <= REGEXP_COUNT(answers, '[^|]+')
  AND PRIOR id_no = id_no
  AND PRIOR dbms_random.value IS NOT NULL
)
SELECT
  option_rows.id_no,
  LISTAGG(option_rows.answer, '|') WITHIN GROUP (ORDER BY option_rows.answer) AS answers,
  LISTAGG(option_rows.answer_text, '|') WITHIN GROUP (ORDER BY option_rows.answer) AS answer_decode
FROM option_rows
JOIN answer_rows
ON option_rows.id_no = answer_rows.id_no
AND option_rows.answer = answer_rows.answer
GROUP BY option_rows.id_no
ORDER BY option_rows.id_no;
Which gives:
     ID_NO ANSWERS    ANSWER_DECODE
---------- ---------- ----------------------------------------
      1001 1|2        Apple Pie|Banana-Split
      1002 2|3        Banana-Split|Cream Tea
      1003 1|2|3      Apple Pie|Banana-Split|Cream Tea
I've also changed your regex patterns so you don't have to append or strip the |.
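To see how much the two PRIOR conditions matter, it can help to count the rows each form of the connect-by generates. The sketch below uses the three sample rows only; the `variant` and `generated_rows` column names are made up for illustration. Without the PRIOR conditions every row is treated as a child of every other row at each level, so the three rows should expand to 3 + 9 + 27 = 39; with them, each row expands only along its own ID_NO, giving 3 x 3 = 9.

```sql
WITH feedback AS (
  SELECT 1001 id_no, 'Apple Pie|Banana-Split|Cream Tea' options, '1|2' answers FROM DUAL UNION ALL
  SELECT 1002, 'Apple Pie|Banana-Split|Cream Tea', '2|3'   FROM DUAL UNION ALL
  SELECT 1003, 'Apple Pie|Banana-Split|Cream Tea', '1|2|3' FROM DUAL
)
-- every row connects to every row at each level: 3 + 3*3 + 3*3*3 rows
SELECT 'without PRIOR' AS variant, COUNT(*) AS generated_rows
FROM feedback
CONNECT BY LEVEL <= REGEXP_COUNT(options, '[^|]+')
UNION ALL
-- each row only connects to itself: one row per option per ID_NO
SELECT 'with PRIOR', COUNT(*)
FROM feedback
CONNECT BY LEVEL <= REGEXP_COUNT(options, '[^|]+')
       AND PRIOR id_no = id_no
       AND PRIOR dbms_random.value IS NOT NULL;
```

The unconstrained count grows as a power of the number of rows, which is consistent with the slowdown reported for 600 responses.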
Take a look at this compact solution:
with sample_data as
(
select 'ala|ma|kota' options, '1|2' answers from dual
union all
select 'apples|oranges|bacon', '1|2|3' from dual
union all
select 'a|b|c|d|e|f|h|i','1|3|4|5|8' from dual
)
select answers, options,
regexp_replace(regexp_replace(options,'([^|]+)\|([^|]+)\|([^|]+)','\' || replace(answers,'|','|\')),'[|]+','|') answer_decode
from sample_data;
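Output (note that the third row is not a correct decode of 1|3|4|5|8: the pattern contains exactly three capture groups and is applied to each run of three options in turn, so as written the approach only suits three-option questions):
ANSWERS   OPTIONS              ANSWER_DECODE
--------- -------------------- ---------------------------
1|2       ala|ma|kota          ala|ma
1|2|3     apples|oranges|bacon apples|oranges|bacon
1|3|4|5|8 a|b|c|d|e|f|h|i      a|c|d|f|h|i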
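The core trick in the compact query is that the answers string is turned into the replacement string for the inner REGEXP_REPLACE: each answer number becomes a backreference to the capture group holding that option. A small sketch that shows only that intermediate string (the `replacement_string` alias is made up for illustration):

```sql
WITH sample_data AS (
  SELECT '1|2' answers FROM DUAL UNION ALL
  SELECT '1|3|4|5|8' FROM DUAL
)
SELECT answers,
       -- prefix the first number with '\' and every later number with '|\'
       '\' || REPLACE(answers, '|', '|\') AS replacement_string
FROM sample_data;
-- 1|2        ->  \1|\2
-- 1|3|4|5|8  ->  \1|\3|\4|\5|\8
```

Because Oracle regular expressions only support the backreferences \1 through \9, this technique cannot address more than nine options even if the pattern is extended with more capture groups.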