How to get almost matching string from Oracle table?
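I have a table in Oracle with four columns.
A user can now pass an input string such as "operation Knee right" (which is valid) to my query, and the query should return the ICD code (IKR123) of the row whose DiagnosisName column matches most of the words.
Below is my current query. (It does not give the correct output.)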
SELECT diagnosisname
FROM
(SELECT diagnosisname,
UTL_MATCH.jaro_winkler_similarity('%operation Knee right%',diagnosisname)
FROM icd_code
ORDER BY UTL_MATCH.EDIT_DISTANCE_SIMILARITY('%operation Knee right%',diagnosisname) DESC
)
WHERE ROWNUM<2;
This query gives me "Left Knee Operation" as output, but I expect "Right Knee Operation".
Please try this query. It may help solve your problem.
SELECT diagnosisname
FROM (SELECT diagnosisname, UTL_MATCH.jaro_winkler_similarity('%operation Knee right%',diagnosisname)
FROM icd_code
WHERE UTL_MATCH.jaro_winkler_similarity('%operation Knee right%',diagnosisname) = 100
ORDER BY UTL_MATCH.EDIT_DISTANCE_SIMILARITY('%operation Knee right%',diagnosisname) DESC)
WHERE ROWNUM<2
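A few things to note about your use of UTL_MATCH:
- EDIT_DISTANCE_SIMILARITY: Returns an integer between 0 and 100, where 0 means no similarity at all and 100 means a perfect match.
- JARO_WINKLER_SIMILARITY: Returns an integer between 0 and 100, where 0 means no similarity at all and 100 means a perfect match, but it tries to take possible data entry errors into account.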
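To get a feel for the two scales, a quick query against dual can help. This is only a sketch: the misspelled string 'Knee Opertaion' is a made-up example of a data entry error and does not come from your table.

```sql
-- Sketch: compare both UTL_MATCH similarity functions on the same inputs.
-- 'Knee Opertaion' is a hypothetical typo used for illustration only.
SELECT
  -- identical strings: both functions return 100 (perfect match)
  UTL_MATCH.edit_distance_similarity('Knee Operation', 'Knee Operation') AS eds_identical,
  UTL_MATCH.jaro_winkler_similarity ('Knee Operation', 'Knee Operation') AS jws_identical,
  -- two transposed letters: compare how each function scores the typo
  UTL_MATCH.edit_distance_similarity('Knee Operation', 'Knee Opertaion') AS eds_typo,
  UTL_MATCH.jaro_winkler_similarity ('Knee Operation', 'Knee Opertaion') AS jws_typo
FROM dual;
```

Running it on your own data with a few realistic typos shows how differently the two functions score the same mistake.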
ORDER BY UTL_MATCH.EDIT_DISTANCE_SIMILARITY('%operation Knee right%',diagnosisname) DESC
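This will not give you the correct result, because you are considering only the plain similarity and not allowing for data entry errors. So you need to use JARO_WINKLER_SIMILARITY.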
operation Knee right
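You need to keep in mind the case of the input and of the column values being compared. They must be in the same case to match correctly. You are passing the input mostly in lower case, whereas your column values are in INITCAP. It is better to convert both the column values and the input to the same case.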
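A small check along these lines makes the effect visible. Note that UTL_MATCH compares its arguments character by character, so the '%' signs in the original query are treated as ordinary characters rather than LIKE-style wildcards. The column aliases below are names I made up for the sketch.

```sql
-- Sketch: score the same pair of strings three ways and compare the columns.
SELECT
  -- as in the original query: '%' kept, case left as typed
  UTL_MATCH.jaro_winkler_similarity('%operation Knee right%', 'Right Knee Operation') AS jws_as_asked,
  -- '%' dropped, case still mixed
  UTL_MATCH.jaro_winkler_similarity('operation Knee right', 'Right Knee Operation') AS jws_no_wildcards,
  -- '%' dropped and both sides converted to upper case
  UTL_MATCH.jaro_winkler_similarity(UPPER('operation Knee right'), UPPER('Right Knee Operation')) AS jws_same_case
FROM dual;
```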
Let's look at the following demo to understand:
SQL> WITH DATA AS(
       SELECT 'Heart Operation' diagnosis_name, 'IH123' icd_code FROM dual UNION ALL
       SELECT 'Knee Operation' diagnosis_name, 'IK123' icd_code FROM dual UNION ALL
       SELECT 'Left Knee Operation' diagnosis_name, 'IKL123' icd_code FROM dual UNION ALL
       SELECT 'Right Knee Operation' diagnosis_name, 'IKR123' icd_code FROM dual UNION ALL
       SELECT 'Fever' diagnosis_name, 'IF123' icd_code FROM dual
     )
     SELECT t.*,
       utl_match.edit_distance_similarity(upper(diagnosis_name),upper('operation Knee right')) eds,
       UTL_MATCH.jaro_winkler_similarity (upper(diagnosis_name),upper('operation Knee right')) jws
     FROM DATA t
     ORDER BY jws DESC
     /

DIAGNOSIS_NAME       ICD_CO        EDS        JWS
-------------------- ------ ---------- ----------
Right Knee Operation IKR123         20         72
Knee Operation       IK123          20         70
Heart Operation      IH123          25         68
Left Knee Operation  IKL123         25         64
Fever                IF123          15         47

SQL>
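So you can see how the two differ. jaro_winkler_similarity does a better job of recognising data entry errors and giving the closest match. Building on this, simply order by the score in descending order and select the first row: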
SQL> WITH DATA AS(
       SELECT 'Heart Operation' diagnosis_name, 'IH123' icd_code FROM dual UNION ALL
       SELECT 'Knee Operation' diagnosis_name, 'IK123' icd_code FROM dual UNION ALL
       SELECT 'Left Knee Operation' diagnosis_name, 'IKL123' icd_code FROM dual UNION ALL
       SELECT 'Right Knee Operation' diagnosis_name, 'IKR123' icd_code FROM dual UNION ALL
       SELECT 'Fever' diagnosis_name, 'IF123' icd_code FROM dual
     )
     SELECT diagnosis_name
     FROM
       (SELECT t.*,
          utl_match.edit_distance_similarity(upper(diagnosis_name),upper('operation Knee right')) eds,
          UTL_MATCH.jaro_winkler_similarity (upper(diagnosis_name),upper('operation Knee right')) jws
        FROM DATA t
        ORDER BY jws DESC
       )
     WHERE rownum = 1
     /

DIAGNOSIS_NAME
--------------------
Right Knee Operation

SQL>
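Since the question asks for the row that matches most of the words, and Jaro-Winkler compares characters rather than words, a word-count ranking may be worth trying alongside it. The following is only a sketch of that alternative, not part of the approach above: the input_words CTE and the matched_words alias are names I made up, and it assumes the input words are separated by single spaces.

```sql
-- Sketch: rank diagnoses by how many of the input words they contain,
-- using the Jaro-Winkler score only to break ties.
WITH data AS (
  SELECT 'Heart Operation' diagnosis_name, 'IH123' icd_code FROM dual UNION ALL
  SELECT 'Knee Operation' diagnosis_name, 'IK123' icd_code FROM dual UNION ALL
  SELECT 'Left Knee Operation' diagnosis_name, 'IKL123' icd_code FROM dual UNION ALL
  SELECT 'Right Knee Operation' diagnosis_name, 'IKR123' icd_code FROM dual UNION ALL
  SELECT 'Fever' diagnosis_name, 'IF123' icd_code FROM dual
),
input_words AS (
  -- split the input on spaces: one upper-cased word per row
  SELECT UPPER(REGEXP_SUBSTR('operation Knee right', '[^ ]+', 1, LEVEL)) word
  FROM dual
  CONNECT BY REGEXP_SUBSTR('operation Knee right', '[^ ]+', 1, LEVEL) IS NOT NULL
)
SELECT d.diagnosis_name,
       d.icd_code,
       COUNT(*) AS matched_words,
       MAX(UTL_MATCH.jaro_winkler_similarity(UPPER(d.diagnosis_name),
                                             UPPER('operation Knee right'))) AS jws
FROM data d
JOIN input_words w
  -- pad with spaces so only whole words match
  ON INSTR(' ' || UPPER(d.diagnosis_name) || ' ', ' ' || w.word || ' ') > 0
GROUP BY d.diagnosis_name, d.icd_code
ORDER BY matched_words DESC, jws DESC;
```

With the sample rows, 'Right Knee Operation' is the only diagnosis that contains all three input words, so it sorts first regardless of word order. Rows that share no word with the input (here 'Fever') drop out because of the inner join, and a word repeated in the input would be counted twice. To return a single row, the query can be wrapped in the same ROWNUM filter used above.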