How to make exact Unicode characters take priority over ASCII versions?
I have a database containing the names of German towns and cities, such as Munich and Münster.
If I query it like this:
SELECT name,
MATCH(name) AGAINST('+mün*' IN BOOLEAN MODE) AS relevance
FROM place_names
ORDER BY relevance DESC
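I get the same relevance value for any text containing mun, mün, or anything else that flattens to mun once diacritics are ignored. In other words, searching for mun or mün returns exactly the same results.
How can I configure my database so that a search for mün gives higher relevance to words that actually contain the letter ü, while still treating u as a match?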
CREATE TABLE place_names (id SERIAL PRIMARY KEY, name VARCHAR(255));
CREATE FULLTEXT INDEX idx ON place_names (name);
INSERT INTO place_names (name) VALUES ('Munich'), ('Münster');
SELECT * FROM place_names;
id  name
1   Munich
2   Münster
SELECT name,
MATCH(name) AGAINST('+mün*' IN BOOLEAN MODE) AS relevance
FROM place_names
ORDER BY relevance DESC;
name     relevance
Munich   0.000000001885928302414186
Münster  0.000000001885928302414186
ALTER TABLE place_names ADD COLUMN name2 VARCHAR(255) COLLATE utf8mb4_0900_bin AS (name) STORED;
CREATE FULLTEXT INDEX idx2 ON place_names (name2);
SELECT name,
MATCH(name) AGAINST('+mün*' IN BOOLEAN MODE) AS relevance,
MATCH(name2) AGAINST('+mün*' IN BOOLEAN MODE) AS relevance2
FROM place_names
ORDER BY relevance DESC;
name     relevance                   relevance2
Munich   0.000000001885928302414186  0
Münster  0.000000001885928302414186  0.0906190574169159
db<>fiddle here
Therefore:
SELECT name,
MATCH(name) AGAINST('+mün*' IN BOOLEAN MODE) AS relevance
FROM place_names
ORDER BY MATCH(name2) AGAINST('+mün*' IN BOOLEAN MODE) DESC;
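Ordering by the name2 score alone leaves every row without an exact ü match tied at 0, as the Munich row is in the output above. A minimal sketch, assuming the place_names table and both FULLTEXT indexes defined earlier, that sorts by the exact score first and falls back to the accent-insensitive score as a tie-breaker. The alias exact_relevance is introduced here only for illustration:

```sql
-- Sketch: two-tier ordering. Assumes the table, the generated column name2,
-- and the indexes idx and idx2 created above.
SELECT name,
       MATCH(name)  AGAINST('+mün*' IN BOOLEAN MODE) AS relevance,
       MATCH(name2) AGAINST('+mün*' IN BOOLEAN MODE) AS exact_relevance  -- hypothetical alias
FROM place_names
-- Keep every accent-insensitive match, so plain "u" still counts as a hit.
WHERE MATCH(name) AGAINST('+mün*' IN BOOLEAN MODE)
-- Rows that really contain "ü" sort first; the rest keep their usual ranking.
ORDER BY exact_relevance DESC, relevance DESC;
```

The WHERE clause is optional with only two rows, but on a larger table it keeps non-matching rows out of the result.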
One approach could be:
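-- relevance is the alias defined in the SELECT list of the original query;
-- an alias cannot be declared inside WHERE.
WHERE MATCH(name) AGAINST('+mün*' IN BOOLEAN MODE)
ORDER BY name LIKE '%Mün%' COLLATE utf8mb4_bin DESC, relevance DESC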
Another thing to note is that MySQL 8.0 has the collation utf8mb4_0900_as_ci, meaning "accent-sensitive and case-insensitive". (However, that would not match "Mun" at all.)
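The difference between the accent-insensitive and accent-sensitive collations can be checked directly on string literals. This is a small sketch, assuming MySQL 8.0 and a connection character set of utf8mb4 (otherwise the COLLATE clauses are rejected). The column aliases are made up for illustration:

```sql
-- Sketch: compare the same literals under different collations.
-- Assumes the session uses utf8mb4 (e.g. after SET NAMES utf8mb4).
SELECT 'Mun' = 'Mün' COLLATE utf8mb4_0900_ai_ci AS accent_insensitive_match,
       'Mun' = 'Mün' COLLATE utf8mb4_0900_as_ci AS accent_sensitive_match,
       'mün' = 'Mün' COLLATE utf8mb4_0900_as_ci AS case_insensitive_match;
```

Under the accent-insensitive collation u and ü compare as equal, which is why both rows in the question get the same score. Under utf8mb4_0900_as_ci they do not, so a column with that collation would drop "Mun" entirely rather than rank it lower.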