How to sort Norwegian text on Postgres
I'm having trouble sorting a column of Norwegian text in Postgres.
My environment:
db=# select version();
PostgreSQL 9.2.14 on x86_64-redhat-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-4), 64-bit
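The database is hosted on RedHat OpenShift.
When I run the locale command, I get:
- locale: Cannot set LC_CTYPE to default locale: No such file or directory
- locale: Cannot set LC_ALL to default locale: No such file or directory
- LANG=en_US.UTF-8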
- LC_CTYPE=UTF-8
- LC_NUMERIC="en_US.UTF-8"
- LC_TIME="en_US.UTF-8"
- LC_COLLATE="en_US.UTF-8"
- LC_MONETARY="en_US.UTF-8"
- LC_MESSAGES="en_US.UTF-8"
- LC_PAPER="en_US.UTF-8"
- LC_NAME="en_US.UTF-8"
- LC_ADDRESS="en_US.UTF-8"
- LC_TELEPHONE="en_US.UTF-8"
- LC_MEASUREMENT="en_US.UTF-8"
- LC_IDENTIFICATION="en_US.UTF-8"
- LC_ALL=
**Edit
db=#\l
Name | Owner | Encoding | Collate | Ctype | Access privileges
-------------------------+--------------+----------+-------------+-------------+-----------------------
db | myadminUser | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
Here is what I have tried.
This SQL shows that the default sort order is incorrect:
db=# select * from unnest(ARRAY['a','b','c','d','A','B','C','å','ø','z','Z','Ø']) as t1 order by t1;
Result:
a
A
å
b
B
c
C
d
ø
Ø
z
Z
(I think this sort order is wrong even for English: uppercase 'A' should come before 'a', shouldn't it?)
Then I tried:
db=# CREATE COLLATION nor (LOCALE = 'nn_NO.utf8');
followed by the same statement as before, this time with the new collation:
db=# select * from unnest(ARRAY['a','b','c','d','A','B','C','å','ø','z','Z','Ø']) as t1 order by t1 collate nor;
Now the result is:
A
a
B
b
C
c
d
Z
z
Ø
ø
å
This looked really good, and I thought I was done. But then I tried:
db=# select * from unnest(ARRAY['aaaa','bbbb','cccc','dddd','AAAA','BBBB','CCCC','åååå','øøøø','zzzz','ZZZZ','ØØØØ']) as t1 order by t1 collate nor;
Result:
BBBB
bbbb
CCCC
cccc
dddd
ZZZZ
zzzz
ØØØØ
øøøø
AAAA
aaaa
åååå
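What am I doing wrong?
The order is correct. In Norwegian, "aa" is an alternative spelling of "å", so it sorts with "å" at the end of the alphabet.
Source: https://en.wikipedia.org/wiki/%C3%85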
Correct alphabetization in Danish and Norwegian places Å as the last
letter in the alphabet, the sequence being Æ, Ø, Å. This is also true
for the alternative spelling "Aa". Unless manually corrected, sorting
algorithms of programs localised for Danish or Norwegian will place
e.g., Aaron after Zorro.
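To see the rule on ordinary words rather than repeated letters, a small check along the following lines may help. It is only a sketch: it assumes the "nor" collation from the question already exists on the server, and the sample names are made up for illustration. If the locale follows the rule quoted above, Aaron would be expected to appear after Zorro in the first query, while Adam (a single "a") stays near the top.

```sql
-- Assumption: the collation "nor" was created as in the question:
--   CREATE COLLATION nor (LOCALE = 'nn_NO.utf8');
-- The names below are illustrative sample data, not from the question.
SELECT w
FROM unnest(ARRAY['Aaron', 'Adam', 'Zorro', 'Ørjan', 'Åse']) AS t(w)
ORDER BY w COLLATE nor;

-- For comparison: the same list under the database default collation
-- (en_US.UTF-8 in the question), which gives "aa" no special treatment.
SELECT w
FROM unnest(ARRAY['Aaron', 'Adam', 'Zorro', 'Ørjan', 'Åse']) AS t(w)
ORDER BY w;
```

The exact output depends on the locale definitions installed on the operating system, so it is worth running both queries on the target server before relying on either ordering.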