Why do length(column) and lengthb(column) return the same length?
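In Oracle, length(column) and lengthb(column) return the same length, even when the value contains multibyte characters. When I copy and paste the value into lengthb instead, the multibyte value returns a larger length.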
SELECT column1,
Length(column1) AS length_C,
Lengthb(column1) AS length_B,
Lengthb('100749 ¬ 100749 ¬ ') AS bytelength
FROM db.sample
+-------------------------------------------------------+
| column1 | length_C |length_B |bytelength |
+-------------------------------------------------------+
|100749 ¬ 100749 ¬ | 17 | 17 | 19 |
+-------------------------------------------------------+
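Keep in mind that both belong to the same family of LENGTH functions:
- LENGTH (number of characters)
- LENGTHB (number of bytes)
- LENGTHC (Unicode complete characters, normalized where possible)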
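As a rough analogy outside Oracle, Python can show the same three views of one string. The helper names below are hypothetical, and mapping them onto Oracle's functions is an assumption for illustration: `len()` counts code points (like LENGTH), the UTF-8 byte count matches what LENGTHB reports in an AL32UTF8 database, and NFC normalization only approximates LENGTHC's notion of a complete character.

```python
import unicodedata

def char_length(s: str) -> int:
    """Hypothetical stand-in for LENGTH: number of code points."""
    return len(s)

def byte_length(s: str) -> int:
    """Hypothetical stand-in for LENGTHB under AL32UTF8: number of UTF-8 bytes."""
    return len(s.encode("utf-8"))

def complete_char_length(s: str) -> int:
    """Rough stand-in for LENGTHC: code points after NFC composition (an approximation)."""
    return len(unicodedata.normalize("NFC", s))

# 'u' followed by U+0308 COMBINING DIAERESIS displays like a single 'ü'
decomposed = "u\u0308"
print(char_length(decomposed), byte_length(decomposed), complete_char_length(decomposed))
# prints: 2 3 1
```

The decomposed form has two code points and three bytes, but composes to the single character U+00FC.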
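I'll show you an example using a database whose character set is AL32UTF8. UTF-8 is the most popular Unicode encoding. It uses one byte for standard English letters and symbols, two bytes for other Latin and Middle Eastern characters, and three bytes for Asian characters. Additional characters can be represented with four bytes. UTF-8 is backward compatible with ASCII, because the first 128 characters map to the same values.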
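A quick check of those byte widths in Python (used here only as a neutral illustration; the sample characters are my own choice, one per range mentioned above):

```python
def utf8_width(ch: str) -> int:
    """Bytes one character occupies in UTF-8 (what LENGTHB counts under AL32UTF8)."""
    return len(ch.encode("utf-8"))

# Sample characters chosen for illustration
samples = [
    ("a", "standard English letter"),
    ("\u00fc", "other Latin (ü)"),
    ("\u05d0", "Middle Eastern (Hebrew alef)"),
    ("\u4e2d", "Asian (CJK 中)"),
    ("\U0001f600", "supplementary character (emoji)"),
]
for ch, label in samples:
    print(f"U+{ord(ch):04X} {label}: {utf8_width(ch)} byte(s)")
# prints 1, 2, 2, 3 and 4 bytes respectively

# ASCII compatibility: code points 0-127 encode to the identical single byte
assert all(chr(i).encode("utf-8") == bytes([i]) for i in range(128))
```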
SQL> select value from nls_database_parameters where parameter = 'NLS_CHARACTERSET'
2 ;
VALUE
--------------------------------------------------------------------------------
AL32UTF8
SQL> with t as ( select 'abcdefghijk' as c1, 'üäöß#+#üöä' as c2 from dual )
2 select length(c1) , lengthb(c1) , lengthc ( c1 ) from t ;
LENGTH(C1) LENGTHB(C1) LENGTHC(C1)
---------- ----------- -----------
11 11 11
SQL> with t as ( select 'abcdefghijk' as c1, 'üäöß#+#üöä' as c2 from dual )
2 select length(c2) , lengthb(c2), lengthc(c2) from t ;
LENGTH(C2) LENGTHB(C2) LENGTHC(C2)
---------- ----------- -----------
17 45 17
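In the example, c1 contains only plain English letters, so all three functions return the same value. In the case of c2, you can see the difference between characters, bytes and Unicode characters.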
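For comparison, here are the character and byte counts the two literals would have if they reached an AL32UTF8 database intact, computed in Python as a stand-in (this assumes each visible letter is a single precomposed code point). c1 matches the transcript, but c2 would give 10 characters and 17 bytes rather than the 17 and 45 shown, which suggests the literal was altered on its way into the database.

```python
c1 = "abcdefghijk"
c2 = "\u00fc\u00e4\u00f6\u00df#+#\u00fc\u00f6\u00e4"  # 'üäöß#+#üöä', precomposed

def counts(s: str) -> tuple[int, int]:
    """(characters, UTF-8 bytes): analogous to LENGTH and LENGTHB under AL32UTF8."""
    return len(s), len(s.encode("utf-8"))

print("c1:", counts(c1))  # c1: (11, 11)
print("c2:", counts(c2))  # c2: (10, 17)
```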
In such cases I always recommend using DUMP(); it is the best way to understand the internal representation of these characters.
SQL> with t as ( select 'abcdefghijk' as c1, 'üäöß#+#üöä' as c2 from dual )
2 select length(c1) as length_characters , dump(c1) as dump from t ;
LENGTH_CHARACTERS DUMP
----------------- -------------------------------------------------------
11 Typ=96 Len=11: 97,98,99,100,101,102,103,104,105,106,107
SQL> with t as ( select 'abcdefghijk' as c1, 'üäöß#+#üöä' as c2 from dual )
2 select length(c2) as length_characters , dump(c2) as dump from t ;
LENGTH_CHARACTERS
-----------------
DUMP
--------------------------------------------------------------------------------
17
Typ=96 Len=45: 239,191,189,239,191,189,239,191,189,239,191,189,239,191,189,239,191,189,239,191,189,239,191,189,35,43,35,239,191,189,239,191,189,239,191,189,239,191,189,239,191,189,239,191,189
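The repeating triple 239,191,189 (hex EF BF BD) is the UTF-8 encoding of U+FFFD, the Unicode replacement character. Decoding the 45 bytes from this DUMP output in Python (byte values copied by hand, purely as an illustration) gives 14 replacement characters plus '#+#'. That is consistent with each of the seven non-ASCII letters arriving as two separate invalid bytes, for example through a client character-set mismatch; that cause is an assumption and is not shown in the transcript.

```python
# Same byte sequence as "Typ=96 Len=45: ..." above:
# eight EF BF BD triples, then '#', '+', '#', then six more triples
dump_bytes = bytes([239, 191, 189] * 8 + [35, 43, 35] + [239, 191, 189] * 6)
text = dump_bytes.decode("utf-8")

print(len(dump_bytes))        # 45 (what LENGTHB reported)
print(len(text))              # 17 (what LENGTH reported)
print(text.count("\ufffd"))   # 14 replacement characters
print(bytes([239, 191, 189]).decode("utf-8") == "\ufffd")  # True
```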
In your case, you made a mistake: you used lengthb twice (I guess one of them should be length). Check the internal representation of the string:
SQL> select dump('100749 ¬ 100749 ¬ ',1016) from dual ;
DUMP('100749??100749??',1016)
------------------------------------------------------------------------------------------------------------------------
Typ=96 Len=28 CharacterSet=AL32UTF8: 31,30,30,37,34,39,20,ef,bf,bd,ef,bf,bd,20,31,30,30,37,34,39,20,ef,bf,bd,ef,bf,bd,20
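Decoding this hex dump (format 1016) the same way shows that, in the session above, each ¬ was stored as two U+FFFD replacement characters, giving 28 bytes and 20 characters. A correctly transmitted ¬ (U+00AC) would instead be the two bytes C2 AC. Python is used here only to decode the bytes; the hex values are copied from the DUMP output.

```python
hex_dump = "31,30,30,37,34,39,20,ef,bf,bd,ef,bf,bd,20,31,30,30,37,34,39,20,ef,bf,bd,ef,bf,bd,20"
raw = bytes(int(h, 16) for h in hex_dump.split(","))
decoded = raw.decode("utf-8")

print(len(raw), len(decoded))  # 28 20
print(decoded == "100749 \ufffd\ufffd 100749 \ufffd\ufffd ")  # True
print("\u00ac".encode("utf-8").hex())  # c2ac, a genuine '¬' in UTF-8
```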
SQL>