How to interpret the output of Java's Collator / CollationKey.toByteArray()?

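I created a CollationKey for the string "a" and then used the method toByteArray() to convert the CollationKey to a sequence of bits. After that, I used Arrays.toString() to display the byte[] array, but I got an output that I don't understand. I thought I would get the string represented in bits. How do I interpret the output? Thank you.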

package myPackage9;

import java.text.CollationKey;
import java.text.Collator;
import java.util.Arrays;

public class collatorClass {
    public static void main(String[] args) {
        // Collator for the default locale
        Collator myCollator = Collator.getInstance();
        CollationKey[] a = new CollationKey[1];
        a[0] = myCollator.getCollationKey("a");
        byte[] bytes = a[0].toByteArray();
        System.out.println(Arrays.toString(bytes));
    }
}

Output: [0, 83, 0, 0, 0, 1, 0, 0, 0, 1]

CollationKey is an abstract class. Most likely your concrete type is a RuleBasedCollationKey. First, let's look at the JavaDoc for the method:

Converts the CollationKey to a sequence of bits. If two CollationKeys could be legitimately compared, then one could compare the byte arrays for each of those keys to obtain the same result. Byte arrays are organized most significant byte first.
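To make that JavaDoc claim concrete, here is a minimal sketch that compares two keys both ways. The class name KeyByteComparison and the helper compareUnsigned are made up for this illustration, and Locale.US is pinned only so the run does not depend on the machine's default locale.

```java
import java.text.CollationKey;
import java.text.Collator;
import java.util.Locale;

class KeyByteComparison {
    // Hypothetical helper: lexicographic comparison of two byte arrays,
    // treating every byte as unsigned, most significant byte first.
    static int compareUnsigned(byte[] x, byte[] y) {
        int n = Math.min(x.length, y.length);
        for (int i = 0; i < n; i++) {
            int diff = (x[i] & 0xff) - (y[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        // All shared positions are equal, so the shorter array sorts first.
        return x.length - y.length;
    }

    public static void main(String[] args) {
        Collator collator = Collator.getInstance(Locale.US);
        CollationKey apple = collator.getCollationKey("apple");
        CollationKey banana = collator.getCollationKey("Banana");

        int byKey = apple.compareTo(banana);
        int byBytes = compareUnsigned(apple.toByteArray(), banana.toByteArray());

        // Per the JavaDoc, both comparisons should agree in sign.
        System.out.println(Integer.signum(byKey) == Integer.signum(byBytes));
    }
}
```

The bytes have to be treated as unsigned here, because Java's byte type is signed and values above 127 would otherwise compare as negative.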

Clearly, the collation key for "a" is not the same as the bytes of the string "a", which is not surprising.
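A quick way to see the difference is to print both byte arrays side by side. This is only a sketch: the class and method names are invented for the example, UTF-8 is an arbitrary choice of string encoding, and Locale.US is pinned so the collator does not depend on the default locale.

```java
import java.nio.charset.StandardCharsets;
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

class StringBytesVsKeyBytes {
    // The bytes of the string itself, in an explicitly chosen encoding.
    static byte[] encoded(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // The bytes of the collation key built from the same string.
    static byte[] collationBytes(String s) {
        return Collator.getInstance(Locale.US).getCollationKey(s).toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encoded("a")));        // prints [97]
        // The exact key bytes may vary with the JDK version and locale rules.
        System.out.println(Arrays.toString(collationBytes("a")));
    }
}
```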

The next step is to look at its source to see exactly what it is returning:

public byte[] toByteArray() {

    char[] src = key.toCharArray();
    byte[] dest = new byte[ 2*src.length ];
    int j = 0;
    for( int i=0; i<src.length; i++ ) {
        dest[j++] = (byte)(src[i] >>> 8);
        dest[j++] = (byte)(src[i] & 0x00ff);
    }
    return dest;
}
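Since the method writes each 16-bit char as two bytes, high byte first, the array from the question can be folded back into 16-bit values. The following is a small sketch of that reverse step; CollationKeyDecoder and toUnits are names invented for this example and are not part of the JDK.

```java
import java.util.Arrays;

class CollationKeyDecoder {
    // Reverses the packing done by toByteArray(): every two bytes
    // (high byte first) are joined back into one 16-bit value.
    static int[] toUnits(byte[] bytes) {
        int[] units = new int[bytes.length / 2];
        for (int i = 0; i < units.length; i++) {
            units[i] = ((bytes[2 * i] & 0xff) << 8) | (bytes[2 * i + 1] & 0xff);
        }
        return units;
    }

    public static void main(String[] args) {
        // The byte array printed in the question.
        byte[] bytes = {0, 83, 0, 0, 0, 1, 0, 0, 0, 1};
        System.out.println(Arrays.toString(toUnits(bytes)));
        // prints [83, 0, 1, 0, 1]
    }
}
```

So the ten bytes are really five 16-bit values: 83, 0, 1, 0, 1. The zeros appear to act as separators between groups of values, although the byte array alone does not say so.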

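What is key? It is passed in as the second constructor argument. The constructor is called in RuleBasedCollator#getCollationKey. The source is rather complicated, but the method's JavaDoc states: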

Transforms the string into a series of characters that can be compared with CollationKey.compareTo. This overrides java.text.Collator.getCollationKey. It can be overriden in a subclass.

The inline code comments in the method explain it further:

// The basic algorithm here is to find all of the collation elements for each
// character in the source string, convert them to a char representation,
// and put them into the collation key.  But it's trickier than that.
// Each collation element in a string has three components: primary (A vs B),
// secondary (A vs A-acute), and tertiary (A' vs a); and a primary difference
// at the end of a string takes precedence over a secondary or tertiary
// difference earlier in the string.
//
// To account for this, we put all of the primary orders at the beginning of the
// string, followed by the secondary and tertiary orders, separated by nulls.

This is followed by a hypothetical example:

// Here's a hypothetical example, with the collation element represented as
// a three-digit number, one digit for primary, one for secondary, etc.
//
// String:              A     a     B   \u00e9 <--(e-acute)
// Collation Elements: 101   100   201  510
//
// Collation Key:      1125<null>0001<null>1010
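The hypothetical example can be reproduced with a few lines of code. This is a toy sketch of the layout described in the comment only, not the real RuleBasedCollator algorithm; the class HypotheticalKey and the three-digit encoding (hundreds for primary, tens for secondary, ones for tertiary) are assumptions taken from the comment's simplified model.

```java
class HypotheticalKey {
    // Builds a key from three-digit "collation elements":
    // hundreds digit = primary, tens digit = secondary, ones digit = tertiary.
    static String build(int[] elements) {
        StringBuilder primary = new StringBuilder();
        StringBuilder secondary = new StringBuilder();
        StringBuilder tertiary = new StringBuilder();
        for (int e : elements) {
            primary.append(e / 100);
            secondary.append((e / 10) % 10);
            tertiary.append(e % 10);
        }
        // All primaries first, then secondaries, then tertiaries;
        // '\0' stands in for the <null> separator.
        return primary + "\0" + secondary + "\0" + tertiary;
    }

    public static void main(String[] args) {
        // Elements for A, a, B and e-acute from the comment above.
        String key = build(new int[] {101, 100, 201, 510});
        System.out.println(key.replace("\0", "<null>"));
        // prints 1125<null>0001<null>1010
    }
}
```

Because all primary digits come first, a plain left-to-right comparison of two such keys checks every primary difference before any secondary or tertiary one, which is the precedence the comment describes.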

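So the assumption that CollationKey's toByteArray() method returns the same bytes as the string's own encoding (String has no toByteArray() method; the closest equivalent is getBytes()) is simply wrong.

"a".getBytes() is not the same as Collator.getInstance().getCollationKey("a").toByteArray(). If it were, we would not really need collation keys, would we?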