如何在 Java 中使用自定义字符集将字节数组编码为 Base 63 字符串?

How to encode byte array to Base 63 string with custom charset in Java?

我需要在 Java 中将字节数组编码为 Base63 字符串(注意,java.util.Base64 实现 Base64 而不是 Base63)。 我需要能够使用自定义字符集:

'0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z','a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','_'

即类似于:

byte[] byteArray = new byte[]{0,1,2,3,4,5,-1,-2,-3};
String base63String = Base63.encode(byteArray);
byte[] decodedByteArray = Base63.decode(base63String);
assert java.util.Arrays.equals(byteArray, decodedByteArray);

Apache Commons Codec project offers Base16, Base32 and Base64 编码器和解码器。

这些都继承自 BaseNCodec class,它为编写任意基 encoder/decoder classes 提供了基础。不幸的是,实现自定义子class的文档不多,但现有实现的源代码可以提供指导。

正在搜索不同的实现并尝试将它们更改为 Base63。我找到 Base58 实现 here。我能够稍微修改该代码以使用自定义基数(在我的情况下它是 Base-63)。 原作者非我本人,代码未详查,使用风险自负。

public class CustomBase {
    private final char[] alphabet;
    private final int divisor;
    private final char encodedZero;
    private final int[] indexes;

    public CustomBase(char[] alphabet){
        this.alphabet = alphabet;
        this.divisor = alphabet.length;
        this.encodedZero = alphabet[0];
        this.indexes = new int[128];
        Arrays.fill(indexes, -1);
        for (int i = 0; i < alphabet.length; i++) {
            char alphabetChar = alphabet[i];
            if(alphabetChar >= indexes.length){
                throw new IllegalArgumentException("All chars should be under 128 but there was a char "+alphabetChar);
            }
            if(indexes[alphabet[i]] != -1){
                throw new IllegalArgumentException("Duplicate char found in the alphabet "+alphabetChar);
            }
            indexes[alphabet[i]] = i;
        }
    }

    /**
     * Encodes the given bytes as a baseX string (no checksum is appended).
     *
     * @param input the bytes to encode
     * @return the baseX-encoded string
     */
    public String encode(byte[] input) {
        if (input.length == 0) {
            return "";
        }
        // Count leading zeros.
        int zeros = 0;
        while (zeros < input.length && input[zeros] == 0) {
            ++zeros;
        }
        // Convert base-256 digits to base-X digits (plus conversion to ASCII characters)
        input = Arrays.copyOf(input, input.length); // since we modify it in-place
        char[] encoded = new char[input.length * 2]; // upper bound
        int outputStart = encoded.length;
        for (int inputStart = zeros; inputStart < input.length; ) {
            encoded[--outputStart] = alphabet[divmod(input, inputStart, 256, divisor)];
            if (input[inputStart] == 0) {
                ++inputStart; // optimization - skip leading zeros
            }
        }
        // Preserve exactly as many leading encoded zeros in output as there were leading zeros in input.
        while (outputStart < encoded.length && encoded[outputStart] == encodedZero) {
            ++outputStart;
        }
        while (--zeros >= 0) {
            encoded[--outputStart] = encodedZero;
        }
        // Return encoded string (including encoded leading zeros).
        return new String(encoded, outputStart, encoded.length - outputStart);
    }

    /**
     * Decodes the given baseX string into the original data bytes.
     *
     * @param input the baseX-encoded string to decode
     * @return the decoded data bytes
     */
    public byte[] decode(String input) {
        if (input.length() == 0) {
            return new byte[0];
        }
        // Convert the baseX-encoded ASCII chars to a baseX byte sequence (baseX digits).
        byte[] inputX = new byte[input.length()];
        for (int i = 0; i < input.length(); ++i) {
            char c = input.charAt(i);
            int digit = c < 128 ? indexes[c] : -1;
            if (digit < 0) {
                throw new IllegalStateException("InvalidCharacter in base "+divisor);
            }
            inputX[i] = (byte) digit;
        }
        // Count leading zeros.
        int zeros = 0;
        while (zeros < inputX.length && inputX[zeros] == 0) {
            ++zeros;
        }
        // Convert base-X digits to base-256 digits.
        byte[] decoded = new byte[input.length()];
        int outputStart = decoded.length;
        for (int inputStart = zeros; inputStart < inputX.length; ) {
            decoded[--outputStart] = divmod(inputX, inputStart, divisor, 256);
            if (inputX[inputStart] == 0) {
                ++inputStart; // optimization - skip leading zeros
            }
        }
        // Ignore extra leading zeroes that were added during the calculation.
        while (outputStart < decoded.length && decoded[outputStart] == 0) {
            ++outputStart;
        }
        // Return decoded data (including original number of leading zeros).
        return Arrays.copyOfRange(decoded, outputStart - zeros, decoded.length);
    }

    /**
     * Divides a number, represented as an array of bytes each containing a single digit
     * in the specified base, by the given divisor. The given number is modified in-place
     * to contain the quotient, and the return value is the remainder.
     *
     * @param number the number to divide
     * @param firstDigit the index within the array of the first non-zero digit
     *        (this is used for optimization by skipping the leading zeros)
     * @param base the base in which the number's digits are represented (up to 256)
     * @param divisor the number to divide by (up to 256)
     * @return the remainder of the division operation
     */
    private static byte divmod(byte[] number, int firstDigit, int base, int divisor) {
        // this is just long division which accounts for the base of the input digits
        int remainder = 0;
        for (int i = firstDigit; i < number.length; i++) {
            int digit = (int) number[i] & 0xFF;
            int temp = remainder * base + digit;
            number[i] = (byte) (temp / divisor);
            remainder = temp % divisor;
        }
        return (byte) remainder;
    }
}

使用示例:

public static final char[] ALPHABET = new char[]{'0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z','a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','_'};
private static final CustomBase BASE_63 = new CustomBase(ALPHABET);

public void testEncoding(){
    byte[] byteArray = new byte[]{0,1,2,3,4,5,-1,-2,-3};
    String base63String = BASE_63.encode(byteArray);
    byte[] decodedByteArray = BASE_63.decode(base63String);
    assert java.util.Arrays.equals(byteArray, decodedByteArray);
}

如果我们将上面的代码与 Base63 字母表一起使用,则与 Base58 原始代码相比,上面的代码在某些情况下显示出稍微更好的压缩。也就是说,在大多数情况下,它提供与 Base58 相同或相似的压缩。

比较:

[104, 101, 108, 108, 111, 87, 111, 114, 108, 100, 0, 0, 0, 0, 0, 0, 48, 57]

This byte array will be encoded in 25 characters when Base58 is used but in 24 characters when Base63 is used. 

[-17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -66, -5, -17, -66, -5, -17, -66, -5]

the above bytes will be encoded in 61 characters when we use Base58 and in 59 characters when we use Base63.

结论:上述 Base63 实现在某些情况下对 Base58 进行了小幅改进,而在许多其他情况下没有任何改进。可能,使用 CustomBase 实现的主要优点是能够使用您自己的字符集。