How to encode a byte array to a Base 63 string with a custom charset in Java?

Question

I need to encode a byte array as a Base63 string in Java (note that java.util.Base64 implements Base64, not Base63). I need to be able to use a custom character set:

'0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z','a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','_'

That is, something like:

byte[] byteArray = new byte[]{0,1,2,3,4,5,-1,-2,-3};
String base63String = Base63.encode(byteArray);
byte[] decodedByteArray = Base63.decode(base63String);
assert java.util.Arrays.equals(byteArray, decodedByteArray);

Answer 1

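The Apache Commons Codec project offers Base16, Base32 and Base64 encoders and decoders.

These all extend the BaseNCodec class, which provides a foundation for writing encoder/decoder classes for arbitrary bases. Unfortunately, there is not much documentation on implementing a custom subclass, but the source code of the existing implementations can serve as a guide.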

Answer 2

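I searched for different implementations and tried to adapt them to Base63. I found a Base58 implementation here. I was able to modify that code slightly so that it works with a custom base (Base63 in my case). I am not the original author and have not reviewed the code in detail, so use it at your own risk.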

import java.util.Arrays;

public class CustomBase {
    private final char[] alphabet;
    private final int divisor;
    private final char encodedZero;
    private final int[] indexes;

    public CustomBase(char[] alphabet){
        this.alphabet = alphabet;
        this.divisor = alphabet.length;
        this.encodedZero = alphabet[0];
        this.indexes = new int[128];
        Arrays.fill(indexes, -1);
        for (int i = 0; i < alphabet.length; i++) {
            char alphabetChar = alphabet[i];
            if(alphabetChar >= indexes.length){
                throw new IllegalArgumentException("All chars should be under 128 but there was a char "+alphabetChar);
            }
            if(indexes[alphabet[i]] != -1){
                throw new IllegalArgumentException("Duplicate char found in the alphabet "+alphabetChar);
            }
            indexes[alphabet[i]] = i;
        }
    }

    /**
     * Encodes the given bytes as a baseX string (no checksum is appended).
     *
     * @param input the bytes to encode
     * @return the baseX-encoded string
     */
    public String encode(byte[] input) {
        if (input.length == 0) {
            return "";
        }
        // Count leading zeros.
        int zeros = 0;
        while (zeros < input.length && input[zeros] == 0) {
            ++zeros;
        }
        // Convert base-256 digits to base-X digits (plus conversion to ASCII characters)
        input = Arrays.copyOf(input, input.length); // since we modify it in-place
        // Upper bound: two output chars per input byte is only enough for alphabets of 16 or more characters.
        char[] encoded = new char[input.length * 2];
        int outputStart = encoded.length;
        for (int inputStart = zeros; inputStart < input.length; ) {
            encoded[--outputStart] = alphabet[divmod(input, inputStart, 256, divisor)];
            if (input[inputStart] == 0) {
                ++inputStart; // optimization - skip leading zeros
            }
        }
        // Preserve exactly as many leading encoded zeros in output as there were leading zeros in input.
        while (outputStart < encoded.length && encoded[outputStart] == encodedZero) {
            ++outputStart;
        }
        while (--zeros >= 0) {
            encoded[--outputStart] = encodedZero;
        }
        // Return encoded string (including encoded leading zeros).
        return new String(encoded, outputStart, encoded.length - outputStart);
    }

    /**
     * Decodes the given baseX string into the original data bytes.
     *
     * @param input the baseX-encoded string to decode
     * @return the decoded data bytes
     */
    public byte[] decode(String input) {
        if (input.length() == 0) {
            return new byte[0];
        }
        // Convert the baseX-encoded ASCII chars to a baseX byte sequence (baseX digits).
        byte[] inputX = new byte[input.length()];
        for (int i = 0; i < input.length(); ++i) {
            char c = input.charAt(i);
            int digit = c < 128 ? indexes[c] : -1;
            if (digit < 0) {
                throw new IllegalArgumentException("Invalid character '" + c + "' for base " + divisor);
            }
            inputX[i] = (byte) digit;
        }
        // Count leading zeros.
        int zeros = 0;
        while (zeros < inputX.length && inputX[zeros] == 0) {
            ++zeros;
        }
        // Convert base-X digits to base-256 digits.
        byte[] decoded = new byte[input.length()];
        int outputStart = decoded.length;
        for (int inputStart = zeros; inputStart < inputX.length; ) {
            decoded[--outputStart] = divmod(inputX, inputStart, divisor, 256);
            if (inputX[inputStart] == 0) {
                ++inputStart; // optimization - skip leading zeros
            }
        }
        // Ignore extra leading zeroes that were added during the calculation.
        while (outputStart < decoded.length && decoded[outputStart] == 0) {
            ++outputStart;
        }
        // Return decoded data (including original number of leading zeros).
        return Arrays.copyOfRange(decoded, outputStart - zeros, decoded.length);
    }

    /**
     * Divides a number, represented as an array of bytes each containing a single digit
     * in the specified base, by the given divisor. The given number is modified in-place
     * to contain the quotient, and the return value is the remainder.
     *
     * @param number the number to divide
     * @param firstDigit the index within the array of the first non-zero digit
     *        (this is used for optimization by skipping the leading zeros)
     * @param base the base in which the number's digits are represented (up to 256)
     * @param divisor the number to divide by (up to 256)
     * @return the remainder of the division operation
     */
    private static byte divmod(byte[] number, int firstDigit, int base, int divisor) {
        // this is just long division which accounts for the base of the input digits
        int remainder = 0;
        for (int i = firstDigit; i < number.length; i++) {
            int digit = (int) number[i] & 0xFF;
            int temp = remainder * base + digit;
            number[i] = (byte) (temp / divisor);
            remainder = temp % divisor;
        }
        return (byte) remainder;
    }
}
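
To make the long division in divmod easier to follow, here is a small worked example. It is only a sketch: DivmodDemo is a hypothetical class name, and the method body is copied from the class above (made non-private so the demo can call it). It divides 256, stored as the two base-256 digits {1, 0}, by 63 twice. Each call leaves the quotient in the array and returns one Base63 digit, least significant first.

```java
import java.util.Arrays;

// Hypothetical demo class; divmod is copied from CustomBase above.
final class DivmodDemo {
    static byte divmod(byte[] number, int firstDigit, int base, int divisor) {
        int remainder = 0;
        for (int i = firstDigit; i < number.length; i++) {
            int digit = (int) number[i] & 0xFF;
            int temp = remainder * base + digit;
            number[i] = (byte) (temp / divisor);
            remainder = temp % divisor;
        }
        return (byte) remainder;
    }

    public static void main(String[] args) {
        byte[] number = {1, 0}; // 256 written as two base-256 digits

        // 256 = 4 * 63 + 4, so the quotient is 4 and the remainder is 4.
        int first = divmod(number, 0, 256, 63);
        System.out.println(first + " " + Arrays.toString(number)); // 4 [0, 4]

        // 4 = 0 * 63 + 4, so the quotient is 0 and the remainder is 4.
        int second = divmod(number, 0, 256, 63);
        System.out.println(second + " " + Arrays.toString(number)); // 4 [0, 0]
    }
}
```

The two remainders are the digits 4 and 4, which matches 4 * 63 + 4 = 256. With the alphabet from the question, where index 4 maps to '4', the single-digit-per-step loop in encode would therefore emit "44" for the bytes {1, 0}.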

Usage example:

public static final char[] ALPHABET = new char[]{'0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z','a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','_'};
private static final CustomBase BASE_63 = new CustomBase(ALPHABET);

public void testEncoding(){
    byte[] byteArray = new byte[]{0,1,2,3,4,5,-1,-2,-3};
    String base63String = BASE_63.encode(byteArray);
    byte[] decodedByteArray = BASE_63.decode(base63String);
    assert java.util.Arrays.equals(byteArray, decodedByteArray);
}
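
Since the code above has not been reviewed in detail, it may help to cross-check it against an independent implementation. The following is a minimal sketch that uses java.math.BigInteger instead of the manual long division. It is a different technique from CustomBase, and BigIntegerBase63 is a hypothetical name introduced only for this illustration. It follows the same convention of writing one alphabet[0] character per leading zero byte, so it is intended to produce the same strings, but that has not been verified here.

```java
import java.math.BigInteger;
import java.util.Arrays;

// Hypothetical cross-check helper; not part of the CustomBase class above.
final class BigIntegerBase63 {
    private static final String ALPHABET =
            "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_";
    private static final BigInteger BASE = BigInteger.valueOf(ALPHABET.length());

    static String encode(byte[] input) {
        // Leading zero bytes have no numeric value, so they are counted separately.
        int zeros = 0;
        while (zeros < input.length && input[zeros] == 0) {
            zeros++;
        }
        // Signum 1 makes BigInteger treat the bytes as an unsigned big-endian number.
        BigInteger value = new BigInteger(1, input);
        StringBuilder digits = new StringBuilder();
        while (value.signum() > 0) {
            BigInteger[] quotientAndRemainder = value.divideAndRemainder(BASE);
            digits.append(ALPHABET.charAt(quotientAndRemainder[1].intValue()));
            value = quotientAndRemainder[0];
        }
        // Digits were produced least significant first; add the zero markers, then reverse.
        for (int i = 0; i < zeros; i++) {
            digits.append(ALPHABET.charAt(0));
        }
        return digits.reverse().toString();
    }

    static byte[] decode(String input) {
        int zeros = 0;
        while (zeros < input.length() && input.charAt(zeros) == ALPHABET.charAt(0)) {
            zeros++;
        }
        BigInteger value = BigInteger.ZERO;
        for (int i = zeros; i < input.length(); i++) {
            int digit = ALPHABET.indexOf(input.charAt(i));
            if (digit < 0) {
                throw new IllegalArgumentException("Invalid character: " + input.charAt(i));
            }
            value = value.multiply(BASE).add(BigInteger.valueOf(digit));
        }
        byte[] magnitude = value.toByteArray();
        // toByteArray() returns {0} for zero and may prepend a 0x00 sign byte; skip that byte.
        int skip = (magnitude[0] == 0) ? 1 : 0;
        byte[] result = new byte[zeros + magnitude.length - skip];
        System.arraycopy(magnitude, skip, result, zeros, magnitude.length - skip);
        return result;
    }

    public static void main(String[] args) {
        byte[] byteArray = {0, 1, 2, 3, 4, 5, -1, -2, -3};
        String encoded = encode(byteArray);
        System.out.println(encoded);
        System.out.println(Arrays.equals(byteArray, decode(encoded))); // true
    }
}
```

Comparing the output of BigIntegerBase63.encode with BASE_63.encode over a set of random byte arrays is one way to gain some confidence in the adapted code. The BigInteger version is simpler to read but allocates a new object for every digit.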

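Used with the Base63 alphabet, the code above produces slightly shorter output than the original Base58 code in some cases. In most cases, however, the encoded length is the same as or close to that of Base58. This is expected: a Base63 character carries about 5.98 bits and a Base58 character about 5.86 bits, a difference of roughly 2%.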

Comparison:

[104, 101, 108, 108, 111, 87, 111, 114, 108, 100, 0, 0, 0, 0, 0, 0, 48, 57]

This byte array is encoded in 25 characters with Base58 but in 24 characters with Base63.

[-17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -66, -5, -17, -66, -5, -17, -66, -5]

The bytes above are encoded in 61 characters with Base58 and in 59 characters with Base63.

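Conclusion: the Base63 implementation above gives a small improvement over Base58 in some cases and no improvement in many others. The main advantage of the CustomBase implementation is probably the ability to use your own character set.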
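
The lengths in the comparison can be reproduced without either encoder. The sketch below counts one character per leading zero byte plus the number of digits the remaining value has in the target base. EncodedLength is a hypothetical helper written for this illustration, and it assumes the leading-zero convention used by the code above.

```java
import java.math.BigInteger;

// Hypothetical helper: predicts the encoded length for a given base.
final class EncodedLength {
    static int of(byte[] data, int base) {
        // Each leading zero byte becomes exactly one output character.
        int zeros = 0;
        while (zeros < data.length && data[zeros] == 0) {
            zeros++;
        }
        // Count the digits of the unsigned big-endian value in the given base.
        BigInteger value = new BigInteger(1, data);
        BigInteger divisor = BigInteger.valueOf(base);
        int digits = 0;
        while (value.signum() > 0) {
            value = value.divide(divisor);
            digits++;
        }
        return zeros + digits;
    }

    public static void main(String[] args) {
        byte[] data = {104, 101, 108, 108, 111, 87, 111, 114, 108, 100, 0, 0, 0, 0, 0, 0, 48, 57};
        System.out.println(of(data, 58)); // 25
        System.out.println(of(data, 63)); // 24
    }
}
```

On average an input byte costs log(256)/log(58), about 1.366 characters, in Base58 and log(256)/log(63), about 1.338 characters, in Base63, which is why the saving only shows up as one or two characters on inputs of this size.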
