如何在 Java 中使用自定义字符集将字节数组编码为 Base 63 字符串?
How to encode byte array to Base 63 string with custom charset in Java?
我需要在 Java 中将字节数组编码为 Base63 字符串(注意,java.util.Base64
实现 Base64 而不是 Base63)。
我需要能够使用自定义字符集:
'0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z','a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','_'
即类似于:
byte[] byteArray = new byte[]{0,1,2,3,4,5,-1,-2,-3};
String base63String = Base63.encode(byteArray);
byte[] decodedByteArray = Base63.decode(base63String);
assert java.util.Arrays.equals(byteArray, decodedByteArray);
Apache Commons Codec project offers Base16
, Base32
and Base64
编码器和解码器。
这些都继承自 BaseNCodec
class,它为编写任意基 encoder/decoder classes 提供了基础。不幸的是,实现自定义子class的文档不多,但现有实现的源代码可以提供指导。
正在搜索不同的实现并尝试将它们更改为 Base63
。我找到 Base58
实现 here。我能够稍微修改该代码以使用自定义基数(在我的情况下它是 Base-63)。
原作者非我本人,代码未详查,使用风险自负。
public class CustomBase {
private final char[] alphabet;
private final int divisor;
private final char encodedZero;
private final int[] indexes;
public CustomBase(char[] alphabet){
this.alphabet = alphabet;
this.divisor = alphabet.length;
this.encodedZero = alphabet[0];
this.indexes = new int[128];
Arrays.fill(indexes, -1);
for (int i = 0; i < alphabet.length; i++) {
char alphabetChar = alphabet[i];
if(alphabetChar >= indexes.length){
throw new IllegalArgumentException("All chars should be under 128 but there was a char "+alphabetChar);
}
if(indexes[alphabet[i]] != -1){
throw new IllegalArgumentException("Duplicate char found in the alphabet "+alphabetChar);
}
indexes[alphabet[i]] = i;
}
}
/**
* Encodes the given bytes as a baseX string (no checksum is appended).
*
* @param input the bytes to encode
* @return the baseX-encoded string
*/
public String encode(byte[] input) {
if (input.length == 0) {
return "";
}
// Count leading zeros.
int zeros = 0;
while (zeros < input.length && input[zeros] == 0) {
++zeros;
}
// Convert base-256 digits to base-X digits (plus conversion to ASCII characters)
input = Arrays.copyOf(input, input.length); // since we modify it in-place
char[] encoded = new char[input.length * 2]; // upper bound
int outputStart = encoded.length;
for (int inputStart = zeros; inputStart < input.length; ) {
encoded[--outputStart] = alphabet[divmod(input, inputStart, 256, divisor)];
if (input[inputStart] == 0) {
++inputStart; // optimization - skip leading zeros
}
}
// Preserve exactly as many leading encoded zeros in output as there were leading zeros in input.
while (outputStart < encoded.length && encoded[outputStart] == encodedZero) {
++outputStart;
}
while (--zeros >= 0) {
encoded[--outputStart] = encodedZero;
}
// Return encoded string (including encoded leading zeros).
return new String(encoded, outputStart, encoded.length - outputStart);
}
/**
* Decodes the given baseX string into the original data bytes.
*
* @param input the baseX-encoded string to decode
* @return the decoded data bytes
*/
public byte[] decode(String input) {
if (input.length() == 0) {
return new byte[0];
}
// Convert the baseX-encoded ASCII chars to a baseX byte sequence (baseX digits).
byte[] inputX = new byte[input.length()];
for (int i = 0; i < input.length(); ++i) {
char c = input.charAt(i);
int digit = c < 128 ? indexes[c] : -1;
if (digit < 0) {
throw new IllegalStateException("InvalidCharacter in base "+divisor);
}
inputX[i] = (byte) digit;
}
// Count leading zeros.
int zeros = 0;
while (zeros < inputX.length && inputX[zeros] == 0) {
++zeros;
}
// Convert base-X digits to base-256 digits.
byte[] decoded = new byte[input.length()];
int outputStart = decoded.length;
for (int inputStart = zeros; inputStart < inputX.length; ) {
decoded[--outputStart] = divmod(inputX, inputStart, divisor, 256);
if (inputX[inputStart] == 0) {
++inputStart; // optimization - skip leading zeros
}
}
// Ignore extra leading zeroes that were added during the calculation.
while (outputStart < decoded.length && decoded[outputStart] == 0) {
++outputStart;
}
// Return decoded data (including original number of leading zeros).
return Arrays.copyOfRange(decoded, outputStart - zeros, decoded.length);
}
/**
* Divides a number, represented as an array of bytes each containing a single digit
* in the specified base, by the given divisor. The given number is modified in-place
* to contain the quotient, and the return value is the remainder.
*
* @param number the number to divide
* @param firstDigit the index within the array of the first non-zero digit
* (this is used for optimization by skipping the leading zeros)
* @param base the base in which the number's digits are represented (up to 256)
* @param divisor the number to divide by (up to 256)
* @return the remainder of the division operation
*/
private static byte divmod(byte[] number, int firstDigit, int base, int divisor) {
// this is just long division which accounts for the base of the input digits
int remainder = 0;
for (int i = firstDigit; i < number.length; i++) {
int digit = (int) number[i] & 0xFF;
int temp = remainder * base + digit;
number[i] = (byte) (temp / divisor);
remainder = temp % divisor;
}
return (byte) remainder;
}
}
使用示例:
public static final char[] ALPHABET = new char[]{'0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z','a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','_'};
private static final CustomBase BASE_63 = new CustomBase(ALPHABET);
public void testEncoding(){
byte[] byteArray = new byte[]{0,1,2,3,4,5,-1,-2,-3};
String base63String = BASE_63.encode(byteArray);
byte[] decodedByteArray = BASE_63.decode(base63String);
assert java.util.Arrays.equals(byteArray, decodedByteArray);
}
如果我们将上面的代码与 Base63 字母表一起使用,则与 Base58 原始代码相比,上面的代码在某些情况下显示出稍微更好的压缩。也就是说,在大多数情况下,它提供与 Base58 相同或相似的压缩。
比较:
[104, 101, 108, 108, 111, 87, 111, 114, 108, 100, 0, 0, 0, 0, 0, 0, 48, 57]
This byte array will be encoded in 25 characters when Base58 is used but in 24 characters when Base63 is used.
[-17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -66, -5, -17, -66, -5, -17, -66, -5]
the above bytes will be encoded in 61 characters when we use Base58 and in 59 characters when we use Base63.
结论:上述 Base63 实现在某些情况下对 Base58 进行了小幅改进,而在许多其他情况下没有任何改进。可能,使用 CustomBase
实现的主要优点是能够使用您自己的字符集。
我需要在 Java 中将字节数组编码为 Base63 字符串(注意,java.util.Base64
实现 Base64 而不是 Base63)。
我需要能够使用自定义字符集:
'0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z','a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','_'
即类似于:
byte[] byteArray = new byte[]{0,1,2,3,4,5,-1,-2,-3};
String base63String = Base63.encode(byteArray);
byte[] decodedByteArray = Base63.decode(base63String);
assert java.util.Arrays.equals(byteArray, decodedByteArray);
Apache Commons Codec project offers Base16
, Base32
and Base64
编码器和解码器。
这些都继承自 BaseNCodec
class,它为编写任意基 encoder/decoder classes 提供了基础。不幸的是,实现自定义子class的文档不多,但现有实现的源代码可以提供指导。
正在搜索不同的实现并尝试将它们更改为 Base63
。我找到 Base58
实现 here。我能够稍微修改该代码以使用自定义基数(在我的情况下它是 Base-63)。
原作者非我本人,代码未详查,使用风险自负。
public class CustomBase {
private final char[] alphabet;
private final int divisor;
private final char encodedZero;
private final int[] indexes;
public CustomBase(char[] alphabet){
this.alphabet = alphabet;
this.divisor = alphabet.length;
this.encodedZero = alphabet[0];
this.indexes = new int[128];
Arrays.fill(indexes, -1);
for (int i = 0; i < alphabet.length; i++) {
char alphabetChar = alphabet[i];
if(alphabetChar >= indexes.length){
throw new IllegalArgumentException("All chars should be under 128 but there was a char "+alphabetChar);
}
if(indexes[alphabet[i]] != -1){
throw new IllegalArgumentException("Duplicate char found in the alphabet "+alphabetChar);
}
indexes[alphabet[i]] = i;
}
}
/**
* Encodes the given bytes as a baseX string (no checksum is appended).
*
* @param input the bytes to encode
* @return the baseX-encoded string
*/
public String encode(byte[] input) {
if (input.length == 0) {
return "";
}
// Count leading zeros.
int zeros = 0;
while (zeros < input.length && input[zeros] == 0) {
++zeros;
}
// Convert base-256 digits to base-X digits (plus conversion to ASCII characters)
input = Arrays.copyOf(input, input.length); // since we modify it in-place
char[] encoded = new char[input.length * 2]; // upper bound
int outputStart = encoded.length;
for (int inputStart = zeros; inputStart < input.length; ) {
encoded[--outputStart] = alphabet[divmod(input, inputStart, 256, divisor)];
if (input[inputStart] == 0) {
++inputStart; // optimization - skip leading zeros
}
}
// Preserve exactly as many leading encoded zeros in output as there were leading zeros in input.
while (outputStart < encoded.length && encoded[outputStart] == encodedZero) {
++outputStart;
}
while (--zeros >= 0) {
encoded[--outputStart] = encodedZero;
}
// Return encoded string (including encoded leading zeros).
return new String(encoded, outputStart, encoded.length - outputStart);
}
/**
* Decodes the given baseX string into the original data bytes.
*
* @param input the baseX-encoded string to decode
* @return the decoded data bytes
*/
public byte[] decode(String input) {
if (input.length() == 0) {
return new byte[0];
}
// Convert the baseX-encoded ASCII chars to a baseX byte sequence (baseX digits).
byte[] inputX = new byte[input.length()];
for (int i = 0; i < input.length(); ++i) {
char c = input.charAt(i);
int digit = c < 128 ? indexes[c] : -1;
if (digit < 0) {
throw new IllegalStateException("InvalidCharacter in base "+divisor);
}
inputX[i] = (byte) digit;
}
// Count leading zeros.
int zeros = 0;
while (zeros < inputX.length && inputX[zeros] == 0) {
++zeros;
}
// Convert base-X digits to base-256 digits.
byte[] decoded = new byte[input.length()];
int outputStart = decoded.length;
for (int inputStart = zeros; inputStart < inputX.length; ) {
decoded[--outputStart] = divmod(inputX, inputStart, divisor, 256);
if (inputX[inputStart] == 0) {
++inputStart; // optimization - skip leading zeros
}
}
// Ignore extra leading zeroes that were added during the calculation.
while (outputStart < decoded.length && decoded[outputStart] == 0) {
++outputStart;
}
// Return decoded data (including original number of leading zeros).
return Arrays.copyOfRange(decoded, outputStart - zeros, decoded.length);
}
/**
* Divides a number, represented as an array of bytes each containing a single digit
* in the specified base, by the given divisor. The given number is modified in-place
* to contain the quotient, and the return value is the remainder.
*
* @param number the number to divide
* @param firstDigit the index within the array of the first non-zero digit
* (this is used for optimization by skipping the leading zeros)
* @param base the base in which the number's digits are represented (up to 256)
* @param divisor the number to divide by (up to 256)
* @return the remainder of the division operation
*/
private static byte divmod(byte[] number, int firstDigit, int base, int divisor) {
// this is just long division which accounts for the base of the input digits
int remainder = 0;
for (int i = firstDigit; i < number.length; i++) {
int digit = (int) number[i] & 0xFF;
int temp = remainder * base + digit;
number[i] = (byte) (temp / divisor);
remainder = temp % divisor;
}
return (byte) remainder;
}
}
使用示例:
public static final char[] ALPHABET = new char[]{'0','1','2','3','4','5','6','7','8','9','A','B','C','D','E','F','G','H','I','J','K','L','M','N','O','P','Q','R','S','T','U','V','W','X','Y','Z','a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p','q','r','s','t','u','v','w','x','y','z','_'};
private static final CustomBase BASE_63 = new CustomBase(ALPHABET);
public void testEncoding(){
byte[] byteArray = new byte[]{0,1,2,3,4,5,-1,-2,-3};
String base63String = BASE_63.encode(byteArray);
byte[] decodedByteArray = BASE_63.decode(base63String);
assert java.util.Arrays.equals(byteArray, decodedByteArray);
}
如果我们将上面的代码与 Base63 字母表一起使用,则与 Base58 原始代码相比,上面的代码在某些情况下显示出稍微更好的压缩。也就是说,在大多数情况下,它提供与 Base58 相同或相似的压缩。
比较:
[104, 101, 108, 108, 111, 87, 111, 114, 108, 100, 0, 0, 0, 0, 0, 0, 48, 57]
This byte array will be encoded in 25 characters when Base58 is used but in 24 characters when Base63 is used.
[-17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -17, -65, -67, -66, -5, -17, -66, -5, -17, -66, -5]
the above bytes will be encoded in 61 characters when we use Base58 and in 59 characters when we use Base63.
结论:上述 Base63 实现在某些情况下对 Base58 进行了小幅改进,而在许多其他情况下没有任何改进。可能,使用 CustomBase
实现的主要优点是能够使用您自己的字符集。