Storing binary data in QR codes

Question

I am trying to store binary data in a QR code. Apparently QR codes do support storing raw binary data (or ISO-8859-1 / Latin-1). This is what I want to encode (hex):

d1 50 01 00 00 00 f6 5f 05 2d 8f 0b 40 e2 01

I have tried the following encoders:

qr.js

Google Charts

qrcode.js

Decoding with zxing.org produces various incorrect results. The two JavaScript encoders produce this (which is wrong; the first text character should be Ñ):

Whereas Google Charts produces this...

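What is going on? Are any of these correct? What is really weird is that if I encode the following sequence instead (at least with the JS encoders), it works fine. I would have thought the problem was non-ASCII characters, but Ñ (0xd1) is non-ASCII as well.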

d1 50 01 00 00 00 01 02 03 04 05 06 40 e2 01

Does anyone know what is going on?

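Update

It occurred to me to try scanning them with a ZBar-based scanner app I found. It scans both JS versions fine (at least they start with ÑP). The Google Charts one is simply wrong. So the problem seems to be with ZXing (surprisingly; I would not recommend it to anyone).

Update 2

ZBar can't handle null bytes. :-(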

Answer 1

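At a glance, the QR codes are not in the same format. I would compare the formats of the QR codes to see whether this is an error-correction problem, an encoding problem, or something else.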

Answer 2

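It turns out that ZXing is simply rubbish at this, and ZBar does some strange things to the data (such as converting it to UTF-8). I did manage to get it to output the raw data, including null bytes. Here is a patch for the best Android ZBar library I could find; it has since been merged.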

Answer 3

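I used System.Convert.ToBase64String to convert the sample byte array you provided into a Base64-encoded string, and then used ZXing to create the QR code image.

Next, I called ZXing to read the string back from the generated QR code, and then called System.Convert.FromBase64String to convert the string back into a byte array.

I confirmed that the data completed the round trip successfully.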
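The same round trip can be sketched in Java using the standard java.util.Base64 class in place of the .NET calls. This is only an illustration under that assumption: the class and method names (Base64QrPayload, toQrText, fromQrText) are made up for the example, and the QR encode/decode step itself is left out, since any text-capable QR library should carry a plain ASCII string unchanged.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper: wraps binary data as printable ASCII before it goes to a QR encoder.
class Base64QrPayload {
    // The 15 sample bytes from the question.
    static final byte[] SAMPLE = {
        (byte) 0xd1, 0x50, 0x01, 0x00, 0x00, 0x00, (byte) 0xf6, 0x5f,
        0x05, 0x2d, (byte) 0x8f, 0x0b, 0x40, (byte) 0xe2, 0x01
    };

    /** Encodes arbitrary bytes as a Base64 string (ASCII letters, digits, '+', '/', '='). */
    static String toQrText(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    /** Reverses toQrText on the string read back from the QR code. */
    static byte[] fromQrText(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        String text = toQrText(SAMPLE);          // this string is what the QR code would carry
        byte[] restored = fromQrText(text);      // this is what the reader would recover
        System.out.println(text);
        System.out.println("round trip ok: " + Arrays.equals(SAMPLE, restored));
    }
}
```

The cost is size: every 3 input bytes become 4 characters, so the 15 sample bytes turn into a 20-character string.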

Answer 4

"What is going on? Are any of these correct?"

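Except for the Google Charts one (which is simply empty), your QR codes are correct.

You can see that the binary data reported by zxing is what you would expect:

4: Byte mode indicator
0f: length of 15 bytes
d15001...: your 15 bytes of data
ec11 is just padding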
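To make that breakdown concrete, here is a minimal sketch that pulls the payload out of the raw codewords of a byte-mode segment. It rests on several assumptions: the symbol is version 1 to 9 (so the character count field is 8 bits wide), there is a single byte-mode segment, and the raw array below was assembled by hand from the breakdown above with a 4-bit terminator before the pad bytes. The names ByteModeSegment, readBits and extractPayload are hypothetical, not part of any library.

```java
// Hypothetical parser for a single QR byte-mode segment (versions 1-9 only).
class ByteModeSegment {
    // Mode 0100, count 0x0f, the 15 data bytes, terminator 0000, then pad bytes ec 11.
    static final byte[] RAW = {
        0x40, (byte) 0xfd, 0x15, 0x00, 0x10, 0x00, 0x00, 0x0f, 0x65, (byte) 0xf0,
        0x52, (byte) 0xd8, (byte) 0xf0, (byte) 0xb4, 0x0e, 0x20, 0x10, (byte) 0xec, 0x11
    };

    /** Reads n bits, most significant bit first, starting at bit offset pos. */
    static int readBits(byte[] raw, int pos, int n) {
        int value = 0;
        for (int i = 0; i < n; i++) {
            int bitIndex = pos + i;
            int bit = (raw[bitIndex / 8] >> (7 - bitIndex % 8)) & 1;
            value = (value << 1) | bit;
        }
        return value;
    }

    /** Returns the payload bytes untouched, with no text decoding applied. */
    static byte[] extractPayload(byte[] raw) {
        int mode = readBits(raw, 0, 4);
        if (mode != 0x4) {
            throw new IllegalArgumentException("not a byte-mode segment: mode " + mode);
        }
        int length = readBits(raw, 4, 8);   // 8-bit count is an assumption (versions 1-9)
        byte[] payload = new byte[length];
        for (int i = 0; i < length; i++) {
            payload[i] = (byte) readBits(raw, 12 + 8 * i, 8);
        }
        return payload;
    }

    public static void main(String[] args) {
        StringBuilder hex = new StringBuilder();
        for (byte b : extractPayload(RAW)) {
            hex.append(String.format("%02x ", b & 0xff));
        }
        System.out.println(hex.toString().trim());
    }
}
```

Because the 4-bit mode indicator shifts everything by half a byte, the payload never lines up with the codeword boundaries, which is why it has to be read bit by bit.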

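The problem is in the decoding, because most decoders try to interpret the payload as text. Since it is binary data, you should not try to handle it as text. Even if you think you can convert it from text back to binary, as you have seen, this can cause problems with values that are not valid text.

So the solution is to use a decoder that outputs binary data rather than text data.

Now, regarding interpreting the QR code's binary data as text: you said the first character should be 'Ñ'. That is true if the data is interpreted as ISO-8859-1, which, according to the QR code standard, is what should be done when no ECI mode is defined.

In practice, however, most smartphone QR code readers will interpret it as UTF-8 in this case (or at least try to auto-detect the encoding).

Although this is not in the standard, it has become common practice: byte mode without ECI, carrying UTF-8 encoded text.

Perhaps the reason is that nobody wants to waste precious bytes on an ECI header that specifies UTF-8. In fact, not all decoders support ECI anyway.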
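The difference between the two interpretations is easy to reproduce with the standard Java charsets. The sketch below (class and method names are made up for illustration) checks whether a byte sequence survives being turned into a String and back. ISO-8859-1 maps each of the 256 byte values to exactly one character, so it is lossless, whereas a UTF-8 decoder replaces the malformed sequence d1 50 with U+FFFD and the original byte is gone.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo: which text interpretations preserve arbitrary bytes?
class CharsetRoundTrip {
    /** True if decoding the bytes as text and re-encoding yields the same bytes. */
    static boolean survives(byte[] data, Charset charset) {
        String asText = new String(data, charset);
        return Arrays.equals(data, asText.getBytes(charset));
    }

    public static void main(String[] args) {
        byte[] start = {(byte) 0xd1, 0x50};   // the first two bytes of the question's data

        // Interpreted as ISO-8859-1, 0xd1 is 'Ñ' and 0x50 is 'P'.
        System.out.println(new String(start, StandardCharsets.ISO_8859_1));

        // 0xd1 opens a two-byte UTF-8 sequence, but 0x50 is not a continuation byte,
        // so the decoder substitutes U+FFFD and the round trip fails.
        System.out.println("ISO-8859-1 lossless: " + survives(start, StandardCharsets.ISO_8859_1));
        System.out.println("UTF-8 lossless:      " + survives(start, StandardCharsets.UTF_8));
    }
}
```

This is why a reader that hands back a String is only safe for binary payloads if it is known to decode as ISO-8859-1; a reader that exposes the raw bytes avoids the question entirely.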

Answer 5

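Storing binary data in a QR code requires overcoming two problems.

1. ISO-8859-1 does not allow bytes in the ranges 00-1F and 7F-9F. If you need to encode these bytes anyway, quote or encode them, i.e. use quoted-printable or Base-64 encoding to avoid these ranges.
2. Since you are trying to store binary data in a QR code, you have to rely solely on your own scanner to handle this binary data. You must not display the text from the QR code with other software, such as the zxing.org web application, because most QR decoders, including zxing.org, use heuristics to detect the character set in use. These heuristics may detect a character set other than ISO-8859-1 and therefore fail to display your binary data correctly. Some scanners apply these heuristics even when the character set is given explicitly via ECI. That is why providing an ECI may not help much: even with ECI, scanners may still rely on heuristics.

Therefore, using only printable US-ASCII characters (for example, encoding the binary data as Base64 before passing it to the QR code generator) is the safest choice against the heuristics of QR code scanners. This also overcomes another complication: ISO-8859-1 was not the default encoding in the earlier QR code standard published in 2000 (ISO/IEC 18004:2000). That standard specified the 8-bit Latin/Kana character set in accordance with JIS X 0201 (JIS8, also known as ISO-2022-JP) as the default encoding for 8-bit mode, while the updated standard published in 2005 changed the default to ISO-8859-1.

As an alternative to Base-64, you can encode each byte as two hexadecimal characters (0-9, A-F), so that your data is encoded in the QR code in alphanumeric mode rather than 8-bit mode. This will certainly disable all heuristics, and it should not produce a noticeably larger QR code than Base-64, because alphanumeric mode packs two characters into 11 bits (about 5.5 bits per character). Each input byte then costs 11 bits, versus roughly 10.7 bits for Base-64 in 8-bit mode.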
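The hexadecimal variant and its cost can be sketched as follows. The helper names (HexQrPayload, toHex, fromHex, alphanumericBits, byteModeBits) are made up for this example, and the bit counts cover only the data bits of a segment, not the mode indicator, the character count field or error correction. The hex digits must be upper case, because lower-case letters are not part of the QR alphanumeric character set.

```java
// Hypothetical helper: hex-encodes binary data so a QR encoder can use alphanumeric mode.
class HexQrPayload {
    private static final char[] DIGITS = "0123456789ABCDEF".toCharArray();

    /** Two upper-case hex characters per input byte. */
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 2);
        for (byte b : data) {
            sb.append(DIGITS[(b >> 4) & 0xF]).append(DIGITS[b & 0xF]);
        }
        return sb.toString();
    }

    /** Reverses toHex; rejects odd lengths and non-hex characters. */
    static byte[] fromHex(String hex) {
        if (hex.length() % 2 != 0) {
            throw new IllegalArgumentException("hex string has odd length");
        }
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            int hi = Character.digit(hex.charAt(2 * i), 16);
            int lo = Character.digit(hex.charAt(2 * i + 1), 16);
            if (hi < 0 || lo < 0) {
                throw new IllegalArgumentException("not a hex digit at position " + 2 * i);
            }
            out[i] = (byte) ((hi << 4) | lo);
        }
        return out;
    }

    /** Data bits in alphanumeric mode: 11 per character pair, 6 for an odd trailing character. */
    static int alphanumericBits(int chars) {
        return (chars / 2) * 11 + (chars % 2) * 6;
    }

    /** Data bits in 8-bit byte mode. */
    static int byteModeBits(int chars) {
        return chars * 8;
    }

    public static void main(String[] args) {
        byte[] sample = fromHex("D15001000000F65F052D8F0B40E201");     // the question's 15 bytes
        System.out.println("hex chars:   " + toHex(sample).length());  // 2 per byte
        System.out.println("hex bits:    " + alphanumericBits(30));    // alphanumeric mode
        System.out.println("base64 bits: " + byteModeBits(20));        // 20 Base64 chars, byte mode
        System.out.println("raw bits:    " + byteModeBits(15));        // raw bytes, byte mode
    }
}
```

For the 15 sample bytes this comes to 165 data bits for hex against 160 for Base-64 and 120 for raw bytes, so the two text encodings end up in the same ballpark.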

Answer 6

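Update: I recently went back and published the reference code as a project on GitHub for anyone who wants to use it. https://github.com/yurelle/Base45Encoder

This is a bit of a necro-post, but I just ran into this problem and found a solution.

The problem with using ZXING to read QR codes is that it assumes all QR code payloads are strings. If you are willing to generate the QR code in Java with ZXING, I developed a solution that stores a binary payload in a ZXING QR code with a storage efficiency loss of only 8%, noticeably better than the roughly 33% inflation of Base64 (4 characters for every 3 bytes).

It takes advantage of an internal compression optimization in the ZXING library for purely alphanumeric strings. If you want the full explanation, including the math and unit tests, check out the linked write-up.

But the short answer is:

Solution

I implemented it as a standalone static utility class, so all you have to do is call: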

//Encode
final byte[] myBinaryData = ...;
final String encodedStr = BinaryToBase45Encoder.encodeToBase45QrPayload(myBinaryData);

//Decode
final byte[] decodedBytes = BinaryToBase45Encoder.decodeBase45QrPayload(encodedStr);

Alternatively, you can also do this via InputStreams:

//Encode
final InputStream in_1 = ... ;
final String encodedStr = BinaryToBase45Encoder.encodeToBase45QrPayload(in_1);

//Decode
final InputStream in_2 = ... ;
final byte[] decodedBytes = BinaryToBase45Encoder.decodeBase45QrPayload(in_2);

Here is the implementation:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.Map;

/**
 * For some reason none of the Java QR Code libraries support binary payloads. At least, none that
 * I could find anyway. The commonly suggested workaround for this is to use Base64 encoding.
 * However, this inflates the payload by roughly a third (4 output characters for every 3 input
 * bytes). If your payload is already near the size limit of QR codes, this is not possible.
 *
 * This class implements an encoder which takes advantage of a built-in compression optimization
 * of the ZXING QR Code library, to enable the storage of Binary data into a QR Code, with a
 * storage efficiency loss of only 8%.
 *
 * The built-in optimization is this: ZXING will automatically detect if your String payload is
 * purely AlphaNumeric (by their own definition), and if so, it will automatically compress 2
 * AlphaNumeric characters into 11 bits.
 *
 *
 * ----------------------
 *
 *
 * The included ALPHANUMERIC_TABLE is the conversion table used by the ZXING library as a reverse
 * index for determining if a given input data should be classified as alphanumeric.
 *
 * See:
 *
 *      com.google.zxing.qrcode.encoder.Encoder.chooseMode(String content, String encoding)
 *
 * which scans through the input string one character at a time and passes them to:
 *
 *      getAlphanumericCode(int code)
 *
 * in the same class, which uses that character as a numeric index into the
 * ALPHANUMERIC_TABLE.
 *
 * If you examine the values, you'll notice that it ignores / disqualifies certain values, and
 * effectively converts the input into base 45 (0 -> 44; -1 is interpreted by the calling code
 * to mean a failure). This is confirmed in the function:
 *
 *      appendAlphanumericBytes(CharSequence content, BitArray bits)
 *
 * where they pack 2 of these base 45 digits into 11 bits. This presents us with an opportunity.
 * If we can take our data, and convert it into a compatible base 45 alphanumeric representation,
 * then the QR Encoder will automatically pack that data into sub-byte chunks.
 *
 * 2 digits in base 45 is 2,025 possible values. 11 bits has a maximum storage capacity of 2,048
 * possible states. This is only a loss of 1.1% in storage efficiency behind raw binary.
 *
 *      45 ^ 2 = 2,025
 *      2 ^ 11 = 2,048
 *      2,048 - 2,025 = 23
 *      23 / 2,048 = 0.01123046875 = 1.123%
 *
 * However, this is the ideal / theoretical efficiency. This implementation processes data in
 * chunks, using a long as a computational buffer. However, since Java longs are signed, we
 * can only use the lower 7 bytes. The conversion code requires continuously positive values;
 * using the highest 8th byte would contaminate the sign bit and randomly produce negative
 * values.
 *
 *
 * Real-World Test:
 *
 * Using a 7 byte Long to encode a 2KB buffer of random bytes, we get the following results.
 *
 *      Raw Binary Size:        2,048
 *      Encoded String Size:    3,218
 *      QR Code Alphanum Size:  2,213 (after the QR Code compresses 2 base45 digits to 11 bits)
 *
 * This is a real-world storage efficiency loss of only 8%.
 *
 *      2,213 - 2,048 = 165
 *      165 / 2,048 = 0.08056640625 = 8.0566%
 */
public class BinaryToBase45Encoder {
    public final static int[] ALPHANUMERIC_TABLE;

    /*
     * You could probably just copy & paste the array literal from the ZXING source code; it's only
     * an array definition. But I was unsure of the licensing issues with posting it on the internet,
     * so I did it this way.
     */
    static {
        final Field SOURCE_ALPHANUMERIC_TABLE;
        int[] tmp;

        //Copy lookup table from ZXING Encoder class
        try {
            SOURCE_ALPHANUMERIC_TABLE = com.google.zxing.qrcode.encoder.Encoder.class.getDeclaredField("ALPHANUMERIC_TABLE");
            SOURCE_ALPHANUMERIC_TABLE.setAccessible(true);
            tmp = (int[]) SOURCE_ALPHANUMERIC_TABLE.get(null);
        } catch (ReflectiveOperationException e) {
            //Shouldn't happen. Fail fast instead of storing null, which would only surface
            //later as a NullPointerException in the static initializer below.
            throw new ExceptionInInitializerError(e);
        }

        //Store
        ALPHANUMERIC_TABLE = tmp;
    }

    public static final int NUM_DISTINCT_ALPHANUM_VALUES = 45;
    public static final char[] alphaNumReverseIndex = new char[NUM_DISTINCT_ALPHANUM_VALUES];

    static {
        //Build AlphaNum Index
        final int len = ALPHANUMERIC_TABLE.length;
        for (int x = 0; x < len; x++) {
            // The base45 result which the alphanum lookup table produces.
            // i.e. the base45 digit value which String characters are
            // converted into.
            //
            // We use this value to build a reverse lookup table to find
            // the String character we have to send to the encoder, to
            // make it produce the given base45 digit value.
            final int base45DigitValue = ALPHANUMERIC_TABLE[x];

            //Ignore the -1 records
            if (base45DigitValue > -1) {
                //The index into the lookup table which produces the given base45 digit value.
                //
                //i.e. to produce a base45 digit with the numeric value in base45DigitValue, we need
                //to send the Encoder a String character with the numeric value in x.
                alphaNumReverseIndex[base45DigitValue] = (char) x;
            }
        }
    }

    /*
     * The storage capacity of one digit in the number system; i.e. the maximum
     * possible number of distinct values which can be stored in 1 logical digit
     */
    public static final int QR_PAYLOAD_NUMERIC_BASE = NUM_DISTINCT_ALPHANUM_VALUES;

    /*
     * We can't use all 8 bytes, because the Long is signed, and the conversion math
     * requires consistently positive values. If we populated all 8 bytes, then the
     * last byte has the potential to contaminate the sign bit, and break the
     * conversion math. So, we only use the lower 7 bytes, and avoid this problem.
     */
    public static final int LONG_USABLE_BYTES = Long.BYTES - 1;

    //The following mapping was determined by brute-forcing -1 Long (all bits 1), and compressing to base45 until it hit zero.
    public static final int[] BINARY_TO_BASE45_DIGIT_COUNT_CONVERSION = new int[] {0,2,3,5,6,8,9,11,12};
    public static final int NUM_BASE45_DIGITS_PER_LONG = BINARY_TO_BASE45_DIGIT_COUNT_CONVERSION[LONG_USABLE_BYTES];
    public static final Map<Integer, Integer> BASE45_TO_BINARY_DIGIT_COUNT_CONVERSION = new HashMap<>();

    static {
        //Build Reverse Lookup
        int len = BINARY_TO_BASE45_DIGIT_COUNT_CONVERSION.length;
        for (int x=0; x<len; x++) {
            int numB45Digits = BINARY_TO_BASE45_DIGIT_COUNT_CONVERSION[x];
            BASE45_TO_BINARY_DIGIT_COUNT_CONVERSION.put(numB45Digits, x);
        }
    }

    public static String encodeToBase45QrPayload(final byte[] inputData) throws IOException {
        return encodeToBase45QrPayload(new ByteArrayInputStream(inputData));
    }

    public static String encodeToBase45QrPayload(final InputStream in) throws IOException {
        //Init conversion state vars
        final StringBuilder strOut = new StringBuilder();
        long buf = 0;

        // Process all input data in chunks of LONG_USABLE_BYTES bytes; this allows for economies
        // of scale so we can process more digits of arbitrary size before we hit the wall of the
        // binary chunk size in a power of 2, and have to transmit a sub-optimal chunk of the
        // "crumbs" left over; i.e. the slack space between where the multiples of
        // QR_PAYLOAD_NUMERIC_BASE and the powers of 2 don't quite line up.
        //
        // End of stream is detected by read() returning -1 rather than by available(), which
        // only reports bytes readable without blocking and may return 0 before the stream ends.
        int data = in.read();
        while (data != -1) {
            //Fill buffer
            int numBytesStored = 0;
            while (numBytesStored < LONG_USABLE_BYTES && data != -1) {
                //Push byte into buffer
                buf = (buf << 8) | data; //8 bits per byte

                //Increment
                numBytesStored++;

                //Read next byte
                data = in.read();
            }

            //Write out in lower base
            final StringBuilder outputChunkBuffer = new StringBuilder();
            final int numBase45Digits = BINARY_TO_BASE45_DIGIT_COUNT_CONVERSION[numBytesStored];
            int numB45DigitsProcessed = 0;
            while(numB45DigitsProcessed < numBase45Digits) {
                //Chunk out a digit
                final byte digit = (byte) (buf % QR_PAYLOAD_NUMERIC_BASE);

                //Drop digit data from buffer
                buf = buf / QR_PAYLOAD_NUMERIC_BASE;

                //Write Digit
                outputChunkBuffer.append(alphaNumReverseIndex[(int) digit]);

                //Track output digits
                numB45DigitsProcessed++;
            }

            /*
             * The way this code works, the processing output results in a First-In-Last-Out digit
             * reversal. So, we need to buffer the chunk output, and feed it to the OutputStream
             * backwards to correct this.
             *
             * We could probably get away with writing the bytes out in inverted order, and then
             * flipping them back on the decode side, but just to be safe, I'm always keeping
             * them in the proper order.
             */
            strOut.append(outputChunkBuffer.reverse().toString());
        }

        //Return
        return strOut.toString();
    }

    public static byte[] decodeBase45QrPayload(final String inputStr) throws IOException {
        //Prep for InputStream. The alphanumeric character set is pure ASCII, so name the charset
        //explicitly instead of relying on the platform default encoding.
        final byte[] buf = inputStr.getBytes(java.nio.charset.StandardCharsets.US_ASCII);

        return decodeBase45QrPayload(new ByteArrayInputStream(buf));
    }

    public static byte[] decodeBase45QrPayload(final InputStream in) throws IOException {
        //Init conversion state vars
        final ByteArrayOutputStream out = new ByteArrayOutputStream();
        long buf = 0;

        // Process all input data in chunks of NUM_BASE45_DIGITS_PER_LONG digits; this allows for
        // economies of scale so we can process more digits of arbitrary size before we hit the
        // wall of the binary chunk size in a power of 2, and have to transmit a sub-optimal chunk
        // of the "crumbs" left over; i.e. the slack space between where the multiples of
        // QR_PAYLOAD_NUMERIC_BASE and the powers of 2 don't quite line up.
        //
        // End of stream is detected by read() returning -1 rather than by available().
        int data = in.read();
        while (data != -1) {
            //Convert & Fill Buffer
            int numB45Digits = 0;
            while (numB45Digits < NUM_BASE45_DIGITS_PER_LONG && data != -1) {
                //Translate back through lookup table; reject anything outside the alphanumeric set
                final int digit = (data < ALPHANUMERIC_TABLE.length) ? ALPHANUMERIC_TABLE[data] : -1;
                if (digit < 0) {
                    throw new IOException("Invalid base45 character code: " + data);
                }

                //Shift buffer up one digit to make room
                buf *= QR_PAYLOAD_NUMERIC_BASE;

                //Append next digit
                buf += digit;

                //Increment
                numB45Digits++;

                //Read in next char
                data = in.read();
            }

            //Write out in higher base
            final LinkedList<Byte> outputChunkBuffer = new LinkedList<>();
            final Integer numBytesBoxed = BASE45_TO_BINARY_DIGIT_COUNT_CONVERSION.get(numB45Digits);
            if (numBytesBoxed == null) {
                //A digit count such as 1, 4, 7 or 10 cannot come from a whole number of bytes
                throw new IOException("Truncated payload: unexpected chunk of " + numB45Digits + " base45 digits");
            }
            final int numBytes = numBytesBoxed;
            int numBytesProcessed = 0;
            while(numBytesProcessed < numBytes) {
                //Chunk out 1 byte
                final byte chunk = (byte) buf;

                //Shift buffer to next byte
                buf = buf >> 8; //8 bits per byte

                //Write byte to output
                //
                //Again, we need to invert the order of the bytes, so as we chunk them off, push
                //them onto a FILO stack; inverting their order.
                outputChunkBuffer.push(chunk);

                //Increment
                numBytesProcessed++;
            }

            //Write chunk buffer to output stream (in reverse order)
            while (outputChunkBuffer.size() > 0) {
                out.write(outputChunkBuffer.pop());
            }
        }

        //Return
        out.flush();
        out.close();
        return out.toByteArray();
    }
}
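The digit-count table in the class above (BINARY_TO_BASE45_DIGIT_COUNT_CONVERSION) is described as having been found by brute force. As a cross-check, it can also be derived directly: n bytes need the smallest number of base-45 digits d such that 45^d >= 256^n. The sketch below does exactly that with BigInteger and also reproduces the encoded length quoted in the real-world test. The class and method names are made up for this illustration and are not part of the answer's code or of ZXING.

```java
import java.math.BigInteger;
import java.util.Arrays;

// Hypothetical cross-check of the byte-count to base45-digit-count table.
class Base45DigitCounts {
    /** Smallest d such that 45^d >= 256^numBytes. */
    static int digitsForBytes(int numBytes) {
        BigInteger states = BigInteger.valueOf(256).pow(numBytes);
        BigInteger capacity = BigInteger.ONE;
        int digits = 0;
        while (capacity.compareTo(states) < 0) {
            capacity = capacity.multiply(BigInteger.valueOf(45));
            digits++;
        }
        return digits;
    }

    /** Digit counts for 0..maxBytes bytes, in the same layout as the table in the answer. */
    static int[] table(int maxBytes) {
        int[] result = new int[maxBytes + 1];
        for (int n = 0; n <= maxBytes; n++) {
            result[n] = digitsForBytes(n);
        }
        return result;
    }

    /** Number of base45 characters produced when the input is cut into chunks of chunkBytes. */
    static int encodedLength(int totalBytes, int chunkBytes) {
        int fullChunks = totalBytes / chunkBytes;
        int remainder = totalBytes % chunkBytes;
        return fullChunks * digitsForBytes(chunkBytes) + digitsForBytes(remainder);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(table(8)));
        System.out.println(encodedLength(2048, 7));
    }
}
```

With 7-byte chunks, each chunk costs 11 digits, i.e. 60.5 bits in alphanumeric mode for 56 bits of payload, which is where the 8% overhead comes from.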

qr-code

zxing