Storing binary data in QR codes

I am trying to store binary data in a QR code. Apparently QR codes do support storing raw binary data (or ISO-8859-1 / Latin1). This is what I want to encode (hex):

d1 50 01 00 00 00 f6 5f 05 2d 8f 0b 40 e2 01

I have tried the following encoders:

  1. qr.js

  2. Google Charts

  3. qrcode.js

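Decoding with zxing.org produces a variety of incorrect results. The two JavaScript encoders produce this (which is wrong; the first text character should be Ñ).

And Google Charts produces this...

What is going on? Are any of these correct? What is really strange is that if I encode the sequence below (at least with the JS encoders), it works fine. I would have thought the problem was non-ASCII characters, but Ñ (0xd1) is non-ASCII too.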

d1 50 01 00 00 00 01 02 03 04 05 06 40 e2 01

Does anyone know what is going on?

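Update

It occurred to me to scan them with a ZBar-based scanner app I found. It can scan both JS versions (at least they start with ÑP). The Google Charts one is wrong. So the problem seems to be with ZXing (which, surprisingly, I would not recommend to anyone).

Update 2

ZBar cannot handle null bytes. :-(

At a glance, the QR codes are not in the same format. I will compare the QR code formats to see whether this is an error-correction issue, an encoding issue, or something else.

It turns out that ZXing is just crap, and ZBar does some strange things to the data (for example, converting it to UTF-8). I managed to get it to output the raw data, including null bytes. Here is a patch for the best Android ZBar library I have found, which has now been merged.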

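I used System.Convert.ToBase64String to convert the provided sample byte array into a Base64-encoded string, and then used ZXing to create the QR code image.

Next I called ZXing to read the string back from the generated QR code, and then called System.Convert.FromBase64String to convert the string back into a byte array.

I confirmed that the data completed the round trip successfully.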
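The same round trip can be sketched in Java with the standard `java.util.Base64` class. This is only an illustration of the text-safe wrapping step: the class and method names (`Base64QrPayload`, `toQrText`, `fromQrText`) are made up for this example, and the call into an actual QR encoder or decoder is left out.

```java
import java.util.Arrays;
import java.util.Base64;

final class Base64QrPayload {
    /** Wraps raw bytes in a Base64 string that any text-oriented QR encoder can carry. */
    static String toQrText(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    /** Unwraps the string returned by the QR decoder back into the original bytes. */
    static byte[] fromQrText(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        // The 15 sample bytes from the question.
        byte[] data = {(byte) 0xd1, 0x50, 0x01, 0x00, 0x00, 0x00, (byte) 0xf6, 0x5f,
                       0x05, 0x2d, (byte) 0x8f, 0x0b, 0x40, (byte) 0xe2, 0x01};

        String text = toQrText(data);            // hand this string to the QR encoder
        byte[] restored = fromQrText(text);      // apply this to the decoder's output

        System.out.println(text + " (" + text.length() + " characters)");
        System.out.println("round trip ok: " + Arrays.equals(data, restored));
    }
}
```

The cost is size: every 3 input bytes become 4 characters, so the 15 sample bytes turn into 20 characters.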

"What is going on? Are any of these correct?"

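Apart from the Google Charts one (which is simply empty), your QR codes are correct.

You can see that the binary data reported by zxing is what you would expect: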

4: Byte mode indicator  
0f: length of 15 bytes  
d15001...: your 15 bytes of data  
ec11 is just padding  
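The layout above can be read back by hand. The following is a minimal sketch, assuming a version 1 to 9 symbol (where the byte-mode length field is 8 bits wide) and a stream that starts with a single byte-mode segment and no ECI header. The class name `ByteModeSegment` and the sample codeword string are made up for this example; the codewords were reconstructed from the layout described above, not copied from a decoder.

```java
import java.util.Arrays;

final class ByteModeSegment {
    /**
     * Extracts the payload of a leading byte-mode segment from raw data codewords.
     * Layout: 4-bit mode indicator (0100), 8-bit length, then the payload bytes,
     * all shifted by one nibble relative to the codeword boundaries.
     */
    static byte[] extract(byte[] codewords) {
        int mode = (codewords[0] & 0xf0) >> 4;
        if (mode != 0x4) {
            throw new IllegalArgumentException("not a byte-mode segment, mode = " + mode);
        }
        int length = ((codewords[0] & 0x0f) << 4) | ((codewords[1] & 0xf0) >> 4);
        if (codewords.length < length + 2) {
            throw new IllegalArgumentException("codeword stream is too short");
        }
        byte[] payload = new byte[length];
        for (int i = 0; i < length; i++) {
            int high = codewords[i + 1] & 0x0f;          // low nibble of one codeword
            int low = (codewords[i + 2] & 0xf0) >> 4;    // high nibble of the next
            payload[i] = (byte) ((high << 4) | low);
        }
        return payload;
    }

    /** Small helper for the demo: parses a hex string such as "40fd" into bytes. */
    static byte[] hex(String s) {
        byte[] out = new byte[s.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(s.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        // Mode 4, length 0f, the 15 data bytes, a 4-bit terminator, then ec 11 padding.
        byte[] codewords = hex("40fd15001000000f65f052d8f0b40e2010ec11");
        byte[] expected = hex("d15001000000f65f052d8f0b40e201");
        System.out.println("payload matches: " + Arrays.equals(expected, extract(codewords)));
    }
}
```

With ZXing, `Result.getRawBytes()` is one place such codewords may come from, but whether your decoder exposes them at all depends on the library and platform.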

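The problem is in the decoding, because most decoders will try to interpret the payload as text. Since it is binary data, you should not try to handle it as text. Even if you think you can convert it from text back to binary, as you have seen, this can cause problems with values that are not valid text.

So the solution is to use a decoder that outputs binary data rather than text data.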
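To illustrate why a detour through text is lossy, here is a small sketch using only the Java standard library. It decodes the sample bytes as text and re-encodes them, once as ISO-8859-1 and once as UTF-8. The class and method names are invented for this example; the sketch says nothing about what a particular QR decoder does internally.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class TextRoundTrip {
    /** Returns true if bytes -> String -> bytes in the given charset reproduces the input. */
    static boolean survives(byte[] data, Charset charset) {
        String asText = new String(data, charset);
        return Arrays.equals(data, asText.getBytes(charset));
    }

    public static void main(String[] args) {
        // The 15 sample bytes from the question.
        byte[] data = {(byte) 0xd1, 0x50, 0x01, 0x00, 0x00, 0x00, (byte) 0xf6, 0x5f,
                       0x05, 0x2d, (byte) 0x8f, 0x0b, 0x40, (byte) 0xe2, 0x01};

        // ISO-8859-1 maps every byte value to exactly one character, so nothing is lost.
        System.out.println("ISO-8859-1 round trip ok: "
                + survives(data, StandardCharsets.ISO_8859_1));

        // In UTF-8, 0xd1 must be followed by a continuation byte; 0x50 is not one,
        // so Java substitutes U+FFFD and the original byte is gone.
        System.out.println("UTF-8 round trip ok: "
                + survives(data, StandardCharsets.UTF_8));
    }
}
```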

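Now, about interpreting the QR code's binary data as text: you said the first character should be 'Ñ'. That is true if the data is interpreted as ISO-8859-1, which, according to the QR code standard, is what should be done when no ECI mode is defined.

In practice, however, most smartphone QR code readers will interpret it as UTF-8 in this case (or at least try to auto-detect the encoding).

Although this is not standard, it has become common practice: binary mode without ECI, carrying UTF-8 encoded text.

Perhaps the reason is that nobody wants to waste precious bytes on an ECI header that specifies UTF-8. And in fact, not all decoders support ECI.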

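Storing binary data in a QR code requires overcoming two problems.

  1. ISO-8859-1 does not allow bytes in the ranges 00-1F and 7F-9F. If you need to encode these bytes anyway, quote or encode them, i.e. use quoted-printable or Base-64 encoding to avoid those ranges.

  2. Because you are trying to store binary data in a QR code, you have to rely only on your own scanner to handle this binary data. You should not display the text from the QR code with other software, such as the zxing.org web application, because most QR decoders, including zxing.org, use heuristics to detect the character set in use. These heuristics may detect a character set other than ISO-8859-1 and therefore fail to display your binary data correctly. Some scanners use heuristics to detect the character set even when the character set is given explicitly by ECI. That is why supplying ECI may not help much: even with ECI, scanners still use heuristics.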

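Therefore, using only US-ASCII printable characters (for example, binary data encoded as Base64 before it is passed to the QR code generator) is the safest choice against the heuristics of QR code scanners. This also sidesteps another complication: ISO-8859-1 was not the default encoding in the earlier QR code standard published in 2000 (ISO/IEC 18004:2000). That standard specified the 8-bit Latin/Kana character set in accordance with JIS X 0201 (JIS8, also known as ISO-2022-JP) as the default encoding for 8-bit mode, whereas the updated standard published in 2005 changed the default to ISO-8859-1.

As an alternative to Base-64, you can encode each byte as two hexadecimal characters (0-9, A-F), so that your data is stored in the QR code in alphanumeric mode rather than 8-bit mode. This will certainly disable all heuristics, and it should produce a QR code of roughly the same size as Base-64 (about 3% more data bits), because a pair of characters in alphanumeric mode takes only 11 bits in the QR code stream, i.e. 5.5 bits per character.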
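A rough size comparison of the two options can be sketched as follows. It counts only the data bits of a single segment (11 bits per pair of alphanumeric characters plus 6 bits for a trailing odd character, 8 bits per character in 8-bit mode) and ignores the mode indicator, the length field, padding, and error correction. The class and method names are made up for this example.

```java
import java.util.Base64;

final class QrPayloadSize {
    /** Encodes bytes as uppercase hex, which fits the QR alphanumeric character set. */
    static String toHex(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 2);
        for (byte b : data) {
            sb.append(String.format("%02X", b & 0xff));
        }
        return sb.toString();
    }

    /** Data bits used by alphanumeric mode: 11 per character pair, 6 for a leftover character. */
    static int alphanumericBits(int chars) {
        return (chars / 2) * 11 + (chars % 2) * 6;
    }

    /** Data bits used by 8-bit mode: one byte per character. */
    static int byteModeBits(int chars) {
        return chars * 8;
    }

    public static void main(String[] args) {
        byte[] data = {(byte) 0xd1, 0x50, 0x01, 0x00, 0x00, 0x00, (byte) 0xf6, 0x5f,
                       0x05, 0x2d, (byte) 0x8f, 0x0b, 0x40, (byte) 0xe2, 0x01};

        String hex = toHex(data);
        String base64 = Base64.getEncoder().encodeToString(data);

        System.out.println("raw:    " + data.length * 8 + " bits");
        System.out.println("hex:    " + hex.length() + " chars, "
                + alphanumericBits(hex.length()) + " bits in alphanumeric mode");
        System.out.println("base64: " + base64.length() + " chars, "
                + byteModeBits(base64.length()) + " bits in 8-bit mode");
    }
}
```

For the 15 sample bytes this comes to 165 data bits for hex against 160 for Base-64, so the two are close, with hex having the advantage that it leaves a text-guessing decoder nothing to misinterpret.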

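Update: I recently went back and published the reference code as a project on GitHub for anyone who wants to use it. https://github.com/yurelle/Base45Encoder


This is a bit of a necro, but I just ran into this problem and found a solution.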

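The problem with using ZXING to read QR codes is that it assumes all QR code payloads are strings. If you are willing to generate the QR code in Java with ZXING, I have developed a solution that stores a binary payload in a ZXING QR code with a storage efficiency loss of only about 8%, which is much better than the roughly 33% inflation of Base64.

It exploits an internal compression optimization of the ZXING library for purely alphanumeric strings. If you want the full explanation, including the math and unit tests, check out .

But the short answer is:

Solution

I implemented it as a standalone static utility class, so all you have to do is call: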

//Encode
final byte[] myBinaryData = ...;
final String encodedStr = BinaryToBase45Encoder.encodeToBase45QrPayload(myBinaryData);

//Decode
final byte[] decodedBytes = BinaryToBase45Encoder.decodeBase45QrPayload(encodedStr);

Alternatively, you can do the same through InputStreams:

//Encode
final InputStream in_1 = ... ;
final String encodedStr = BinaryToBase45Encoder.encodeToBase45QrPayload(in_1);

//Decode
final InputStream in_2 = ... ;
final byte[] decodedBytes = BinaryToBase45Encoder.decodeBase45QrPayload(in_2);

Here is the implementation:

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.Map;

/**
 * For some reason none of the Java QR Code libraries support binary payloads. At least, none that
 * I could find anyway. The commonly suggested workaround for this is to use Base64 encoding.
 * However, this results in a payload size inflation of roughly 33% (4 characters per 3 bytes).
 * If your payload is already near the size limit of QR codes, this is not possible.
 *
 * This class implements an encoder which takes advantage of a built-in compression optimization
 * of the ZXING QR Code library, to enable the storage of Binary data into a QR Code, with a
 * storage efficiency loss of only about 8%.
 *
 * The built-in optimization is this: ZXING will automatically detect if your String payload is
 * purely AlphaNumeric (by their own definition), and if so, it will automatically compress 2
 * AlphaNumeric characters into 11 bits.
 *
 *
 * ----------------------
 *
 *
 * The included ALPHANUMERIC_TABLE is the conversion table used by the ZXING library as a reverse
 * index for determining if a given input data should be classified as alphanumeric.
 *
 * See:
 *
 *      com.google.zxing.qrcode.encoder.Encoder.chooseMode(String content, String encoding)
 *
 * which scans through the input string one character at a time and passes them to:
 *
 *      getAlphanumericCode(int code)
 *
 * in the same class, which uses that character as a numeric index into the
 * ALPHANUMERIC_TABLE.
 *
 * If you examine the values, you'll notice that it ignores / disqualifies certain values, and
 * effectively converts the input into base 45 (0 -> 44; -1 is interpreted by the calling code
 * to mean a failure). This is confirmed in the function:
 *
 *      appendAlphanumericBytes(CharSequence content, BitArray bits)
 *
 * where they pack 2 of these base 45 digits into 11 bits. This presents us with an opportunity.
 * If we can take our data, and convert it into a compatible base 45 alphanumeric representation,
 * then the QR Encoder will automatically pack that data into sub-byte chunks.
 *
 * 2 digits in base 45 is 2,025 possible values. 11 bits has a maximum storage capacity of 2,048
 * possible states. This is only a loss of 1.1% in storage efficiency behind raw binary.
 *
 *      45 ^ 2 = 2,025
 *      2 ^ 11 = 2,048
 *      2,048 - 2,025 = 23
 *      23 / 2,048 = 0.01123046875 = 1.123%
 *
 * However, this is the ideal / theoretical efficiency. This implementation processes data in
 * chunks, using a Long as a computational buffer. However, since Java Longs are signed, we
 * can only use the lower 7 bytes. The conversion code requires continuously positive values;
 * using the highest 8th byte would contaminate the sign bit and randomly produce negative
 * values.
 *
 *
 * Real-World Test:
 *
 * Using a 7 byte Long to encode a 2KB buffer of random bytes, we get the following results.
 *
 *      Raw Binary Size:        2,048
 *      Encoded String Size:    3,218
 *      QR Code Alphanum Size:  2,213 (after the QR Code compresses 2 base45 digits to 11 bits)
 *
 * This is a real-world storage efficiency loss of only 8%.
 *
 *      2,213 - 2,048 = 165
 *      165 / 2,048 = 0.08056640625 = 8.0566%
 */
public class BinaryToBase45Encoder {
    public final static int[] ALPHANUMERIC_TABLE;

    /*
     * You could probably just copy & paste the array literal from the ZXING source code; it's only
     * an array definition. But I was unsure of the licensing issues with posting it on the internet,
     * so I did it this way.
     */
    static {
        final Field SOURCE_ALPHANUMERIC_TABLE;
        int[] tmp;

        //Copy lookup table from ZXING Encoder class
        try {
            SOURCE_ALPHANUMERIC_TABLE = com.google.zxing.qrcode.encoder.Encoder.class.getDeclaredField("ALPHANUMERIC_TABLE");
            SOURCE_ALPHANUMERIC_TABLE.setAccessible(true);
            tmp = (int[]) SOURCE_ALPHANUMERIC_TABLE.get(null);
        } catch (NoSuchFieldException e) {
            e.printStackTrace();//Shouldn't happen
            tmp = null;
        } catch (IllegalAccessException e) {
            e.printStackTrace();//Shouldn't happen
            tmp = null;
        }

        //Store
        ALPHANUMERIC_TABLE = tmp;
    }

    public static final int NUM_DISTINCT_ALPHANUM_VALUES = 45;
    public static final char[] alphaNumReverseIndex = new char[NUM_DISTINCT_ALPHANUM_VALUES];

    static {
        //Build AlphaNum Index
        final int len = ALPHANUMERIC_TABLE.length;
        for (int x = 0; x < len; x++) {
            // The base45 result which the alphanum lookup table produces.
            // i.e. the base45 digit value which String characters are
            // converted into.
            //
            // We use this value to build a reverse lookup table to find
            // the String character we have to send to the encoder, to
            // make it produce the given base45 digit value.
            final int base45DigitValue = ALPHANUMERIC_TABLE[x];

            //Ignore the -1 records
            if (base45DigitValue > -1) {
                //The index into the lookup table which produces the given base45 digit value.
                //
                //i.e. to produce a base45 digit with the numeric value in base45DigitValue, we need
                //to send the Encoder a String character with the numeric value in x.
                alphaNumReverseIndex[base45DigitValue] = (char) x;
            }
        }
    }

    /*
     * The storage capacity of one digit in the number system; i.e. the maximum
     * possible number of distinct values which can be stored in 1 logical digit
     */
    public static final int QR_PAYLOAD_NUMERIC_BASE = NUM_DISTINCT_ALPHANUM_VALUES;

    /*
     * We can't use all 8 bytes, because the Long is signed, and the conversion math
     * requires consistently positive values. If we populated all 8 bytes, then the
     * last byte has the potential to contaminate the sign bit, and break the
     * conversion math. So, we only use the lower 7 bytes, and avoid this problem.
     */
    public static final int LONG_USABLE_BYTES = Long.BYTES - 1;

    //The following mapping was determined by brute-forcing -1 Long (all bits 1), and compressing to base45 until it hit zero.
    public static final int[] BINARY_TO_BASE45_DIGIT_COUNT_CONVERSION = new int[] {0,2,3,5,6,8,9,11,12};
    public static final int NUM_BASE45_DIGITS_PER_LONG = BINARY_TO_BASE45_DIGIT_COUNT_CONVERSION[LONG_USABLE_BYTES];
    public static final Map<Integer, Integer> BASE45_TO_BINARY_DIGIT_COUNT_CONVERSION = new HashMap<>();

    static {
        //Build Reverse Lookup
        int len = BINARY_TO_BASE45_DIGIT_COUNT_CONVERSION.length;
        for (int x=0; x<len; x++) {
            int numB45Digits = BINARY_TO_BASE45_DIGIT_COUNT_CONVERSION[x];
            BASE45_TO_BINARY_DIGIT_COUNT_CONVERSION.put(numB45Digits, x);
        }
    }

    public static String encodeToBase45QrPayload(final byte[] inputData) throws IOException {
        return encodeToBase45QrPayload(new ByteArrayInputStream(inputData));
    }

    public static String encodeToBase45QrPayload(final InputStream in) throws IOException {
        //Init conversion state vars
        final StringBuilder strOut = new StringBuilder();
        int data;
        long buf = 0;

        // Process all input data in chunks of LONG_USABLE_BYTES bytes, this allows for economies of scale
        // so we can process more digits of arbitrary size before we hit the wall of the binary
        // chunk size in a power of 2, and have to transmit a sub-optimal chunk of the "crumbs"
        // left over; i.e. the slack space between where the multiples of QR_PAYLOAD_NUMERIC_BASE
        // and the powers of 2 don't quite line up.
        while(in.available() > 0) {
            //Fill buffer
            int numBytesStored = 0;
            while (numBytesStored < LONG_USABLE_BYTES && in.available() > 0) {
                //Read next byte
                data = in.read();

                //Push byte into buffer
                buf = (buf << 8) | data; //8 bits per byte

                //Increment
                numBytesStored++;
            }

            //Write out in lower base
            final StringBuilder outputChunkBuffer = new StringBuilder();
            final int numBase45Digits = BINARY_TO_BASE45_DIGIT_COUNT_CONVERSION[numBytesStored];
            int numB45DigitsProcessed = 0;
            while(numB45DigitsProcessed < numBase45Digits) {
                //Chunk out a digit
                final byte digit = (byte) (buf % QR_PAYLOAD_NUMERIC_BASE);

                //Drop digit data from buffer
                buf = buf / QR_PAYLOAD_NUMERIC_BASE;

                //Write Digit
                outputChunkBuffer.append(alphaNumReverseIndex[(int) digit]);

                //Track output digits
                numB45DigitsProcessed++;
            }

            /*
             * The way this code works, the processing output results in a First-In-Last-Out digit
             * reversal. So, we need to buffer the chunk output, and feed it to the OutputStream
             * backwards to correct this.
             *
             * We could probably get away with writing the bytes out in inverted order, and then
             * flipping them back on the decode side, but just to be safe, I'm always keeping
             * them in the proper order.
             */
            strOut.append(outputChunkBuffer.reverse().toString());
        }

        //Return
        return strOut.toString();
    }

    public static byte[] decodeBase45QrPayload(final String inputStr) throws IOException {
        //Prep for InputStream
        final byte[] buf = inputStr.getBytes(java.nio.charset.StandardCharsets.US_ASCII);//The alphanumeric set is pure ASCII, so do not depend on the platform default charset

        return decodeBase45QrPayload(new ByteArrayInputStream(buf));
    }

    public static byte[] decodeBase45QrPayload(final InputStream in) throws IOException {
        //Init conversion state vars
        final ByteArrayOutputStream out = new ByteArrayOutputStream();
        int data;
        long buf = 0;
        int x=0;

        // Process all input data in chunks of NUM_BASE45_DIGITS_PER_LONG digits, this allows for economies of scale
        // so we can process more digits of arbitrary size before we hit the wall of the binary
        // chunk size in a power of 2, and have to transmit a sub-optimal chunk of the "crumbs"
        // left over; i.e. the slack space between where the multiples of QR_PAYLOAD_NUMERIC_BASE
        // and the powers of 2 don't quite line up.
        while(in.available() > 0) {
            //Convert & Fill Buffer
            int numB45Digits = 0;
            while (numB45Digits < NUM_BASE45_DIGITS_PER_LONG && in.available() > 0) {
                //Read in next char
                char c = (char) in.read();

                //Translate back through lookup table
                int digit = ALPHANUMERIC_TABLE[(int) c];

                //Shift buffer up one digit to make room
                buf *= QR_PAYLOAD_NUMERIC_BASE;

                //Append next digit
                buf += digit;

                //Increment
                numB45Digits++;
            }

            //Write out in higher base
            final LinkedList<Byte> outputChunkBuffer = new LinkedList<>();
            final int numBytes = BASE45_TO_BINARY_DIGIT_COUNT_CONVERSION.get(numB45Digits);
            int numBytesProcessed = 0;
            while(numBytesProcessed < numBytes) {
                //Chunk out 1 byte
                final byte chunk = (byte) buf;

                //Shift buffer to next byte
                buf = buf >> 8; //8 bits per byte

                //Write byte to output
                //
                //Again, we need to invert the order of the bytes, so as we chunk them off, push
                //them onto a FILO stack; inverting their order.
                outputChunkBuffer.push(chunk);

                //Increment
                numBytesProcessed++;
            }

            //Write chunk buffer to output stream (in reverse order)
            while (outputChunkBuffer.size() > 0) {
                out.write(outputChunkBuffer.pop());
            }
        }

        //Return
        out.flush();
        out.close();
        return out.toByteArray();
    }
}
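
The digit-count table and the size figures quoted in the class comment can be re-derived independently of ZXING. The sketch below is not part of the encoder above; the class and method names are made up for this check. It finds, for each chunk size, the smallest number of base-45 digits that can hold that many bytes, and then repeats the 2 KB calculation.

```java
import java.math.BigInteger;
import java.util.Arrays;

final class Base45DigitCounts {
    private static final BigInteger BASE = BigInteger.valueOf(45);

    /** Smallest number of base-45 digits whose capacity covers all values of numBytes bytes. */
    static int digitsFor(int numBytes) {
        BigInteger states = BigInteger.valueOf(256).pow(numBytes);
        BigInteger capacity = BigInteger.ONE;
        int digits = 0;
        while (capacity.compareTo(states) < 0) {
            capacity = capacity.multiply(BASE);
            digits++;
        }
        return digits;
    }

    /** Encoded length in characters when the input is split into chunks of chunkBytes bytes. */
    static int encodedLength(int totalBytes, int chunkBytes) {
        int fullChunks = totalBytes / chunkBytes;
        int leftoverBytes = totalBytes % chunkBytes;
        return fullChunks * digitsFor(chunkBytes) + digitsFor(leftoverBytes);
    }

    /** Bytes occupied in alphanumeric mode: 11 bits per pair, 6 bits for an odd character. */
    static int alphanumericBytes(int chars) {
        int bits = (chars / 2) * 11 + (chars % 2) * 6;
        return (bits + 7) / 8;
    }

    public static void main(String[] args) {
        int[] table = new int[9];
        for (int n = 0; n < table.length; n++) {
            table[n] = digitsFor(n);
        }
        System.out.println("digits per chunk size: " + Arrays.toString(table));

        int chars = encodedLength(2048, 7);
        System.out.println("2048 bytes -> " + chars + " characters -> "
                + alphanumericBytes(chars) + " bytes in the QR stream");
    }
}
```

The 7-byte chunk is the main source of the gap between the theoretical 1.1% loss and the measured 8%: 11 base-45 digits occupy 60.5 bits to carry 56 bits of payload.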