How to convert 16-bit audio created with Android's AudioRecord to 12-bit audio through bit shifting?

Question

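I am trying to convert 16-bit audio to 12-bit audio. However, I am inexperienced with this kind of conversion and suspect that my approach is incorrect or flawed.

As context for the code snippets below, the use case is an Android application that the user speaks into, with the audio streamed to an IoT device for immediate playback. The IoT device expects mono, 12-bit, 8 kHz, little-endian, unsigned audio, with the data stored in the first twelve bits (0-11) and the last four bits (12-15) set to zero. The audio data must be received in 1000-byte packets.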
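To make the target format concrete, here is a minimal sketch of how a single sample would be laid out, assuming that "bits 0-11" means the low twelve bits of each little-endian 16-bit word. That reading is an assumption, and the class and method names are made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class SampleLayout {
    /** Encodes one unsigned 12-bit value (0..4095) as two little-endian bytes. */
    static byte[] encode(int value12) {
        if (value12 < 0 || value12 > 0x0FFF) {
            throw new IllegalArgumentException("value must fit in 12 bits: " + value12);
        }
        // Assumption: the value occupies bits 0-11 and bits 12-15 stay zero.
        return ByteBuffer.allocate(2)
                .order(ByteOrder.LITTLE_ENDIAN)
                .putShort((short) value12)
                .array();
    }

    public static void main(String[] args) {
        byte[] b = encode(0x0ABC);
        // Low byte first, then the high byte whose upper nibble is zero.
        System.out.printf("%02X %02X%n", b[0] & 0xFF, b[1] & 0xFF); // prints "BC 0A"
    }
}
```

Under that assumption, a 1000-byte packet carries 500 samples.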

The audio is created in the Android application with AudioRecord, which is instantiated as follows:

        int bufferSize = 1000;
        this.audioRecord = new AudioRecord(
                MediaRecorder.AudioSource.MIC,
                8000,
                AudioFormat.CHANNEL_IN_MONO,
                AudioFormat.ENCODING_PCM_16BIT,
                bufferSize
        );

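In a while loop, the AudioRecord is read in 1000-byte packets, which are then modified to match the specification of the use case. I am not sure this part is relevant, but for completeness:

            byte[] buffer = new byte[1000];
            audioRecord.read(buffer, 0, buffer.length);
            byte[] modifiedBytes = convert16BitTo12Bit(buffer);

The modified bytes are then sent to the device.

Here is the method that modifies the bytes. Basically, to conform to the specification, I shift the bits in each 16-bit group (throwing away the four least significant bits) and put zeros in the last four positions. I do this through BitSet.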

    /**
     * Takes a byte array presented as 16 bit audio and converts it to 12 bit audio through bit
     * manipulation. Packets must be of 1000 bytes or no manipulation will occur and the input
     * will be immediately returned.
     */
    private byte[] convert16BitTo12Bit(byte[] input) {
        if (input.length == 1000) {
            for (int i = 0; i < input.length; i += 2) {
                Log.d(TAG, "convert16BitTo12Bit: pass #" + (i / 2));
                byte[] chunk = new byte[2];
                System.arraycopy(input, i, chunk, 0, 2);
                if (!isEmptyByteArray(chunk)) {
                    byte[] modifiedBytes = convertChunk(chunk);
                    System.arraycopy(
                            modifiedBytes,
                            0,
                            input,
                            i,
                            modifiedBytes.length
                    );
                }
            }
            return input;
        }
        Log.d(TAG, "convert16BitTo12Bit: Failed - input is not 1000 in length; it is " + input.length);
        return input;
    }

    /**
     * Converts 2 bytes 16 bit audio into 12 bit audio. If the input is not 2 bytes, the input
     * will be returned without manipulation.
     */
    private byte[] convertChunk(byte[] chunk) {
        if (chunk.length == 2) {
            BitSet bitSet = BitSet.valueOf(chunk);
            Log.d(TAG, "convertChunk: bitSet starts as " + bitSet.toString());
            modifyBitSet(bitSet);
            Log.d(TAG, "convertChunk: bitSet ends as " + bitSet.toString());
            return bitSet.toByteArray();
        }
        Log.d(TAG, "convertChunk: Failed = chunk is not 2 in length; it is " + chunk.length);
        return chunk;
    }

    /**
     * Removes the first four bits and shifts the rest to leave the final four bits as 0.
     */
    private void modifyBitSet(BitSet bitSet) {
        for (int i = 4; i < bitSet.length(); i++) {
            bitSet.set(i - 4, bitSet.get(i));
        }
        if (bitSet.length() > 8) {
            bitSet.clear(12, 16);
        } else {
            bitSet.clear(4, 8);
        }
    }

    /**
     * Returns true if the byte array input contains all zero bits.
     */
    private boolean isEmptyByteArray(byte[] input) {
        BitSet bitSet = BitSet.valueOf(input);
        return bitSet.isEmpty();
    }

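Unfortunately, this approach produces poor results. The audio is very noisy, and it is hard to make out what someone is saying (though you can hear that someone is speaking).

I have also been experimenting with saving the bytes to a file and playing them back on Android through AudioTrack. I noticed that if I only clear the first four bits and do not shift anything, the audio actually sounds good, thus:

    private void modifyBitSet(BitSet bitSet) {
        bitSet.clear(0, 4);
    }

However, when played through the device, it sounds even worse, and I do not think I can make out any words at all.

Clearly, my approach is not working here. The core question is: how do I convert the 16-bit chunks to 12-bit audio while maintaining audio quality, given that the last four bits must be zero? Additionally, given my larger approach of using AudioRecord to obtain the audio, would such a solution to the previous question be appropriate for this use case?

Please let me know if I can provide any more information to clarify these questions and my intent.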

Answer 1

Given that the audio is 16 bits but must be changed to 12 with four zeros at the end, four bits somewhere do have to be tossed.

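Yes, of course. There is no other way, is there?

Here is something I could put together quickly. It is certainly not fully tested: I only tried it with inputs of 2 bytes and 4 bytes. I will leave the rest of the testing to you.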

    //Reminder :: Convert as many as possible.
    //Reminder :: To calculate the required size for store: 
    //if((bytes.length & 1) == 0) Math.round((bytes.length * 6) / 8F) : Math.round(((bytes.length - 1) * 6) / 8F).
    //Return :: Amount of converted bytes.
    public static final int convert16BitTo12Bit(final byte[] bytes, final byte[] store) 
    {
        final int size = bytes.length;
        int storeIndex = 0;
        //Copy the first 2 bytes into store.
        store[storeIndex++] = bytes[0]; store[storeIndex] = bytes[1];
        if(size < 4) {
            store[storeIndex] = (byte)(store[storeIndex] & 0xF0);
            return 2;
        }
        //Not final: both are reassigned on every loop iteration.
        int result;
        byte tmp;
        //  11111111 11110000 00000000 00000000
        //+              11111111 11110000      (<< 12)
        //= 11111111 11111111 11111111 00000000 (1)
        //-----------------------------------------
        //  11111111 00000000 00000000 00000000 (1)
        //+          11111111 11110000          (<< 16)
        //= 11111111 11111111 11110000 00000000 (2)
        //-----------------------------------------
        //  11110000 00000000 00000000 00000000 (2)
        //+     1111 11111111 0000              (<< 20)
        //= 11111111 11111111 00000000 00000000 (3)
        //-----------------------------------------
        //  00000000 00000000 00000000 00000000 (3)
        //+ 11111111 11110000                   (<< 24)
        //= 11111111 11110000 00000000 00000000
        for(int i=2, shiftBits = 12; i < size; i += 2) {
            if(shiftBits == 24) {
                //Copy 2 bytes from bytes[] into store[] and move on.
                store[storeIndex] = bytes[i];
                //Never store byte 0 (Garbage).
                tmp = (byte)(bytes[i + 1] & 0xF0); //Bit order: 11110000.
                if(tmp != 0) store[++storeIndex] = tmp;
                shiftBits = 12; //Reset
            } else if(shiftBits == 20) {
                result = ((store[storeIndex - 1] << 24) | ((store[storeIndex] & 0xFF) << 16))
                    | (((bytes[i] & 0xFF) << 20) | ((bytes[i + 1] & 0xFF) << 12));
                store[storeIndex] = (byte)((result >> 24) & 0xFF);
                tmp = (byte)((result >> 16) & 0xFF);
                //Never store byte 0 (Garbage).
                if(tmp != 0) store[++storeIndex] = tmp;
                shiftBits = 24;
            } else if(shiftBits == 16) {
                result = ((store[storeIndex - 1] << 24) | ((store[storeIndex] & 0xFF) << 16))
                    | (((bytes[i] & 0xFF) << 16) | ((bytes[i + 1] & 0xFF) << 8));
                store[storeIndex] = (byte)((result >> 16) & 0xFF);
                tmp = (byte)((result >> 8) & 0xF0);
                //Never store byte 0 (Garbage).
                if(tmp != 0) store[++storeIndex] = tmp;
                shiftBits = 20;
            } else {
                result = ((store[storeIndex - 1] << 24) | ((store[storeIndex] & 0xFF) << 16))
                    | (((bytes[i] & 0xFF) << 12) | ((bytes[i + 1] & 0xFF) << 4));
                store[storeIndex] = (byte)((result >> 16) & 0xFF);
                tmp = (byte)((result >> 8) & 0xFF);
                //Never store byte 0 (Garbage).
                if(tmp != 0) store[++storeIndex] = tmp;
                shiftBits = 16;
            }
        }
        return ++storeIndex;
    }

Explanation

result = ((store[storeIndex - 1] << 24) | ((store[storeIndex] & 0xFF) << 16))
                    | (((bytes[i] & 0xFF) << 20) | ((bytes[i + 1] & 0xFF) << 12));

This basically merges two integers into one.

((store[storeIndex - 1] << 24) | ((store[storeIndex] & 0xFF) << 16))

The first part builds an integer from the two bytes already in store; their bit positions are always the same.

(((bytes[i] & 0xFF) << 20) | ((bytes[i + 1] & 0xFF) << 12));

The second part builds an integer from the two current input bytes, whose bit positions differ from case to case.

(...) | (...)

The vertical bar in the middle (bitwise OR) merges the two integers we just created into one.
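As a smaller illustration of the same shift, mask, and OR pattern, the sketch below merges just two bytes into one int. It is not part of the method above; the class and method names are hypothetical. It also shows why the `& 0xFF` masks matter: Java sign-extends a byte when widening it to int, so an unmasked negative byte would overwrite the higher bits.

```java
final class MergeBytes {
    /** Combines two bytes into one int, high byte first. */
    static int merge(byte high, byte low) {
        // & 0xFF strips the sign extension applied when a byte is widened to int.
        return ((high & 0xFF) << 8) | (low & 0xFF);
    }

    /** Same merge without masking the low byte, to show the failure mode. */
    static int mergeUnmasked(byte high, byte low) {
        return ((high & 0xFF) << 8) | low;
    }

    public static void main(String[] args) {
        byte high = (byte) 0x12;
        byte low = (byte) 0xF0; // negative as a Java byte (-16)
        System.out.println(Integer.toHexString(merge(high, low)));         // prints "12f0"
        System.out.println(Integer.toHexString(mergeUnmasked(high, low))); // prints "fffffff0"
    }
}
```

A byte shifted left by 24 needs no mask, because the shift pushes the sign-extension bits out of the int; that is why `store[storeIndex - 1] << 24` appears unmasked above.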

Usage

Using this method is straightforward.

    byte[] buffer = new byte[1000];
    byte[] store;
    if((buffer.length & 1) == 0) { //Even.
        store = new byte[Math.round((buffer.length * 6) / 8F)];
    } else { //Odd.
        store = new byte[Math.round(((buffer.length - 1) * 6) / 8F)];
    }
    audioRecord.read(buffer, 0, buffer.length);
    int convertedByteSize = convert16BitTo12Bit(buffer, store);
    System.out.println("size: " + convertedByteSize);

Answer 2

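I found a solution that produces clear audio. First, it is important to restate the requirements of the use case: 12-bit unsigned mono audio, which the device reads in 1000-byte packets in little-endian byte order.

The initialization and configuration of AudioRecord described in the question are fine.

After reading 1000 bytes of audio from AudioRecord, the data can be wrapped in a ByteBuffer set to little-endian for modification, then viewed as a ShortBuffer so that it can be manipulated at the 16-bit level: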

        // Audio specifications of device is in little endian.
        ByteBuffer byteBuffer = ByteBuffer.wrap(input).order(ByteOrder.LITTLE_ENDIAN);
        // Turn into a ShortBuffer so bitwise manipulation can occur on the 16 bit level.
        ShortBuffer shortBuffer = byteBuffer.asShortBuffer();

Next, in a loop, take each short and convert it to 12-bit unsigned:

        for (int i = 0; i < shortBuffer.capacity(); i++) {
            short currentShort = shortBuffer.get(i);
            shortBuffer.put(i, convertShortTo12Bit(currentShort));
        }

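This is done by shifting the 16-bit value right by four places, which turns it into a 12-bit signed value. Then, to convert it to unsigned, add 2048. As a safety step, we also mask off the top four bits (12-15) as the device requires, although after the shift and the addition no bits should actually remain there: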

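    private static short convertShortTo12Bit(short input) {
        // Widen to int; Java sign-extends, so negative samples stay negative.
        int inputAsInt = input;
        // Arithmetic shift drops the four least significant bits: range -2048..2047.
        inputAsInt >>= 4;
        // Offset into the unsigned range 0..4095.
        inputAsInt += 2048;
        // Keep bits 0-11 only; bits 12-15 are zero as the device requires.
        return (short) (inputAsInt & 0b0000111111111111);
    }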
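For checking the arithmetic at the boundaries, here is a self-contained sketch of the same steps applied in place to a byte array. The class and method names are made up for this example, and it assumes the input holds little-endian signed 16-bit PCM, as AudioRecord with ENCODING_PCM_16BIT is expected to deliver on typical devices.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.ShortBuffer;

final class Pcm16To12 {
    /** Maps a signed 16-bit sample to an unsigned 12-bit value in bits 0-11. */
    static short toUnsigned12(short sample) {
        return (short) (((sample >> 4) + 2048) & 0x0FFF);
    }

    /** Converts little-endian signed 16-bit PCM to unsigned 12-bit, in place. */
    static void convertInPlace(byte[] pcm) {
        // The ShortBuffer is a view: writes go straight through to the array.
        ShortBuffer shorts = ByteBuffer.wrap(pcm)
                .order(ByteOrder.LITTLE_ENDIAN)
                .asShortBuffer();
        for (int i = 0; i < shorts.capacity(); i++) {
            shorts.put(i, toUnsigned12(shorts.get(i)));
        }
    }

    public static void main(String[] args) {
        System.out.println(toUnsigned12(Short.MIN_VALUE)); // prints 0
        System.out.println(toUnsigned12((short) 0));       // prints 2048
        System.out.println(toUnsigned12(Short.MAX_VALUE)); // prints 4095
    }
}
```

Silence (0) lands at mid-scale 2048, and the loudest negative and positive samples land at 0 and 4095, so the full 12-bit range is used.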

To go back from 12-bit to 16-bit, do the reverse for each short (subtract 2048 and shift left by four places).
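A minimal sketch of that reverse step follows, with a hypothetical class and method name. The four bits discarded by the forward conversion cannot be recovered, so the result is only an approximation of the original sample.

```java
final class Pcm12To16 {
    /** Maps an unsigned 12-bit value (bits 0-11) back to a signed 16-bit sample. */
    static short toSigned16(short unsigned12) {
        // Mask first so stray bits in 12-15 cannot affect the result.
        int value = unsigned12 & 0x0FFF;
        // Undo the offset, then restore the original scale.
        return (short) ((value - 2048) << 4);
    }

    public static void main(String[] args) {
        System.out.println(toSigned16((short) 0));    // prints -32768
        System.out.println(toSigned16((short) 2048)); // prints 0
        System.out.println(toSigned16((short) 4095)); // prints 32752
    }
}
```

The largest value comes back as 32752 rather than 32767 because the low four bits are filled with zeros.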
