Audio and video track synchronization issue when using MediaCodec and MediaMuxer for MP4 files
I want to generate an MP4 file by muxing audio from the microphone (overriding didGetAudioData) and video from the camera (overriding onPreviewFrame). However, I have a sound/video synchronization problem: the video plays back faster than the audio. I would like to know whether the problem is caused by an incompatible configuration or by the presentationTimeUs values, and I would appreciate guidance on how to fix it. Below is my code.
Video configuration:
formatVideo = MediaFormat.createVideoFormat(MIME_TYPE_VIDEO, 640, 360);
formatVideo.setInteger(MediaFormat.KEY_COLOR_FORMAT, MediaCodecInfo.CodecCapabilities.COLOR_FormatYUV420SemiPlanar);
formatVideo.setInteger(MediaFormat.KEY_BIT_RATE, 2000000);
formatVideo.setInteger(MediaFormat.KEY_FRAME_RATE, 30);
formatVideo.setInteger(MediaFormat.KEY_I_FRAME_INTERVAL, 5);
The video presentation PTS is obtained as follows:
if (generateIndex == 0) {
    videoAbsolutePtsUs = 132;
    StartVideoAbsolutePtsUs = System.nanoTime() / 1000L;
} else {
    CurrentVideoAbsolutePtsUs = System.nanoTime() / 1000L;
    videoAbsolutePtsUs = 132 + CurrentVideoAbsolutePtsUs - StartVideoAbsolutePtsUs;
}
generateIndex++;
Audio configuration:
format = MediaFormat.createAudioFormat(MIME_TYPE, 48000 /* sample rate */, AudioFormat.CHANNEL_IN_MONO /* channel config */);
format.setInteger(MediaFormat.KEY_AAC_PROFILE, MediaCodecInfo.CodecProfileLevel.AACObjectLC);
format.setInteger(MediaFormat.KEY_SAMPLE_RATE, 48000);
format.setInteger(MediaFormat.KEY_CHANNEL_COUNT, 1);
format.setInteger(MediaFormat.KEY_BIT_RATE, 64000);
The audio presentation PTS is obtained as follows:
if (generateIndex == 0) {
    audioAbsolutePtsUs = 132;
    StartAudioAbsolutePtsUs = System.nanoTime() / 1000L;
} else {
    CurrentAudioAbsolutePtsUs = System.nanoTime() / 1000L;
    audioAbsolutePtsUs = CurrentAudioAbsolutePtsUs - StartAudioAbsolutePtsUs;
}
generateIndex++;
audioAbsolutePtsUs = getJitterFreePTS(audioAbsolutePtsUs, audioInputLength / 2);
long startPTS = 0;
long totalSamplesNum = 0;

private long getJitterFreePTS(long bufferPts, long bufferSamplesNum) {
    long correctedPts = 0;
    long bufferDuration = (1000000 * bufferSamplesNum) / 48000;
    bufferPts -= bufferDuration; // accounts for the delay of acquiring the audio buffer
    if (totalSamplesNum == 0) {
        // reset
        startPTS = bufferPts;
        totalSamplesNum = 0;
    }
    correctedPts = startPTS + (1000000 * totalSamplesNum) / 48000;
    if (bufferPts - correctedPts >= 2 * bufferDuration) {
        // reset
        startPTS = bufferPts;
        totalSamplesNum = 0;
        correctedPts = startPTS;
    }
    totalSamplesNum += bufferSamplesNum;
    return correctedPts;
}
Is my problem caused by applying the jitter-correction function only to the audio? If so, how can I apply a similar jitter-correction function to the video? I also tried to find the correct audio and video presentation PTS in https://android.googlesource.com/platform/cts/+/jb-mr2-release/tests/tests/media/src/android/media/cts/EncodeDecodeTest.java, but EncodeDecodeTest only provides a video PTS. That is why my implementation uses the system nanosecond time for both audio and video. If I want to use the video presentation PTS from EncodeDecodeTest, how do I construct a compatible audio presentation PTS? Thanks for the help!
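For reference, EncodeDecodeTest derives its video PTS purely from the frame index and the frame rate, not from the wall clock. An audio PTS built on the same principle would count submitted PCM samples instead of frames. A minimal sketch of that idea follows; the method and field names here are illustrative and not part of the CTS test:

private static final int FRAME_RATE = 30;      // must match KEY_FRAME_RATE
private static final int SAMPLE_RATE = 48000;  // must match KEY_SAMPLE_RATE

// Video PTS as computed in EncodeDecodeTest: derived from the frame index.
private static long computeVideoPresentationTimeUs(int frameIndex) {
    return 132 + frameIndex * 1000000L / FRAME_RATE;
}

// Hypothetical audio counterpart: derive the PTS from the number of PCM
// samples already submitted to the encoder, so both tracks share the same
// time base starting at 132 us.
private long totalSubmittedSamples = 0;

private long computeAudioPresentationTimeUs(int samplesInThisBuffer) {
    long ptsUs = 132 + totalSubmittedSamples * 1000000L / SAMPLE_RATE;
    totalSubmittedSamples += samplesInThisBuffer; // for 16-bit mono PCM: bytes / 2
    return ptsUs;
}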
For reference, here is how I queue YUV frames into the video MediaCodec. The audio part is the same except for the different presentation PTS.
int videoInputBufferIndex;
int videoInputLength;
long videoAbsolutePtsUs;
long StartVideoAbsolutePtsUs, CurrentVideoAbsolutePtsUs;
int put_v = 0;
int get_v = 0;
int generateIndex = 0;

public void setByteBufferVideo(byte[] buffer, boolean isUsingFrontCamera, boolean Input_endOfStream) {
    if (Build.VERSION.SDK_INT >= 18) {
        try {
            endOfStream = Input_endOfStream;
            if (!Input_endOfStream) {
                ByteBuffer[] inputBuffers = mVideoCodec.getInputBuffers();
                videoInputBufferIndex = mVideoCodec.dequeueInputBuffer(-1);
                if (VERBOSE) {
                    Log.w(TAG, "[put_v]:" + (put_v) + "; videoInputBufferIndex = " + videoInputBufferIndex + "; endOfStream = " + endOfStream);
                }
                if (videoInputBufferIndex >= 0) {
                    ByteBuffer inputBuffer = inputBuffers[videoInputBufferIndex];
                    inputBuffer.clear();
                    inputBuffer.put(mNV21Convertor.convert(buffer));
                    videoInputLength = buffer.length;
                    if (generateIndex == 0) {
                        videoAbsolutePtsUs = 132;
                        StartVideoAbsolutePtsUs = System.nanoTime() / 1000L;
                    } else {
                        CurrentVideoAbsolutePtsUs = System.nanoTime() / 1000L;
                        videoAbsolutePtsUs = 132 + CurrentVideoAbsolutePtsUs - StartVideoAbsolutePtsUs;
                    }
                    generateIndex++;
                    if (VERBOSE) {
                        Log.w(TAG, "[put_v]:" + (put_v) + "; videoAbsolutePtsUs = " + videoAbsolutePtsUs + "; CurrentVideoAbsolutePtsUs = " + CurrentVideoAbsolutePtsUs);
                    }
                    if (videoInputLength == AudioRecord.ERROR_INVALID_OPERATION) {
                        Log.w(TAG, "[put_v]ERROR_INVALID_OPERATION");
                    } else if (videoInputLength == AudioRecord.ERROR_BAD_VALUE) {
                        Log.w(TAG, "[put_v]ERROR_ERROR_BAD_VALUE");
                    }
                    if (endOfStream) {
                        Log.w(TAG, "[put_v]:" + (put_v++) + "; [get] receive endOfStream");
                        mVideoCodec.queueInputBuffer(videoInputBufferIndex, 0, videoInputLength, videoAbsolutePtsUs, MediaCodec.BUFFER_FLAG_END_OF_STREAM);
                    } else {
                        Log.w(TAG, "[put_v]:" + (put_v++) + "; receive videoInputLength :" + videoInputLength);
                        mVideoCodec.queueInputBuffer(videoInputBufferIndex, 0, videoInputLength, videoAbsolutePtsUs, 0);
                    }
                }
            }
        } catch (Exception x) {
            x.printStackTrace();
        }
    }
}
The way I solved this in my application is to set the PTS of all video and audio frames against a shared "sync clock" (note that "sync" here also means it is thread-safe), which starts when the first video frame (which itself gets PTS 0) becomes available. So if audio recording starts earlier than video, the audio data is ignored (never reaches the encoder) until the video starts, and if it starts later, the first audio PTS is simply relative to the start of the whole video.
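A minimal sketch of such a sync clock, with illustrative class and method names:

// Minimal sketch of a shared sync clock; names are illustrative.
// The first video frame defines time zero, so audio arriving before it
// gets a negative PTS and should be dropped before reaching the encoder.
class SyncClock {
    private long startTimeNs = -1;

    // Call once, when the first video frame becomes available.
    public synchronized void start() {
        if (startTimeNs < 0) {
            startTimeNs = System.nanoTime();
        }
    }

    // PTS in microseconds relative to the first video frame, or -1 if the
    // clock has not been started yet (i.e. no video frame has arrived).
    public synchronized long getPtsUs() {
        if (startTimeNs < 0) {
            return -1;
        }
        return (System.nanoTime() - startTimeNs) / 1000L;
    }
}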
Of course you could let the audio start first, but players will usually skip or wait for the first video frame anyway. Also note that encoded audio frames will arrive "out of order", and sooner or later MediaMuxer will fail with an error. My solution was to queue them all: sort the queue by PTS whenever a new frame comes in, then write everything older than 500 ms (relative to the newest frame) to MediaMuxer, but only frames whose PTS is higher than the latest frame already written. Ideally this means data is written smoothly to MediaMuxer with a 500 ms delay. In the worst case, you will lose a few audio frames.
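A rough sketch of that buffering strategy; the 500 ms window comes from the description above, while the field and method names (including the mMuxer field) are illustrative:

// Pending encoded samples sorted by PTS; names and structure are illustrative.
// "data" must be a private copy of the codec output buffer, since the codec
// recycles its buffers after releaseOutputBuffer().
private static final long WINDOW_US = 500_000; // 500 ms, as described above

private static class PendingSample {
    final int trackIndex;
    final ByteBuffer data;
    final MediaCodec.BufferInfo info;

    PendingSample(int trackIndex, ByteBuffer data, MediaCodec.BufferInfo info) {
        this.trackIndex = trackIndex;
        this.data = data;
        this.info = info;
    }
}

private final PriorityQueue<PendingSample> pending =
        new PriorityQueue<>(Comparator.comparingLong(s -> s.info.presentationTimeUs));
private long latestQueuedPtsUs = 0;
private long lastWrittenPtsUs = -1;

// Called for every encoded sample (audio or video) instead of writing it
// to the muxer directly.
private void enqueueSample(int trackIndex, ByteBuffer data, MediaCodec.BufferInfo info) {
    pending.add(new PendingSample(trackIndex, data, info));
    latestQueuedPtsUs = Math.max(latestQueuedPtsUs, info.presentationTimeUs);

    // Flush everything that is at least 500 ms older than the newest sample,
    // but skip any frame whose PTS is not ahead of the last one written:
    // writing it would make MediaMuxer fail on backward-going timestamps.
    while (!pending.isEmpty()
            && latestQueuedPtsUs - pending.peek().info.presentationTimeUs >= WINDOW_US) {
        PendingSample s = pending.poll();
        if (s.info.presentationTimeUs > lastWrittenPtsUs) {
            mMuxer.writeSampleData(s.trackIndex, s.data, s.info);
            lastWrittenPtsUs = s.info.presentationTimeUs;
        }
    }
}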