Screeching white noise while playing audio as a raw stream
I. Background
- I am trying to build an app that helps match subtitles to an audio waveform very accurately, at the waveform level, the word level, and even the character level.
- The audio is expected to be Sanskrit chants (yoga, rituals, etc.), which contain very long compound words [example: aṅganyā-sokta-mātaro-bījam is traditionally a single word, broken up only to aid reading].
- The input transcript/subtitles may be roughly in sync at the sentence/verse level, but will certainly not be in sync at the word level.
- The app should be able to find the silences in the audio waveform so that it can guess the start and end point of each word (or even each letter/consonant/vowel). That way the audio and the visual subtitles match perfectly at the word level (or even the letter/consonant/vowel level), and the UI simply highlights or animates the exact word (or even letter) in the subtitle line being chanted at that moment, showing it in a larger font. The purpose of this app is to help people learn Sanskrit chanting.
- This is not expected to be a 100% automated process, nor a 100% manual one, but a hybrid process in which the app assists the human as much as possible.
II. Below is the first piece of code I wrote for this, in which I
- first open an mp3 (or any audio format) file,
- seek to an arbitrary point on the audio file's timeline, // currently playing from offset zero
- fetch the audio data in raw format for two purposes: (1) playing it and (2) drawing the waveform,
- play the raw audio data using the standard Java audio library.
III. The problem I am facing is a screeching sound between each cycle.
- Perhaps I need to close the line between cycles? That sounds simple enough, and I can try it.
- But I also wonder whether this overall approach is sound in the first place. Any hints, guides, or suggested links would be very helpful.
- Also, I have simply hardcoded the sample rate etc. (44100 Hz etc.). Are these reasonable defaults, or should they depend on the input format?
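A side note on those hardcoded settings: whichever values are chosen, the sample format requested from ffmpeg (e.g. s16le vs s16be) must agree with the endianness of the Java AudioFormat, or every pair of bytes is interpreted in the wrong order. A tiny self-contained sketch (illustrative values only, no audio hardware needed) of what that mismatch does to 16-bit samples:

```java
public class EndianMismatchDemo {
    /** Reassemble a big-endian 16-bit sample as if it were little-endian. */
    static short misreadAsLittleEndian(short sample) {
        byte hi = (byte) ((sample >> 8) & 0xFF); // first byte on the wire (big-endian)
        byte lo = (byte) (sample & 0xFF);        // second byte on the wire
        // a little-endian reader swaps the significance of the two bytes
        return (short) (((lo & 0xFF) << 8) | (hi & 0xFF));
    }

    public static void main(String[] args) {
        short sample = 1000; // a quiet 16-bit PCM sample
        System.out.println("intended=" + sample
                + " misread=" + misreadAsLittleEndian(sample));
    }
}
```

Running this shows the quiet sample 1000 misread as -6141, i.e. loud garbage; a whole stream of such misreads is enough on its own to turn clean audio into harsh noise.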
IV. Here is the code
import com.github.kokorin.jaffree.StreamType;
import com.github.kokorin.jaffree.ffmpeg.FFmpeg;
import com.github.kokorin.jaffree.ffmpeg.FFmpegProgress;
import com.github.kokorin.jaffree.ffmpeg.FFmpegResult;
import com.github.kokorin.jaffree.ffmpeg.NullOutput;
import com.github.kokorin.jaffree.ffmpeg.PipeOutput;
import com.github.kokorin.jaffree.ffmpeg.ProgressListener;
import com.github.kokorin.jaffree.ffprobe.Stream;
import com.github.kokorin.jaffree.ffmpeg.UrlInput;
import com.github.kokorin.jaffree.ffprobe.FFprobe;
import com.github.kokorin.jaffree.ffprobe.FFprobeResult;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.SourceDataLine;
public class FFMpegToRaw {
    Path BIN = Paths.get("f:\\utilities\\ffmpeg-20190413-0ad0533-win64-static\\bin");
    String VIDEO_MP4 = "f:\\org\\TEMPLE\\DeviMahatmyamRecitationAudio\\03_01_Devi Kavacham.mp3";
    FFprobe ffprobe;
    FFmpeg ffmpeg;

    public void basicCheck() throws Exception {
        if (BIN != null) {
            ffprobe = FFprobe.atPath(BIN);
        } else {
            ffprobe = FFprobe.atPath();
        }
        FFprobeResult result = ffprobe
                .setShowStreams(true)
                .setInput(VIDEO_MP4)
                .execute();
        for (Stream stream : result.getStreams()) {
            System.out.println("Stream " + stream.getIndex()
                    + " type " + stream.getCodecType()
                    + " duration " + stream.getDuration(TimeUnit.SECONDS));
        }
        if (BIN != null) {
            ffmpeg = FFmpeg.atPath(BIN);
        } else {
            ffmpeg = FFmpeg.atPath();
        }
        // Sometimes ffprobe can't show the exact duration; transcode to a NULL output with ffmpeg to get it
        final AtomicLong durationMillis = new AtomicLong();
        FFmpegResult fFmpegResult = ffmpeg
                .addInput(
                        UrlInput.fromUrl(VIDEO_MP4)
                )
                .addOutput(new NullOutput())
                .setProgressListener(new ProgressListener() {
                    @Override
                    public void onProgress(FFmpegProgress progress) {
                        durationMillis.set(progress.getTimeMillis());
                    }
                })
                .execute();
        System.out.println("audio size - " + fFmpegResult.getAudioSize());
        System.out.println("Exact duration: " + durationMillis.get() + " milliseconds");
    }

    public void toRawAndPlay() throws Exception {
        ProgressListener listener = new ProgressListener() {
            @Override
            public void onProgress(FFmpegProgress progress) {
                System.out.println(progress.getFrame());
            }
        };
        // code derived from :
        int sampleRate = 44100; //24000; //Hz
        int sampleSize = 16; //bits
        int channels = 1;
        boolean signed = true;
        boolean bigEnd = false;
        // NOTE: "s16be" is big-endian, but bigEnd above is false; the two must match (use "s16le")
        String format = "s16be"; //"f32le"
        //https://trac.ffmpeg.org/wiki/audio%20types
        final AudioFormat af = new AudioFormat(sampleRate, sampleSize, channels, signed, bigEnd);
        final DataLine.Info info = new DataLine.Info(SourceDataLine.class, af);
        final SourceDataLine line = (SourceDataLine) AudioSystem.getLine(info);
        line.open(af, 4096); // format, buffer size
        line.start();

        OutputStream destination = new OutputStream() {
            @Override public void write(int b) throws IOException {
                throw new UnsupportedOperationException("Nobody uses this.");
            }
            @Override public void write(byte[] b, int off, int len) throws IOException {
                String o = new String(b);
                boolean showString = false;
                System.out.println("New output (" + len
                        + ", off=" + off + ") -> " + (showString ? o : ""));
                // ffmpeg may hand over an odd number of bytes; drop the trailing byte
                // so that only whole 16-bit samples are written to the line
                if (len % 2 != 0) {
                    len -= 1;
                }
                line.write(b, off, len);
                System.out.println("done round");
            }
        };

        // src : http://blog.wudilabs.org/entry/c3d357ed/?lang=en-US
        FFmpegResult result = FFmpeg.atPath(BIN).
                addInput(UrlInput.fromPath(Paths.get(VIDEO_MP4))).
                addOutput(PipeOutput.pumpTo(destination).
                        disableStream(StreamType.VIDEO). //.addArgument("-vn")
                        setFrameRate(sampleRate). //.addArguments("-ar", sampleRate)
                        addArguments("-ac", "1").
                        setFormat(format) //.addArguments("-f", format)
                ).
                setProgressListener(listener).
                execute();

        // shut down audio
        line.drain();
        line.stop();
        line.close();
        System.out.println("result = " + result.toString());
    }

    public static void main(String[] args) throws Exception {
        FFMpegToRaw raw = new FFMpegToRaw();
        raw.basicCheck();
        raw.toRawAndPlay();
    }
}
Thanks
I suspect your screeching comes from half-filled buffers being passed to the audio system.
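To make that concrete: if each round writes the whole buffer instead of only the bytes actually read, the stale tail of the previous round is replayed, which is audible as clicks or screeches. A minimal sketch with no audio hardware involved (the byte values are arbitrary stand-ins for PCM data):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;

public class StaleBufferDemo {
    /**
     * Drain the stream through a reusable buffer. If writeWholeBuffer is true,
     * pretend to write buf.length bytes each round (the bug); otherwise write
     * only the bytes actually read (the fix).
     */
    static String drain(byte[] source, int bufSize, boolean writeWholeBuffer) throws IOException {
        ByteArrayInputStream in = new ByteArrayInputStream(source);
        byte[] buf = new byte[bufSize];
        StringBuilder out = new StringBuilder();
        int justRead;
        while ((justRead = in.read(buf)) >= 0) {
            int toWrite = writeWholeBuffer ? buf.length : justRead;
            for (int i = 0; i < toWrite; i++) out.append(buf[i]).append(' ');
        }
        return out.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        byte[] source = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        // the last read only fills 2 of the 8 buffer bytes; writing the whole
        // buffer replays 6 stale bytes from the previous round
        System.out.println("buggy : " + drain(source, 8, true));
        System.out.println("fixed : " + drain(source, 8, false));
    }
}
```

The buggy variant emits `1 2 3 4 5 6 7 8 9 10 3 4 5 6 7 8`: the trailing `3 4 5 6 7 8` are leftovers from the previous round, exactly the kind of garbage that becomes audible noise.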
As hinted at in the comments above, I would use something like FFSampledSP (if on macOS or Windows) and then code like the following, which is much more Java-esque than your solution.
Just make sure the FFSampledSP complete jar is in your path and you are good to go.
import javax.sound.sampled.*;
import java.io.File;
import java.io.IOException;
public class PlayerDemo {

    /**
     * Derive a PCM format.
     */
    private static AudioFormat toSignedPCM(final AudioFormat format) {
        final int sampleSizeInBits = format.getSampleSizeInBits() <= 0 ? 16 : format.getSampleSizeInBits();
        final int channels = format.getChannels() <= 0 ? 2 : format.getChannels();
        final float sampleRate = format.getSampleRate() <= 0 ? 44100f : format.getSampleRate();
        return new AudioFormat(AudioFormat.Encoding.PCM_SIGNED,
                sampleRate,
                sampleSizeInBits,
                channels,
                (sampleSizeInBits > 0 && channels > 0) ? (sampleSizeInBits / 8) * channels : AudioSystem.NOT_SPECIFIED,
                sampleRate,
                format.isBigEndian()
        );
    }

    public static void main(final String[] args) throws IOException, UnsupportedAudioFileException, LineUnavailableException {
        // open mp3 or whatever
        final File audioFile = new File(args[0]);
        // how long is the file? use AudioFileFormat properties
        final Long durationInMicroseconds = (Long) AudioSystem.getAudioFileFormat(audioFile).getProperty("duration");
        System.out.println("Duration in microseconds (not millis!): " + durationInMicroseconds);
        // open the mp3 stream (not yet decoded)
        final AudioInputStream mp3In = AudioSystem.getAudioInputStream(audioFile);
        // derive a suitable PCM format that can be played by the AudioSystem
        final AudioFormat desiredFormat = toSignedPCM(mp3In.getFormat());
        // ask the AudioSystem for a source line for playback
        // that corresponds to the derived PCM format
        final SourceDataLine line = AudioSystem.getSourceDataLine(desiredFormat);
        // now play, typically in a separate thread
        new Thread(() -> {
            final byte[] buf = new byte[4096];
            int justRead;
            // convert to raw PCM samples with the AudioSystem
            try (final AudioInputStream rawIn = AudioSystem.getAudioInputStream(desiredFormat, mp3In)) {
                line.open();
                line.start();
                while ((justRead = rawIn.read(buf)) >= 0) {
                    // only write bytes we really read, not more!
                    line.write(buf, 0, justRead);
                    final long microsecondPosition = line.getMicrosecondPosition();
                    System.out.println("Current position in microseconds: " + microsecondPosition);
                }
            } catch (IOException | LineUnavailableException e) {
                e.printStackTrace();
            } finally {
                line.drain();
                line.stop();
            }
        }).start();
    }
}
The regular Java API does not allow seeking to arbitrary positions. However, FFSampledSP contains an extension: the seek() method. To use it, simply cast rawIn from the example above to FFAudioInputStream and call seek() with a time and a TimeUnit.
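A minimal sketch of that call, assuming FFSampledSP's FFAudioInputStream class (the package and method names are taken from the FFSampledSP documentation and are untested here):

```java
// Sketch only: requires the FFSampledSP complete jar on the classpath.
import com.tagtraum.ffsampledsp.FFAudioInputStream;
import java.util.concurrent.TimeUnit;

// rawIn as obtained in the example above:
if (rawIn instanceof FFAudioInputStream) {
    // jump to 90 seconds into the stream
    ((FFAudioInputStream) rawIn).seek(90, TimeUnit.SECONDS);
}
```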