Screeching white noise while playing audio as a raw stream
I. Background
- I am trying to build an app that helps match subtitles to an audio waveform very accurately, at the waveform level, the word level, and even the character level.
- The audio is expected to be Sanskrit chants (yoga, rituals, etc.), which contain very long compound words [example: aṅganyā-sokta-mātaro-bījam is traditionally a single word, broken up only to aid reading].
- The input transcript/subtitles may be roughly in sync at the sentence/verse level, but will certainly not be in sync at the word level.
- The app should be able to find the silences in the audio waveform so that it can guess the start and end point of each word (or even each letter/consonant/vowel). That way the audio and the visual subtitles match perfectly at the word level (or even the letter/consonant/vowel level), and the UI simply highlights or animates the exact word (or even letter) in the subtitle line being chanted at that moment, showing it in a larger font. The purpose of this app is to help people learn Sanskrit chanting.
- This is not expected to be a 100% automated process, nor a 100% manual one, but a hybrid process in which the app assists the human as much as possible.
II. Below is the first piece of code I wrote for this, in which I
- first open an mp3 (or any audio format) file,
- seek to an arbitrary point on the audio file's timeline, // currently playing from offset zero
- fetch the audio data in raw format for two purposes: (1) playing it and (2) drawing the waveform,
- play the raw audio data using the standard Java audio library.
III. The problem I am facing is a screeching sound between each cycle.
- Perhaps I need to close the line between cycles? That sounds simple enough, and I can try it.
- But I also wonder whether this overall approach is sound in the first place. Any hints, guides, or suggested links would be very helpful.
- Also, I have simply hardcoded the sample rate etc. (44100 Hz etc.). Are these reasonable defaults, or should they depend on the input format?
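A side note on those hardcoded settings: whichever values are chosen, the sample format requested from ffmpeg (e.g. s16le vs s16be) must agree with the endianness of the Java AudioFormat, or every pair of bytes is interpreted in the wrong order. A tiny self-contained sketch (illustrative values only, no audio hardware needed) of what that mismatch does to 16-bit samples:

```java
public class EndianMismatchDemo {
    /** Reassemble a big-endian 16-bit sample as if it were little-endian. */
    static short misreadAsLittleEndian(short sample) {
        byte hi = (byte) ((sample >> 8) & 0xFF); // first byte on the wire (big-endian)
        byte lo = (byte) (sample & 0xFF);        // second byte on the wire
        // a little-endian reader swaps the significance of the two bytes
        return (short) (((lo & 0xFF) << 8) | (hi & 0xFF));
    }

    public static void main(String[] args) {
        short sample = 1000; // a quiet 16-bit PCM sample
        System.out.println("intended=" + sample
                + " misread=" + misreadAsLittleEndian(sample));
    }
}
```

Running this shows the quiet sample 1000 misread as -6141, i.e. loud garbage; a whole stream of such misreads is enough on its own to turn clean audio into harsh noise.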
IV. Here is the code
import com.github.kokorin.jaffree.StreamType;
import com.github.kokorin.jaffree.ffmpeg.FFmpeg;
import com.github.kokorin.jaffree.ffmpeg.FFmpegProgress;
import com.github.kokorin.jaffree.ffmpeg.FFmpegResult;
import com.github.kokorin.jaffree.ffmpeg.NullOutput;
import com.github.kokorin.jaffree.ffmpeg.PipeOutput;
import com.github.kokorin.jaffree.ffmpeg.ProgressListener;
import com.github.kokorin.jaffree.ffprobe.Stream;
import com.github.kokorin.jaffree.ffmpeg.UrlInput;
import com.github.kokorin.jaffree.ffprobe.FFprobe;
import com.github.kokorin.jaffree.ffprobe.FFprobeResult;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.SourceDataLine;
public class FFMpegToRaw {
    Path BIN = Paths.get("f:\\utilities\\ffmpeg-20190413-0ad0533-win64-static\\bin");
    String VIDEO_MP4 = "f:\\org\\TEMPLE\\DeviMahatmyamRecitationAudio\\03_01_Devi Kavacham.mp3";
    FFprobe ffprobe;
    FFmpeg ffmpeg;

    public void basicCheck() throws Exception {
        if (BIN != null) {
            ffprobe = FFprobe.atPath(BIN);
        } else {
            ffprobe = FFprobe.atPath();
        }
        FFprobeResult result = ffprobe
                .setShowStreams(true)
                .setInput(VIDEO_MP4)
                .execute();
        for (Stream stream : result.getStreams()) {
            System.out.println("Stream " + stream.getIndex()
                    + " type " + stream.getCodecType()
                    + " duration " + stream.getDuration(TimeUnit.SECONDS));
        }
        if (BIN != null) {
            ffmpeg = FFmpeg.atPath(BIN);
        } else {
            ffmpeg = FFmpeg.atPath();
        }
        // Sometimes ffprobe can't show the exact duration; transcode to a NULL output with ffmpeg to get it
        final AtomicLong durationMillis = new AtomicLong();
        FFmpegResult fFmpegResult = ffmpeg
                .addInput(
                        UrlInput.fromUrl(VIDEO_MP4)
                )
                .addOutput(new NullOutput())
                .setProgressListener(new ProgressListener() {
                    @Override
                    public void onProgress(FFmpegProgress progress) {
                        durationMillis.set(progress.getTimeMillis());
                    }
                })
                .execute();
        System.out.println("audio size - " + fFmpegResult.getAudioSize());
        System.out.println("Exact duration: " + durationMillis.get() + " milliseconds");
    }

    public void toRawAndPlay() throws Exception {
        ProgressListener listener = new ProgressListener() {
            @Override
            public void onProgress(FFmpegProgress progress) {
                System.out.println(progress.getFrame());
            }
        };
        // code derived from :
        int sampleRate = 44100; //24000; //Hz
        int sampleSize = 16; //bits
        int channels = 1;
        boolean signed = true;
        boolean bigEnd = false;
        // NOTE: "s16be" is big-endian, but bigEnd above is false; the two must match (use "s16le")
        String format = "s16be"; //"f32le"
        //https://trac.ffmpeg.org/wiki/audio%20types
        final AudioFormat af = new AudioFormat(sampleRate, sampleSize, channels, signed, bigEnd);
        final DataLine.Info info = new DataLine.Info(SourceDataLine.class, af);
        final SourceDataLine line = (SourceDataLine) AudioSystem.getLine(info);
        line.open(af, 4096); // format, buffer size
        line.start();

        OutputStream destination = new OutputStream() {
            @Override public void write(int b) throws IOException {
                throw new UnsupportedOperationException("Nobody uses this.");
            }
            @Override public void write(byte[] b, int off, int len) throws IOException {
                String o = new String(b);
                boolean showString = false;
                System.out.println("New output (" + len
                        + ", off=" + off + ") -> " + (showString ? o : ""));
                // ffmpeg may hand over an odd number of bytes; drop the trailing byte
                // so that only whole 16-bit samples are written to the line
                if (len % 2 != 0) {
                    len -= 1;
                }
                line.write(b, off, len);
                System.out.println("done round");
            }
        };

        // src : http://blog.wudilabs.org/entry/c3d357ed/?lang=en-US
        FFmpegResult result = FFmpeg.atPath(BIN).
                addInput(UrlInput.fromPath(Paths.get(VIDEO_MP4))).
                addOutput(PipeOutput.pumpTo(destination).
                        disableStream(StreamType.VIDEO). //.addArgument("-vn")
                        setFrameRate(sampleRate). //.addArguments("-ar", sampleRate)
                        addArguments("-ac", "1").
                        setFormat(format) //.addArguments("-f", format)
                ).
                setProgressListener(listener).
                execute();

        // shut down audio
        line.drain();
        line.stop();
        line.close();
        System.out.println("result = " + result.toString());
    }

    public static void main(String[] args) throws Exception {
        FFMpegToRaw raw = new FFMpegToRaw();
        raw.basicCheck();
        raw.toRawAndPlay();
    }
}
Thanks
I suspect your screeching comes from half-filled buffers being passed to the audio system.
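To make that concrete: if each round writes the whole buffer instead of only the bytes actually read, the stale tail of the previous round is replayed, which is audible as clicks or screeches. A minimal sketch with no audio hardware involved (the byte values are arbitrary stand-ins for PCM data):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;

public class StaleBufferDemo {
    /**
     * Drain the stream through a reusable buffer. If writeWholeBuffer is true,
     * pretend to write buf.length bytes each round (the bug); otherwise write
     * only the bytes actually read (the fix).
     */
    static String drain(byte[] source, int bufSize, boolean writeWholeBuffer) throws IOException {
        ByteArrayInputStream in = new ByteArrayInputStream(source);
        byte[] buf = new byte[bufSize];
        StringBuilder out = new StringBuilder();
        int justRead;
        while ((justRead = in.read(buf)) >= 0) {
            int toWrite = writeWholeBuffer ? buf.length : justRead;
            for (int i = 0; i < toWrite; i++) out.append(buf[i]).append(' ');
        }
        return out.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        byte[] source = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        // the last read only fills 2 of the 8 buffer bytes; writing the whole
        // buffer replays 6 stale bytes from the previous round
        System.out.println("buggy : " + drain(source, 8, true));
        System.out.println("fixed : " + drain(source, 8, false));
    }
}
```

The buggy variant emits `1 2 3 4 5 6 7 8 9 10 3 4 5 6 7 8`: the trailing `3 4 5 6 7 8` are leftovers from the previous round, exactly the kind of garbage that becomes audible noise.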
As hinted at in the comments above, I would use something like FFSampledSP (if on macOS or Windows) and then code like the following, which is much more Java-esque than your solution.
Just make sure the FFSampledSP complete jar is in your path and you are good to go.
import javax.sound.sampled.*;
import java.io.File;
import java.io.IOException;
public class PlayerDemo {

    /**
     * Derive a PCM format.
     */
    private static AudioFormat toSignedPCM(final AudioFormat format) {
        final int sampleSizeInBits = format.getSampleSizeInBits() <= 0 ? 16 : format.getSampleSizeInBits();
        final int channels = format.getChannels() <= 0 ? 2 : format.getChannels();
        final float sampleRate = format.getSampleRate() <= 0 ? 44100f : format.getSampleRate();
        return new AudioFormat(AudioFormat.Encoding.PCM_SIGNED,
                sampleRate,
                sampleSizeInBits,
                channels,
                (sampleSizeInBits > 0 && channels > 0) ? (sampleSizeInBits / 8) * channels : AudioSystem.NOT_SPECIFIED,
                sampleRate,
                format.isBigEndian()
        );
    }

    public static void main(final String[] args) throws IOException, UnsupportedAudioFileException, LineUnavailableException {
        // open mp3 or whatever
        final File audioFile = new File(args[0]);
        // how long is the file? use AudioFileFormat properties
        final Long durationInMicroseconds = (Long) AudioSystem.getAudioFileFormat(audioFile).getProperty("duration");
        System.out.println("Duration in microseconds (not millis!): " + durationInMicroseconds);
        // open the mp3 stream (not yet decoded)
        final AudioInputStream mp3In = AudioSystem.getAudioInputStream(audioFile);
        // derive a suitable PCM format that can be played by the AudioSystem
        final AudioFormat desiredFormat = toSignedPCM(mp3In.getFormat());
        // ask the AudioSystem for a source line for playback
        // that corresponds to the derived PCM format
        final SourceDataLine line = AudioSystem.getSourceDataLine(desiredFormat);
        // now play, typically in a separate thread
        new Thread(() -> {
            final byte[] buf = new byte[4096];
            int justRead;
            // convert to raw PCM samples with the AudioSystem
            try (final AudioInputStream rawIn = AudioSystem.getAudioInputStream(desiredFormat, mp3In)) {
                line.open();
                line.start();
                while ((justRead = rawIn.read(buf)) >= 0) {
                    // only write bytes we really read, not more!
                    line.write(buf, 0, justRead);
                    final long microsecondPosition = line.getMicrosecondPosition();
                    System.out.println("Current position in microseconds: " + microsecondPosition);
                }
            } catch (IOException | LineUnavailableException e) {
                e.printStackTrace();
            } finally {
                line.drain();
                line.stop();
            }
        }).start();
    }
}
The regular Java API does not allow seeking to arbitrary positions. However, FFSampledSP contains an extension: the seek() method. To use it, simply cast rawIn from the example above to FFAudioInputStream and call seek() with a time and a TimeUnit.
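A minimal sketch of that call, assuming FFSampledSP's FFAudioInputStream class (the package and method names are taken from the FFSampledSP documentation and are untested here):

```java
// Sketch only: requires the FFSampledSP complete jar on the classpath.
import com.tagtraum.ffsampledsp.FFAudioInputStream;
import java.util.concurrent.TimeUnit;

// rawIn as obtained in the example above:
if (rawIn instanceof FFAudioInputStream) {
    // jump to 90 seconds into the stream
    ((FFAudioInputStream) rawIn).seek(90, TimeUnit.SECONDS);
}
```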