How to Merge MFCCs

Question

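I am working on extracting MFCC features from some audio files. My current program extracts a series of MFCCs for each file, using a buffer size of 1024. I came across the following in a paper: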

The feature vectors extracted within a second of audio data are combined by computing the mean and the variance of each feature vector element (merging).

My current code uses TarsosDSP to extract the MFCCs, but I am not sure how to split the data into "a second of audio data" so that I can merge the MFCCs.

My MFCC extraction code:

int sampleRate = 44100;
int bufferSize = 1024;
int bufferOverlap = 512;
inStream = new FileInputStream(path);
AudioDispatcher dispatcher = new AudioDispatcher(new UniversalAudioInputStream(inStream, new TarsosDSPAudioFormat(sampleRate, 16, 1, true, true)), bufferSize, bufferOverlap);
final MFCC mfcc = new MFCC(bufferSize, sampleRate, 13, 40, 300, 3000);
dispatcher.addAudioProcessor(mfcc);
dispatcher.addAudioProcessor(new AudioProcessor() {
    @Override
    public void processingFinished() {
        System.out.println("DONE");
    }
    @Override
    public boolean process(AudioEvent audioEvent) {
        return true;  // breakpoint here reveals MFCC data
    }
});
dispatcher.run();

What exactly is the buffer size? Can it be used to segment the audio into 1-second windows? Is there a way to divide a series of MFCCs into fixed time spans?

Any help would be much appreciated.

Answer 1

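After some more research, I found this website, which clearly shows the steps for using MFCCs in Weka. It shows data files containing various statistics, each of which is listed as a separate attribute in Weka. I believe that when the paper says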

computing the mean and variance

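they mean that the mean and variance of each MFCC coefficient are used as attributes in the combined data file. When I followed the website's example to merge the MFCCs, I used the max, min, range, max position, min position, mean, standard deviation, skewness, kurtosis, quartile, and interquartile range.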
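To make the "mean and variance of each coefficient" reading concrete, here is a minimal sketch in plain Java. It assumes the per-buffer MFCC vectors have already been collected into a list (in the TarsosDSP code from the question, I believe this could be done inside `process()` via `mfcc.getMFCC()`, but treat that as an assumption). The class name `MfccMerger` and the choice of population variance (dividing by N) are mine; the paper does not say which variance it uses.

```java
import java.util.List;

public class MfccMerger {
    /**
     * Merges a group of MFCC vectors into a single feature vector.
     * The first half holds the mean of each coefficient, the second half
     * holds the population variance of each coefficient.
     */
    public static double[] meanAndVariance(List<float[]> frames) {
        if (frames.isEmpty()) {
            throw new IllegalArgumentException("need at least one MFCC frame");
        }
        int n = frames.get(0).length;
        double[] merged = new double[2 * n];

        // First pass: accumulate the sum of each coefficient, then divide to get the mean.
        for (float[] frame : frames) {
            for (int i = 0; i < n; i++) {
                merged[i] += frame[i];
            }
        }
        for (int i = 0; i < n; i++) {
            merged[i] /= frames.size();
        }

        // Second pass: accumulate squared deviations from the mean, then divide by N.
        for (float[] frame : frames) {
            for (int i = 0; i < n; i++) {
                double d = frame[i] - merged[i];
                merged[n + i] += d * d;
            }
        }
        for (int i = 0; i < n; i++) {
            merged[n + i] /= frames.size();
        }
        return merged;
    }
}
```

With 13 coefficients per frame this yields 26 attributes per merged vector, which is the shape I would expect a Weka data file built this way to have.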

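As for splitting the audio input into seconds, I believe the MFCC sets are extracted at the sample rate passed in as a parameter, so if I set it to 100, I would wait for 100 cycles before merging the MFCCs. Please correct me if I am wrong.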
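For the settings in the question, the arithmetic may be easier to check in code. The buffer size is the number of samples in one analysis frame, and with an overlap of 512 a new frame starts every 1024 - 512 = 512 samples, so at 44100 Hz there are roughly 44100 / 512 ≈ 86 MFCC vectors per second. The sketch below groups frames by the second in which each frame starts. The class and method names are hypothetical, and it assumes frames arrive in order with a fixed hop, which is how I understand `AudioDispatcher` to behave.

```java
import java.util.ArrayList;
import java.util.List;

public class SecondGrouper {
    /** Number of new samples between the starts of two consecutive frames. */
    public static int hopSize(int bufferSize, int bufferOverlap) {
        return bufferSize - bufferOverlap;
    }

    /** Index of the one-second window in which the given frame starts. */
    public static int secondOf(int frameIndex, int hop, int sampleRate) {
        return (int) ((long) frameIndex * hop / sampleRate);
    }

    /** Splits an ordered sequence of per-frame MFCC vectors into one list per second of audio. */
    public static List<List<float[]>> groupBySecond(
            List<float[]> frames, int bufferSize, int bufferOverlap, int sampleRate) {
        int hop = hopSize(bufferSize, bufferOverlap);
        List<List<float[]>> groups = new ArrayList<>();
        for (int i = 0; i < frames.size(); i++) {
            int second = secondOf(i, hop, sampleRate);
            // Create lists for any seconds not yet seen.
            while (groups.size() <= second) {
                groups.add(new ArrayList<>());
            }
            groups.get(second).add(frames.get(i));
        }
        return groups;
    }
}
```

Each resulting group can then be reduced to one merged vector (for example, a mean and variance per coefficient). The last group usually covers less than a full second, so it may be worth discarding or flagging it.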


Tags: java, audio, feature-extraction, mfcc, tarsosdsp
