How to calculate the timeline of an audio file after extracting MFCC features

Question

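Calculating the timeline of an audio file after extracting MFCC features

The idea is to get a timeline for the MFCC frames, the same way I already have one for the raw samples.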

import librosa
import numpy as np
import python_speech_features

audio_file = r'sample.wav'

samples, sample_rate = librosa.core.load(audio_file, sr=16000, mono=True)

# timeline of sample.wav: one timestamp (in seconds) per audio sample
timeline = np.arange(0, len(samples))/sample_rate

print(timeline)

mfcc_feat = python_speech_features.mfcc(samples, sample_rate)

Answer 1

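python_speech_features.mfcc(...) accepts several additional parameters. One of them is winstep, which specifies the time between successive feature frames, i.e. between MFCC feature vectors. The default value is 0.01 s = 10 ms. In other contexts, e.g. librosa, this is known as hop_length, which is specified in samples rather than seconds.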
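To make the relationship between the two conventions concrete, here is a minimal sketch that converts between a step in seconds and a hop in samples. The helper names winstep_to_hop_length and hop_length_to_winstep are made up for illustration; they are not part of either library.

```python
def winstep_to_hop_length(winstep, sample_rate):
    """Convert a frame step in seconds to a hop length in samples."""
    return int(round(winstep * sample_rate))


def hop_length_to_winstep(hop_length, sample_rate):
    """Convert a hop length in samples to a frame step in seconds."""
    return hop_length / sample_rate


# With the values from the question: 10 ms step at 16 kHz
hop = winstep_to_hop_length(0.01, 16000)
print(hop)  # 160
```

So a winstep of 0.01 s at 16 kHz corresponds to a hop of 160 samples.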

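To find your timeline, you need to work out the number of feature frames and the feature rate. With winstep=0.01, you get 100 features per second, so your feature rate (or frame rate) is 100 Hz. The number of frames is len(mfcc_feat).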

So you end up with:

import librosa
import python_speech_features
import numpy as np

audio_file = r'sample.wav'

samples, sample_rate = librosa.core.load(audio_file, sr=16000, mono=True)

timeline = np.arange(0, len(samples))/sample_rate # timeline of sample.wav, one timestamp per sample

print(timeline)

winstep = 0.01  # happens to be the default value
mfcc_feat = python_speech_features.mfcc(samples, sample_rate, winstep=winstep)

frame_rate = 1./winstep

timeline_mfcc = np.arange(0, len(mfcc_feat))/frame_rate
print(timeline_mfcc)
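If you want to sanity-check len(mfcc_feat), the frame count can be estimated from the signal length alone. The sketch below assumes the library cuts frames starting at sample 0, without centering, and zero-pads the last partial frame, with a default window length winlen of 0.025 s. That is my understanding of how python_speech_features frames a signal, so treat it as an assumption and compare against the real output. The function expected_num_frames is a hypothetical helper, not a library call.

```python
import math


def expected_num_frames(num_samples, sample_rate, winlen=0.025, winstep=0.01):
    """Estimate the number of frames for a signal of num_samples samples.

    Assumes frames start at sample 0 and the last partial frame is padded.
    Python's round() is used here; a library may round exact halves differently.
    """
    frame_len = int(round(winlen * sample_rate))
    frame_step = int(round(winstep * sample_rate))
    if num_samples <= frame_len:
        return 1
    return 1 + math.ceil((num_samples - frame_len) / frame_step)


# One second of audio at 16 kHz
print(expected_num_frames(16000, 16000))  # 99
```

Under these assumptions one second of audio gives 99 frames rather than 100, because the first window already covers 25 ms before the first step is taken.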

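Since each frame here advances by 0.01 s, you may want to shift the timestamps to the center of that interval, i.e. add 0.005 s. Note that the analysis window itself is winlen long (0.025 s by default), so if you want the center of the window instead, the shift is winlen/2.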
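A minimal sketch of the shifted timelines, written in plain Python so it runs without the audio file. The helper frame_times and the value of num_frames are placeholders for illustration; in practice you would pass len(mfcc_feat).

```python
def frame_times(num_frames, winstep=0.01, offset=0.0):
    """Return one timestamp in seconds per frame, shifted by offset."""
    return [i * winstep + offset for i in range(num_frames)]


num_frames = 5   # stand-in for len(mfcc_feat)
winstep = 0.01   # step between frames, in seconds
winlen = 0.025   # assumed window length, in seconds

starts = frame_times(num_frames, winstep)                      # frame start times
hop_centers = frame_times(num_frames, winstep, winstep / 2)    # center of each 10 ms step
window_centers = frame_times(num_frames, winstep, winlen / 2)  # center of each 25 ms window

print(starts)
print(hop_centers)
print(window_centers)
```

Which offset is right depends on what you want a timestamp to mean: the start of the frame, the middle of the step, or the middle of the analysis window.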

