Incorrect audio file length in plot and improperly overlaid annotation segments on audio plot in Python
I am following this tutorial (https://github.com/amsehili/audio-segmentation-by-classification-tutorial/blob/master/multiclass_audio_segmentation.ipynb) and trying to recreate the visualization output using my own training data and samples.
My audio file is 31 seconds long: https://www.dropbox.com/s/qae2u5dnnp678my/test_hold.wav?dl=0
The annotation files are here:
https://www.dropbox.com/s/gm9uu1rjettm3qr/hold.lst?dl=0
https://www.dropbox.com/s/b6z1gt8i63c8ted/tring.lst?dl=0
I am trying to plot the audio file's waveform in Python and then, using the annotation files, highlight on top of that waveform the portions of the audio that are "hold" and "tring".
The waveform as shown in Audacity looks like this:
The code is as follows:
import wave
import pickle
import numpy as np
from sklearn.mixture import GMM
import librosa
import warnings
warnings.filterwarnings('ignore')
SAMPLING_RATE = 16000
wfp = wave.open("/home/vivek/Music/test_hold.wav")
audio_data = wfp.readframes(-1)
width = wfp.getsampwidth()
wfp.close()
# data as numpy array will be used to plot signal
fmt = {1: np.int8 , 2: np.int16, 4: np.int32}
signal = np.array(np.frombuffer(audio_data, dtype=fmt[width]), dtype=np.float64)
%matplotlib inline
import matplotlib.pyplot as plt
import matplotlib.pylab as pylab
pylab.rcParams['figure.figsize'] = 24, 18
def plot_signal_and_segmentation(signal, sampling_rate, segments=[]):
    _time = np.arange(0., np.ceil(float(len(signal))) / sampling_rate, 1. / sampling_rate)
    if len(_time) > len(signal):
        _time = _time[: len(signal) - len(_time)]
    pylab.subplot(211)
    for seg in segments:
        fc = seg.get("fc", "g")
        ec = seg.get("ec", "b")
        lw = seg.get("lw", 2)
        alpha = seg.get("alpha", 0.4)
        ts = seg["timestamps"]
        # plot first segment outside the inner loop so each class gets a single legend entry
        p = pylab.axvspan(ts[0][0], ts[0][1], fc=fc, ec=ec, lw=lw, alpha=alpha, label=seg.get("title", ""))
        for start, end in ts[1:]:
            p = pylab.axvspan(start, end, fc=fc, ec=ec, lw=lw, alpha=alpha)
    pylab.legend(bbox_to_anchor=(0., 1.02, 1., .102), loc=3,
                 borderaxespad=0., fontsize=22, ncol=2)
    pylab.plot(_time, signal)
    pylab.xlabel("Time (s)", fontsize=22)
    pylab.ylabel("Signal Amplitude", fontsize=22)
    pylab.show()
annotations = {}
ts = [line.rstrip("\r\n\t ").split(" ") for line in open("/home/vivek/Music/hold.lst").readlines()]
ts = [(float(t[0]), float(t[1])) for t in ts]
annotations["hold"] = {"fc" : "y", "ec" : "y", "lw" : 0, "alpha" : 0.4, "title" : "Hold", "timestamps" : ts}
ts = [line.rstrip("\r\n\t ").split(" ") for line in open("/home/vivek/Music/tring.lst").readlines()]
ts = [(float(t[0]), float(t[1])) for t in ts]
annotations["tring"] = {"fc" : "r", "ec" : "r", "lw" : 0, "alpha" : 0.9, "title" : "Tring", "timestamps" : ts}
def plot_annot():
    plot_signal_and_segmentation(signal, SAMPLING_RATE,
                                 [annotations["tring"],
                                  annotations["hold"]])
plot_annot()
The plot generated by the above code is:
As you can see, the plot seems to think the file is 90 seconds long, when it is actually only 31 seconds. The annotation segments are also overlaid/highlighted incorrectly.
What am I doing wrong, and how do I fix it?
PS: In the waveform, the rectangular blocks are the "tring" segments and the remaining four "trapezoidal" waveforms are the hold-music regions.
Just a wild guess here: the Audacity screenshot shows a sampling rate of 44100, while your code snippet initializes the SAMPLING_RATE variable to 16000. If you scale the real 31-second duration by the ratio between the two rates, you get 31 * 44100 / 16000 = 85.44 seconds, which is roughly the length your plot shows.
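As a sketch of the fix, you could read the rate from the WAV header with the `wave` module's `getframerate()` instead of hard-coding it, and pass that value to the plotting function. The demo below writes a small synthetic 44100 Hz file to stand in for `test_hold.wav` (the path and demo file are placeholders, not from the original question):

```python
import struct
import wave

def get_sampling_rate(path):
    """Read the true sampling rate from the WAV file header."""
    with wave.open(path, "rb") as wfp:
        return wfp.getframerate()

# Create a synthetic mono, 16-bit, 44100 Hz file as a stand-in for test_hold.wav.
with wave.open("demo.wav", "wb") as out:
    out.setnchannels(1)
    out.setsampwidth(2)
    out.setframerate(44100)
    out.writeframes(struct.pack("<h", 0) * 100)  # 100 silent samples

SAMPLING_RATE = get_sampling_rate("demo.wav")
print(SAMPLING_RATE)  # 44100
```

With the rate taken from the header, the time axis in `plot_signal_and_segmentation` spans the file's true duration, so the annotation spans should line up with the waveform.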