MFCC produces "ValueError: index can't contain negative values" when parsing a WAV file
I am using this general-purpose code to extract scaled MFCC data:
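import librosa
import numpy as np

def extract_features(file_name):
    try:
        # Load at librosa's default sample rate with the faster resampler
        audio, sample_rate = librosa.load(file_name, res_type='kaiser_fast')
        mfccs = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=40)
        # Average each coefficient over time, giving a vector of length 40
        mfccsscaled = np.mean(mfccs.T, axis=0)
    except Exception as e:
        print("Error encountered while parsing file: ", file_name)
        return None
    return mfccsscaled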
Sample code for a single file:
max_pad_len = 174
file_name = '201-AWCKARAK47Close0116BIT.wav'
audio, sample_rate = librosa.load(file_name, res_type='kaiser_fast', sr=None)
mfccs = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=40)
pad_width = max_pad_len - mfccs.shape[1]
mfccs = np.pad(mfccs, pad_width=((0, 0), (0, pad_width)), mode='constant')
mfccsscaled
I get the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-26-118328675a5f> in <module>
4 mfccs = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=40)
5 pad_width = max_pad_len - mfccs.shape[1]
----> 6 mfccs = np.pad(mfccs, pad_width=((0, 0), (0, pad_width)), mode='constant')
7 mfccsscaled
<__array_function__ internals> in pad(*args, **kwargs)
c:\python\lib\site-packages\numpy\lib\arraypad.py in pad(array, pad_width, mode, **kwargs)
746
747 # Broadcast to shape (array.ndim, 2)
--> 748 pad_width = _as_pairs(pad_width, array.ndim, as_index=True)
749
750 if callable(mode):
c:\python\lib\site-packages\numpy\lib\arraypad.py in _as_pairs(x, ndim, as_index)
517
518 if as_index and x.min() < 0:
--> 519 raise ValueError("index can't contain negative values")
520
521 # Converting the array with `tolist` seems to improve performance
ValueError: index can't contain negative values
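Can you tell me why this error is raised and how to fix it?
Background
I am working with files obtained from https://www.boomlibrary.com/. Most of them have a 24-bit depth. I tried converting the original wav files down to 16-bit and up to 32-bit. Even after loading both versions through librosa, the min~max values do not fall within [-1, 1]. I get Librosa audio file min~max range: -1.2105241 to 1.2942984.
I am not sure whether this information helps with my problem. Thanks!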
As the exception indicates, you are passing a negative pad width to np.pad.
The problem comes from this line:
pad_width = max_pad_len - mfccs.shape[1]
mfccs.shape[1] is proportional to the length of the audio and depends on the hop length used to compute the MFCCs. By default, hop_length is 512.
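As a rough check of that proportionality, the sketch below estimates the frame count from a clip's duration and sample rate. It assumes librosa's default centered framing (center=True), for which the number of frames is 1 + n_samples // hop_length. The helper name estimate_mfcc_frames is made up for illustration and is not part of librosa.

```python
def estimate_mfcc_frames(duration_s, sample_rate, hop_length=512):
    """Estimate mfccs.shape[1] for a clip, assuming centered framing."""
    # Total number of samples in the clip
    n_samples = int(duration_s * sample_rate)
    # With center=True the signal is padded so that frames are centered,
    # which yields one frame per hop plus one
    return 1 + n_samples // hop_length


# A 4-second clip at 22050 Hz stays just under a 174-frame budget
print(estimate_mfcc_frames(4, 22050))  # 173
```

Anything much longer than a few seconds, or sampled at a higher rate, will exceed a budget of 174 frames.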
The audio in question, 201-AWCKARAK47Close0116BIT.wav, is a clip roughly 45 seconds long sampled at 96 kHz. A rough calculation gives the number of MFCC frames you will get for this file:
45 seconds * (96000 samples / second) / 512 samples ~ 8500
which in turn gives:
pad_width = max_pad_len - mfccs.shape[1] = 174 - 8500 => NEGATIVE NUMBER
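One possible way around the error, sketched under the assumption that downstream code needs exactly max_pad_len frames, is to truncate clips that are too long and pad only the short ones. The helper name fit_to_length is hypothetical, and truncation discards everything after the first max_pad_len frames, which may or may not be acceptable for a 45-second clip.

```python
import numpy as np


def fit_to_length(mfccs, max_pad_len=174):
    """Return mfccs with exactly max_pad_len columns (time frames)."""
    n_frames = mfccs.shape[1]
    if n_frames >= max_pad_len:
        # Too long: keep only the first max_pad_len frames
        return mfccs[:, :max_pad_len]
    # Too short: zero-pad on the right; pad_width is positive here
    pad_width = max_pad_len - n_frames
    return np.pad(mfccs, pad_width=((0, 0), (0, pad_width)), mode='constant')


# Stand-ins for real MFCC matrices: one long clip, one short clip
long_clip = np.ones((40, 8438))
short_clip = np.ones((40, 100))
print(fit_to_length(long_clip).shape)   # (40, 174)
print(fit_to_length(short_clip).shape)  # (40, 174)
```

Other options, depending on the goal, include raising max_pad_len, passing a larger hop_length to librosa.feature.mfcc, or averaging over time as extract_features does, which gives a fixed-length vector regardless of clip duration.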