ValueError: could not broadcast input array from shape (20,590) into shape (20)

Question

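I am trying to extract features from .wav files using their MFCCs. When I try to convert my list of MFCCs to a NumPy array, I get an error. I'm fairly sure the error occurs because the list contains MFCC arrays with different shapes, but I'm not sure how to fix that.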

I have looked at 2 other Stack Overflow posts, but they did not solve my problem because they were too specific to their particular tasks.

Full error message:

Traceback (most recent call last):
  File "/..../.../...../Batch_MFCC_Data.py", line 68, in <module>
    X = np.array(MFCCs)
ValueError: could not broadcast input array from shape (20,590) into shape (20)

Code example:

import glob

import numpy as np
from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20
from sklearn.preprocessing import OneHotEncoder

all_wav_paths = glob.glob('directory_of_wav_files/**/*.wav', recursive=True)
np.random.shuffle(all_wav_paths)

MFCCs = []  # list to hold every MFCC matrix
labels = []  # list to hold every label

for wav_path in all_wav_paths:

    individual_MFCC = MFCC_from_wav(wav_path)
    # MFCC_from_wav() -> returns the MFCC coefficients as a (20, n_frames) array

    label = get_class(wav_path)
    # get_class() -> returns the label of the wav file, either 0 or 1

    # add features and label to the lists
    MFCCs.append(individual_MFCC)
    labels.append(label)

# Must convert the training data to a NumPy array for
# train_test_split and saving to local drive

X = np.array(MFCCs)  # THIS LINE CRASHES WITH ABOVE ERROR

# binary encode labels; OneHotEncoder expects 2-D input of shape (n_samples, 1)
# (sparse_output was called sparse before scikit-learn 1.2)
onehot_encoder = OneHotEncoder(sparse_output=False)
Y = onehot_encoder.fit_transform(np.array(labels).reshape(-1, 1))

# create train/test data from the feature array X
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.25, random_state=0)

# saving data to local drive
np.save("LABEL_SAVE_PATH", Y)
np.save("TRAINING_DATA_SAVE_PATH", X)
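For the label step: OneHotEncoder.fit_transform expects a 2-D array of shape (n_samples, n_features), so a flat Python list of labels has to be reshaped into a single column first. When the labels are only the integers 0 and 1, a NumPy-only sketch is to index an identity matrix. This assumes get_class() returns plain integers 0 or 1; the labels list below is made-up example data.

```python
import numpy as np

# Hypothetical example labels, standing in for the 0/1 values returned by get_class()
labels = [0, 1, 1, 0]

# Row k of the 2x2 identity matrix is the one-hot vector for class k,
# so indexing with the label array one-hot encodes every label at once
Y = np.eye(2)[np.asarray(labels)]
print(Y)
```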

Here is a snapshot of the shapes of the MFCCs (from the .wav files) in the MFCCs list:

...More above...
(20, 423) #shape of returned MFCC from one of the .wav files
(20, 457)
(20, 1757)
(20, 345)
(20, 835)
(20, 345)
(20, 687)
(20, 774)
(20, 597)
(20, 719)
(20, 1195)
(20, 433)
(20, 728)
(20, 939)
(20, 345)
(20, 1112)
(20, 345)
(20, 591)
(20, 936)
(20, 1161)
....More below....

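As you can see, the MFCCs in the MFCCs list do not all have the same shape, because the recordings are not all the same length. Is this why I can't convert the list to a NumPy array? If so, how can I fix it so that every MFCC in the list has the same shape?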
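A quick way to confirm the diagnosis is to reproduce the error with random arrays in place of real MFCCs (the frame counts come from the shapes and error message above; the random data is only a stand-in). The exact message depends on the NumPy version: older releases report the "could not broadcast" error shown above, while NumPy 1.24 and later raise a ValueError about an "inhomogeneous shape" instead.

```python
import numpy as np

# Random stand-ins for MFCC matrices: 20 coefficients each, different frame counts
rng = np.random.default_rng(0)
MFCCs = [rng.standard_normal((20, n)) for n in (423, 457, 590)]

try:
    np.array(MFCCs)  # ragged widths: NumPy cannot build one rectangular array
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)

# Once every matrix has the same number of frames, the conversion succeeds
X = np.array([m[:, :423] for m in MFCCs])
print(X.shape)  # (3, 20, 423)
```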

Any code snippets or suggestions for accomplishing this would be greatly appreciated!

Thanks!

Answer 1

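Use the following logic to bring every array down to min_shape, i.e. crop each larger array to the width of the smallest one. Note that this is truncation rather than true downsampling: frames beyond min_shape[1] are simply discarded.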

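import numpy as np

# MFCCs is the list of (20, n_frames) arrays built in the question
min_shape = (20, 345)  # smallest shape in the snapshot above; could also be (20, min(a.shape[1] for a in MFCCs))

for idx, arr in enumerate(MFCCs):
    # keep only the first min_shape[1] frames (columns) of each array
    MFCCs[idx] = arr[:, :min_shape[1]]

batch_arr = np.array(MFCCs)  # shape: (num_wav_files, 20, 345)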
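Rather than hard-coding min_shape, the smallest width can be computed from the list itself. A minimal sketch: crop_to_min_width is a hypothetical helper name, and the random arrays stand in for real MFCCs.

```python
import numpy as np


def crop_to_min_width(mfccs):
    """Crop every (n_mfcc, n_frames) matrix to the smallest n_frames in the list."""
    min_frames = min(m.shape[1] for m in mfccs)
    # keep only the first min_frames columns (frames) of each matrix, then stack
    return np.stack([m[:, :min_frames] for m in mfccs])


rng = np.random.default_rng(0)
MFCCs = [rng.standard_normal((20, n)) for n in (423, 345, 1757)]
batch_arr = crop_to_min_width(MFCCs)
print(batch_arr.shape)  # (3, 20, 345)
```

Unlike the loop above, this returns a new array and leaves the original MFCCs list untouched.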

You can then stack these arrays into one batch array, as in the following minimal example:

In [33]: a1 = np.random.randn(2, 3)    
In [34]: a2 = np.random.randn(2, 5)    
In [35]: a3 = np.random.randn(2, 10)

In [36]: MFCCs = [a1, a2, a3]

In [37]: min_shape = (2, 2)

In [38]: for idx, arr in enumerate(MFCCs):
    ...:     MFCCs[idx] = arr[:, :min_shape[1]]
    ...:     

In [42]: batch_arr = np.array(MFCCs)

In [43]: batch_arr.shape
Out[43]: (3, 2, 2)

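The second strategy is to pad the smaller arrays up to max_shape (padding rather than true upsampling): follow similar logic, but fill the missing values with zeros or NaN values, as you prefer.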
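The padding strategy can be sketched with np.pad, which appends constant-valued columns to the right of each matrix. pad_to_max_width and its fill_value parameter are hypothetical names for illustration: pass 0.0 for zero padding or np.nan for NaN padding. NaN padding needs a floating-point dtype, which holds for the float arrays used here.

```python
import numpy as np


def pad_to_max_width(mfccs, fill_value=0.0):
    """Right-pad every (n_mfcc, n_frames) matrix with fill_value up to the largest n_frames."""
    max_frames = max(m.shape[1] for m in mfccs)
    padded = [
        # pad nothing on the coefficient axis, (max_frames - n_frames) columns on the right
        np.pad(m, ((0, 0), (0, max_frames - m.shape[1])),
               mode="constant", constant_values=fill_value)
        for m in mfccs
    ]
    return np.stack(padded)


rng = np.random.default_rng(0)
MFCCs = [rng.standard_normal((20, n)) for n in (423, 345, 1757)]
zeros_batch = pad_to_max_width(MFCCs)                   # pad with zeros
nan_batch = pad_to_max_width(MFCCs, fill_value=np.nan)  # pad with NaN
print(zeros_batch.shape)  # (3, 20, 1757)
```

Zero padding keeps the batch usable by most numeric code as-is, while NaN padding makes padded frames easy to detect (for example with np.isnan) but propagates through arithmetic unless it is masked out.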

Again, you can stack the arrays into a batch array of shape (num_arrays, dim1, dim2); in your case, the shape would be (num_wav_files, 20, max_column).
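The two strategies can also be folded into one helper that crops longer matrices and pads shorter ones to a single chosen width. A fixed width is handy when recordings processed later must match the width the training batch used. In this sketch fix_width is a hypothetical helper, and using the median frame count as the target is an arbitrary choice for illustration.

```python
import numpy as np


def fix_width(mfcc, target_frames, fill_value=0.0):
    """Crop or right-pad one (n_mfcc, n_frames) matrix to exactly target_frames columns."""
    n_frames = mfcc.shape[1]
    if n_frames >= target_frames:
        # longer recording: keep only the first target_frames frames
        return mfcc[:, :target_frames]
    # shorter recording: append constant columns on the right
    pad = target_frames - n_frames
    return np.pad(mfcc, ((0, 0), (0, pad)), mode="constant", constant_values=fill_value)


rng = np.random.default_rng(0)
MFCCs = [rng.standard_normal((20, n)) for n in (423, 345, 1757, 590)]
target = int(np.median([m.shape[1] for m in MFCCs]))  # median frame count, truncated to int
X = np.stack([fix_width(m, target) for m in MFCCs])
print(X.shape)  # (4, 20, 506)
```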
