How to normalize features in a specific dimension of a 3D array with respect to their own type
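I have a 3D array of shape (1883, 100, 68) laid out as (batch, step, features).
The 68 features are completely different kinds of features, such as energy and MFCCs.
I want each feature to be normalized with respect to its own type.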
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
# Flattens each sample to (step * features) = 6800 columns before scaling.
X_train = scaler.fit_transform(X_train.reshape(X_train.shape[0], -1)).reshape(X_train.shape)
X_test = scaler.transform(X_test.reshape(X_test.shape[0], -1)).reshape(X_test.shape)
print(X_train.shape)
print(max(X_train[0][0]))
print(min(X_train[0][0]))
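Clearly, flattening to a 2D array like this does not work: the matrix becomes (1883, 6800), so the features are no longer grouped by type and each of the 6800 step/feature columns is scaled on its own. As a result, several features end up as zero in all 100 steps.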
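To make the difference concrete, here is a small NumPy-only sketch comparing which values each reshape pools together. It uses random data and hypothetical small shapes in place of the real features. `StandardScaler` standardizes each column of a 2D matrix independently, so the column layout decides what gets normalized together.

```python
import numpy as np

def zscore_columns(m):
    # Standardize each column of a 2D matrix independently
    # (population std, ddof=0), essentially what StandardScaler does by default.
    return (m - m.mean(axis=0)) / m.std(axis=0)

rng = np.random.default_rng(0)
batch, steps, n_features = 50, 100, 68
X = rng.normal(size=(batch, steps, n_features))

# Reshape used in the question: one column per (step, feature) pair,
# so each column's statistics come from only `batch` values.
wide = X.reshape(batch, -1)
print(wide.shape)  # (50, 6800)

# Reshape that keeps features as columns: each column pools
# every batch item and every time step of one feature.
long = X.reshape(-1, n_features)
print(long.shape)  # (5000, 68)

X_per_feature = zscore_columns(long).reshape(X.shape)
# Every feature now has mean ~0 and std ~1 over (batch, step).
print(np.allclose(X_per_feature.mean(axis=(0, 1)), 0.0))  # True
print(np.allclose(X_per_feature.std(axis=(0, 1)), 1.0))   # True
```

With scikit-learn the same idea would be `scaler.fit_transform(X_train.reshape(-1, X_train.shape[-1])).reshape(X_train.shape)` for the training data and `scaler.transform(X_test.reshape(-1, X_test.shape[-1])).reshape(X_test.shape)` for the test data.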
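What I am looking for is this: say feature[0] is energy. For one batch entry there are 100 steps, and therefore 100 energy values. I want those 100 energy values to be normalized among themselves.
So the normalization should run over [1,1,0], [1,2,0], [1,3,0] ... [1,100,0], and likewise for every other feature.
How should I go about this?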
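Normalizing each feature over the steps of a single sample, as described above, corresponds to computing statistics along axis 1 only. A minimal NumPy sketch, assuming the (batch, step, features) layout; the function name `normalize_per_sample` and the `eps` guard against features that are constant within a sample are illustrative additions, not part of the question:

```python
import numpy as np

def normalize_per_sample(x, eps=1e-8):
    """Z-score each feature over the time steps of each sample.

    x has shape (batch, step, features); statistics are computed along
    axis 1 only, so every (sample, feature) pair gets its own mean and std.
    `eps` (hypothetical default) avoids division by zero for features that
    are constant within a sample.
    """
    mean = x.mean(axis=1, keepdims=True)  # shape (batch, 1, features)
    std = x.std(axis=1, keepdims=True)    # shape (batch, 1, features)
    return (x - mean) / (std + eps)

rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=3.0, size=(4, 100, 68))
x_norm = normalize_per_sample(x)

# Sample 0, feature 0 (the "energy" column in the question) now has
# mean close to 0 and std close to 1 across its 100 steps.
print(x_norm[0, :, 0].mean(), x_norm[0, :, 0].std())
```

Note that this differs from the update and the answer below, which pool statistics over all samples and steps (axes 0 and 1). Per-sample statistics remove each sequence's own offset and scale, so differences in absolute level between samples are no longer visible to the model.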
Update:
The following code was put together with help from sai.
import numpy as np

def feature_normalization(x):
    # Unroll (batch, step, features) into (1, batch * step, features).
    batches_unrolled = np.expand_dims(np.reshape(x, (-1, x.shape[2])), axis=0)
    # Per-feature mean and std, pooled over all samples and steps.
    x_normalized = (x - np.mean(batches_unrolled, axis=1, keepdims=True)) / np.std(batches_unrolled, axis=1, keepdims=True)
    # Sanity check: feature 0 of sample 0 matches a direct computation.
    np.testing.assert_allclose(x_normalized[0, :, 0], (x[0, :, 0] - np.mean(x[:, :, 0])) / np.std(x[:, :, 0]))
    return x_normalized

def testset_normalization(X_train, X_test):
    # Fit the statistics on the training set only.
    batches_unrolled = np.expand_dims(np.reshape(X_train, (-1, X_train.shape[2])), axis=0)
    fitted_mean = np.mean(batches_unrolled, axis=1, keepdims=True)
    fitted_std = np.std(batches_unrolled, axis=1, keepdims=True)
    # Apply the training statistics to the test set.
    X_test_normalized = (X_test - fitted_mean) / fitted_std
    return X_test_normalized
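The two functions above can also be written as a fit/apply pair that computes the per-feature statistics once on the training set and reuses them for the test set. This is a sketch: the helper names `fit_feature_stats` and `apply_feature_stats` are hypothetical, and replacing a zero standard deviation with 1 (mirroring how scikit-learn's StandardScaler treats constant features) is an added assumption.

```python
import numpy as np

def fit_feature_stats(X_train):
    # Pool over batch (axis 0) and step (axis 1): one mean/std per feature.
    mean = X_train.mean(axis=(0, 1), keepdims=True)  # shape (1, 1, features)
    std = X_train.std(axis=(0, 1), keepdims=True)
    # Avoid dividing by zero for features that are constant in the training set.
    std = np.where(std == 0, 1.0, std)
    return mean, std

def apply_feature_stats(X, mean, std):
    # Broadcasting applies the training statistics to every sample and step.
    return (X - mean) / std

rng = np.random.default_rng(2)
X_train = rng.normal(size=(30, 100, 68))
X_test = rng.normal(size=(10, 100, 68))
X_train[:, :, 5] = 7.0  # a constant feature, to exercise the zero-std guard

mean, std = fit_feature_stats(X_train)
X_train_norm = apply_feature_stats(X_train, mean, std)
X_test_norm = apply_feature_stats(X_test, mean, std)
print(X_train_norm.shape, X_test_norm.shape)  # (30, 100, 68) (10, 100, 68)
```

Keeping the fitted `mean` and `std` around (for example, saving them alongside the model) lets the same scaling be applied to data seen later at inference time.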
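Normalize each feature independently across all samples in the batch:
- Unroll the batched samples to get a [10 (time steps) * batch_size] x [40 features] matrix
- Compute the mean and standard deviation of each feature
- Apply element-wise normalization to the original batched samples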
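In formula form, with $B$ samples, $T$ time steps, and $x_{b,t,f}$ the value of feature $f$ at step $t$ of sample $b$ (notation introduced here for illustration), the steps above compute a per-feature mean, a population standard deviation (matching NumPy's default `ddof=0`), and the standardized values:

```latex
\mu_f = \frac{1}{BT}\sum_{b=1}^{B}\sum_{t=1}^{T} x_{b,t,f}, \qquad
\sigma_f = \sqrt{\frac{1}{BT}\sum_{b=1}^{B}\sum_{t=1}^{T} \left(x_{b,t,f} - \mu_f\right)^2}, \qquad
\hat{x}_{b,t,f} = \frac{x_{b,t,f} - \mu_f}{\sigma_f}
```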
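import numpy as np
x = np.random.random((20, 10, 40))  # (batch, time steps, features)
# Unroll to (1, batch * steps, features) so axis 1 spans every sample and step.
batches_unrolled = np.expand_dims(np.reshape(x, (-1, 40)), axis=0)
x_normalized = (x - np.mean(batches_unrolled, axis=1, keepdims=True)) / np.std(batches_unrolled, axis=1, keepdims=True)
# Check: feature 0 of sample 0 uses statistics pooled over all samples and steps.
np.testing.assert_allclose(x_normalized[0, :, 0], (x[0, :, 0] - np.mean(x[:, :, 0])) / np.std(x[:, :, 0]))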
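The unroll-and-`expand_dims` step is not strictly needed: NumPy reductions such as `mean` and `std` accept a tuple of axes, so the same per-feature statistics can be taken directly over axes 0 and 1. A sketch that checks the two formulations agree on the random example above:

```python
import numpy as np

x = np.random.random((20, 10, 40))

# Original formulation: unroll to (1, batch * steps, features), reduce over axis 1.
batches_unrolled = np.expand_dims(np.reshape(x, (-1, 40)), axis=0)
via_unroll = (x - np.mean(batches_unrolled, axis=1, keepdims=True)) / np.std(
    batches_unrolled, axis=1, keepdims=True
)

# Equivalent formulation: reduce over the batch and step axes directly.
mean = x.mean(axis=(0, 1), keepdims=True)  # shape (1, 1, 40)
std = x.std(axis=(0, 1), keepdims=True)    # shape (1, 1, 40)
via_axes = (x - mean) / std

print(np.allclose(via_unroll, via_axes))  # True
```

The axis-tuple version avoids hard-coding the feature count (40) and works unchanged for the (1883, 100, 68) array in the question.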