Implementing Normalization Inside a TensorFlow Model
I am currently working on a basic LSTM-based autoencoder using the TensorFlow library. The goal is for the autoencoder to reconstruct a multivariate time series. I am interested in moving the feature normalization of the data from the data pipeline into the model itself.
Currently, I normalize the data the following way:
from tensorflow.keras import Input
from tensorflow.keras.layers import Dense, LSTM, TimeDistributed
from tensorflow.keras.layers.experimental.preprocessing import Normalization
from tensorflow.keras.models import Model

normalizer = Normalization(axis=-1)
normalizer.adapt(data_train)
data_train = normalizer(data_train)
inputs = Input(shape=[None, n_inputs])
x = LSTM(4, return_sequences=True)(inputs)
x = LSTM(2, return_sequences=True)(x)
x = LSTM(2, return_sequences=True)(x)
x = LSTM(4, return_sequences=True)(x)
x = TimeDistributed(Dense(n_inputs))(x)
model = Model(inputs, x)
This works as expected and results in a respectable loss (~1e-2), but the normalization lives outside the model.
According to the documentation (under "Preprocessing data before the model or inside the model"), the following code should be equivalent to the snippet above, except that it runs inside the model:
normalizer = Normalization(axis=-1)
normalizer.adapt(data_train)
inputs = Input(shape=[None, n_inputs])
x = normalizer(inputs)
x = LSTM(4, return_sequences=True)(x)
x = LSTM(2, return_sequences=True)(x)
x = LSTM(2, return_sequences=True)(x)
x = LSTM(4, return_sequences=True)(x)
x = TimeDistributed(Dense(n_inputs))(x)
model = Model(inputs, x)
However, running the latter variant produces astronomical loss values (~1e3), and the test results are worse as well. So my question is: what am I doing wrong? Could it be that I am misunderstanding the documentation?
Any advice is greatly appreciated!
The two approaches seem to give consistent results, as long as the normalizer, when used outside the model, is applied only to the inputs (i.e., the feature matrix):
import numpy as np
from tensorflow.keras import Input
from tensorflow.keras.layers import Dense, LSTM, TimeDistributed
from tensorflow.keras.models import Model
# note: in TF >= 2.6 this layer is also available as tensorflow.keras.layers.Normalization
from tensorflow.keras.layers.experimental.preprocessing import Normalization
np.random.seed(42)
# define the input parameters
num_samples = 100
time_steps = 10
train_size = 0.8
# generate the data
X = np.random.normal(loc=10, scale=5, size=(num_samples, time_steps, 1))
y = np.mean(X, axis=1) + np.random.normal(loc=0, scale=1, size=(num_samples, 1))
# split the data
X_train, X_test = X[:int(train_size * X.shape[0]), :], X[int(train_size * X.shape[0]):, :]
y_train, y_test = y[:int(train_size * y.shape[0]), :], y[int(train_size * y.shape[0]):, :]
# normalize the inputs inside the model
normalizer = Normalization()
normalizer.adapt(X_train)
inputs = Input(shape=[None, 1])
x = normalizer(inputs)
x = LSTM(4, return_sequences=True)(x)
x = LSTM(2, return_sequences=True)(x)
x = LSTM(2, return_sequences=True)(x)
x = LSTM(4, return_sequences=True)(x)
x = TimeDistributed(Dense(1))(x)
model = Model(inputs, x)
model.compile(loss='mae', optimizer='adam')
model.fit(X_train, y_train, batch_size=32, epochs=10, verbose=0)
print(model.evaluate(X_test, y_test))
# 10.704551696777344
# normalize the inputs outside the model
normalizer = Normalization()
normalizer.adapt(X_train)
X_train_normalized = normalizer(X_train)
X_test_normalized = normalizer(X_test)
inputs = Input(shape=[None, 1])
x = LSTM(4, return_sequences=True)(inputs)
x = LSTM(2, return_sequences=True)(x)
x = LSTM(2, return_sequences=True)(x)
x = LSTM(4, return_sequences=True)(x)
x = TimeDistributed(Dense(1))(x)
model = Model(inputs, x)
model.compile(loss='mae', optimizer='adam')
model.fit(X_train_normalized, y_train, batch_size=32, epochs=10, verbose=0)
print(model.evaluate(X_test_normalized, y_test))
# 10.748750686645508
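Applied to the autoencoder in the question, this likely explains the discrepancy: there, data_train serves as both input and target, so normalizing it in the pipeline also rescales the reconstruction targets, while the in-model normalizer leaves the targets at their original scale and the loss is reported in raw units. Below is a minimal sketch of that effect; the shapes (n_inputs = 3, 100 series of 10 steps) are assumptions, not the question's actual data:
import numpy as np
from tensorflow.keras import Input
from tensorflow.keras.layers import Dense, LSTM, TimeDistributed
from tensorflow.keras.layers.experimental.preprocessing import Normalization
from tensorflow.keras.models import Model

np.random.seed(42)

# stand-in for the question's multivariate series (shapes are assumptions)
n_inputs = 3
data_train = np.random.normal(loc=10, scale=5, size=(100, 10, n_inputs)).astype('float32')

normalizer = Normalization(axis=-1)
normalizer.adapt(data_train)

inputs = Input(shape=[None, n_inputs])
x = normalizer(inputs)
x = LSTM(4, return_sequences=True)(x)
x = TimeDistributed(Dense(n_inputs))(x)
model = Model(inputs, x)
model.compile(loss='mse', optimizer='adam')

# raw targets: the loss is measured in the data's original units and
# comes out orders of magnitude larger than in the pipeline variant
model.fit(data_train, data_train, epochs=5, verbose=0)

# normalized targets: input and target are on the same standardized
# scale again, so the loss becomes comparable to the pipeline variant
model.fit(data_train, normalizer(data_train), epochs=5, verbose=0)
In other words, the two loss values in the question (~1e-2 vs. ~1e3) are measured on different scales and are not directly comparable; to compare the variants fairly, either normalize the reconstruction targets as well or map the model's outputs back to the original scale before computing the loss.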