Should my model with Monte Carlo dropout provide a mean prediction similar to the deterministic prediction?

I have a model trained with several LayerNormalization layers, and I am not sure that a simple weight transfer works properly when activating dropout for prediction. This is the code I used:

from tensorflow.keras.models import load_model, Model
from tensorflow.keras.layers import Dense, Dropout, LayerNormalization, Input

model0 = load_model(path + 'model0.h5')
OW = model0.get_weights()

inp = Input(shape=(10,))
D1 = Dense(760, activation='softplus')(inp)
DO1 = Dropout(0.29)(D1, training=True)
N1 = LayerNormalization()(DO1)
D2 = Dense(460, activation='softsign')(N1)
DO2 = Dropout(0.16)(D2, training=True)
N2 = LayerNormalization()(DO2)
D3 = Dense(664, activation='softsign')(N2)
DO3 = Dropout(0.09)(D3, training=True)
N3 = LayerNormalization()(DO3)
out = Dense(1, activation='linear')(N3)

mP = Model(inp, out)
mP.set_weights(OW)
mP.compile(loss='mse', optimizer='Adam')
mP.save(path + 'new_model.h5')
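As an aside, rebuilding the graph is not strictly required just to activate dropout at inference time: Keras models accept a `training` argument when called directly, which switches dropout on for that call. A minimal sketch, assuming `model0` is any Keras model containing Dropout layers (the small stand-in model here is hypothetical, not the one from the question):

```python
import numpy as np
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Dropout, Input

# Hypothetical stand-in for model0: one Dense layer followed by Dropout.
inp = Input(shape=(10,))
x = Dense(16, activation='relu')(inp)
x = Dropout(0.5)(x)
out = Dense(1)(x)
model0 = Model(inp, out)

data = np.ones((1, 10), dtype='float32')

# Deterministic call: dropout is inactive.
det = model0(data, training=False).numpy()

# Monte Carlo calls: training=True keeps dropout active at inference.
samples = np.stack([model0(data, training=True).numpy() for _ in range(50)])
print(det.shape, samples.shape)
```

This avoids the weight-transfer step entirely, at the cost of running the sampling loop in Python rather than batching it through `predict`.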

If I set training=False on the dropout layers, the model makes the same predictions as the original model. However, with the code as shown above, the mean prediction is not close to the original/deterministic prediction.

A model I developed previously, with dropout set to training mode, produced mean probabilistic predictions nearly identical to those of the deterministic model. Am I doing something wrong, or is this a problem with combining LayerNormalization and active dropout? As far as I know, LayerNormalization has trainable parameters, so I don't know whether active dropout interferes with it, and if it does, I don't know how to remedy that.

This code runs a quick test and plots the results:

import numpy as np
import matplotlib.pyplot as plt

inputs = np.zeros(shape=(1, 10), dtype='float32')
inputsP = np.zeros(shape=(1000, 10), dtype='float32')
opD = model0.predict(inputs)[0, 0]       # deterministic prediction
opP = mP.predict(inputsP).reshape(1000)  # 1000 Monte Carlo predictions
print('Deterministic: %.4f   Probabilistic: %.4f' % (opD, np.mean(opP)))

plt.scatter(0, opD, color='black', label='Det', zorder=3)
plt.scatter(0, np.mean(opP), color='red', label='Mean prob', zorder=2)
plt.errorbar(0, np.mean(opP), yerr=np.std(opP), color='red', zorder=2,
             markersize=0, capsize=20, label=r'$\sigma$ bounds')
plt.grid(axis='y', zorder=0)
plt.legend()
plt.tick_params(axis='x', labelsize=0, labelcolor='white', color='white',
                width=0, length=0)

The resulting output and plot are shown below.

Deterministic: -0.9732 Probabilistic: -0.9011

EDIT (my answer):

I think the problem was simply that the model was undersampled. The standard deviation of the predictions is directly related to the dropout rate, so the number of predictions needed to approximate the deterministic model grows with it. If you run the admittedly contrived test below but set the rate of every dropout layer to 0.7, then 100,000 samples are no longer enough to approximate the deterministic mean to within 10^-3, and the standard deviation of the predictions becomes much larger.
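The rate dependence can be seen on a single unit. Keras uses inverted dropout, which zeroes a unit with probability p and scales survivors by 1/(1-p), so the unit keeps its mean but gains variance x² · p/(1-p), which blows up as p approaches 1. A one-unit numpy sketch (independent of the model above):

```python
import numpy as np

rng = np.random.default_rng(0)
x = 1.0  # a single pre-dropout activation

for p in (0.09, 0.29, 0.7):
    keep = rng.random(1_000_000) >= p  # Bernoulli keep mask, P(keep) = 1 - p
    out = x * keep / (1.0 - p)         # inverted-dropout rescaling
    # The mean is preserved; the variance grows as p / (1 - p).
    print(f'p={p:.2f}  mean={out.mean():.3f}  '
          f'var={out.var():.3f}  predicted var={p / (1 - p):.3f}')
```

At p=0.09 the added variance is about 0.1·x², while at p=0.7 it is about 2.3·x², which is why far more samples are needed at high rates.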

import os

import numpy as np
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Dense, Dropout, Input

os.environ['CUDA_VISIBLE_DEVICES'] = '0'
GPUs = tf.config.experimental.list_physical_devices('GPU')
for gpu in GPUs:
    tf.config.experimental.set_memory_growth(gpu, True)

# Deterministic model with randomly initialized weights.
inp = Input(shape=(10,))
D1 = Dense(760, activation='softplus')(inp)
D2 = Dense(460, activation='softsign')(D1)
D3 = Dense(664, activation='softsign')(D2)
out = Dense(1, activation='linear')(D3)

mP = Model(inp, out)
mP.compile(loss='mse', optimizer='Adam')

# Same architecture with Monte Carlo dropout active between the layers.
inp = Input(shape=(10,))
D1 = Dense(760, activation='softplus')(inp)
DO1 = Dropout(0.29)(D1, training=True)
D2 = Dense(460, activation='softsign')(DO1)
DO2 = Dropout(0.16)(D2, training=True)
D3 = Dense(664, activation='softsign')(DO2)
DO3 = Dropout(0.09)(D3, training=True)
out = Dense(1, activation='linear')(DO3)

mP2 = Model(inp, out)
mP2.set_weights(mP.get_weights())
mP2.compile(loss='mse', optimizer='Adam')

data = np.zeros(shape=(100000, 10), dtype='float32')
res = mP.predict(data).reshape(data.shape[0])
res2 = mP2.predict(data).reshape(data.shape[0])

# Absolute difference between the deterministic prediction
# and the Monte Carlo mean over 100,000 samples.
print(np.abs(res[0] - res2.mean()))
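The sampling argument above can be quantified with the central limit theorem: the Monte Carlo mean of N predictions has a standard error of roughly σ/√N, so the sample count needed to hit a fixed tolerance grows quadratically with the prediction spread, which in turn grows with the dropout rate. A quick numpy sketch with synthetic predictions (the values and spreads are illustrative, not taken from the model above):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic MC "predictions" around a fixed true value, with spreads
# loosely mimicking low- vs. high-dropout behaviour.
true_value = -0.97
for sigma in (0.05, 0.5):
    for n in (1_000, 100_000):
        preds = rng.normal(true_value, sigma, size=n)
        err = abs(preds.mean() - true_value)
        # Standard error of the mean: sigma / sqrt(n).
        print(f'sigma={sigma:.2f}  n={n:6d}  |mean error|={err:.5f}  '
              f'expected SE={sigma / np.sqrt(n):.5f}')
```

With σ ten times larger, roughly 100 times as many samples are needed for the same accuracy, which matches the observation that 100,000 samples stop being enough once every dropout rate is pushed to 0.7.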