How to do custom Keras layer matrix multiplication

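Layers:

For the last layer, the output must be computed as ((H21*w1)*(H22*w2)*(H23*w3)), where H21, H22, H23 are the outputs of hidden layer 2 and w1, w2, w3 are constant weights that are not trainable. How do I write a lambda function that produces this result?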

def product(X):
    return X[0]*X[1]

keras_model = Sequential()
keras_model.add(Dense(75, 
input_dim=75,activation='tanh',name="layer1" ))
keras_model.add(Dense(3 ,activation='tanh',name="layer2" ))
keras_model.add(Dense(1,name="layer3"))
cross1=keras_model.add(Lambda(lambda x:product,output_shape=(1,1)))([layer2,layer3])
print(cross1)        

NameError: name 'layer2' is not defined

Use a functional API model

from keras.layers import Input, Dense, Lambda
from keras.models import Model

inputs = Input((75,))                                         #shape (batch, 75)
output1 = Dense(75, activation='tanh',name="layer1" )(inputs) #shape (batch, 75)
output2 = Dense(3 ,activation='tanh',name="layer2" )(output1) #shape (batch, 3)
output3 = Dense(1,name="layer3")(output2)                     #shape (batch, 1)

cross1 = Lambda(lambda x: x[0] * x[1])([output2, output3])    #shape (batch, 3)

model = Model(inputs, cross1)

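Note that these shapes are quite different from what you expected: multiplying output2, of shape (batch, 3), by output3, of shape (batch, 1), broadcasts to (batch, 3) rather than producing a single value per sample.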
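To see where the (batch, 3) shape comes from, here is a minimal NumPy sketch of the same broadcasting rule. NumPy arrays stand in for the Keras tensors, and the names h2, h3 and the batch size of 4 are made up for illustration.

```python
import numpy as np

batch = 4  # arbitrary batch size, chosen only for illustration

h2 = np.ones((batch, 3))        # stands in for output2, shape (batch, 3)
h3 = np.full((batch, 1), 2.0)   # stands in for output3, shape (batch, 1)

# Element-wise multiply: the size-1 axis of h3 is broadcast across the 3 columns
cross = h2 * h3
print(cross.shape)    # (4, 3)

# Reducing over the last axis is what collapses the 3 features to one value
reduced = np.prod(cross, axis=-1, keepdims=True)
print(reduced.shape)  # (4, 1)
```

So an element-wise product alone never gives one number per sample; some reduction over the feature axis is needed as well.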

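I suggest doing this with a custom layer rather than a Lambda layer. Why? A custom layer gives you more freedom, and it is also more transparent when you want to inspect the weights. More precisely, if you do it through a Lambda layer, the constant weights will not be saved as part of the model, but if you use a custom layer, they will be.

Here is an example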

from keras import backend as K
from keras.layers import *
from keras.models import *
import numpy as np 


class MyLayer(Layer) :
    # see https://keras.io/layers/writing-your-own-keras-layers/
    def __init__(self, 
                 w_vec=None, 
                 allow_training=False,
                 **kwargs) :
        self._w_vec = w_vec
        assert allow_training or (w_vec is not None), \
            "ERROR: non-trainable w_vec must be initialized"
        self.allow_training = allow_training
        super().__init__(**kwargs)
        return
    def build(self, input_shape) :
        batch_size, num_feats = input_shape
        self.w_vec = self.add_weight(shape=(1, num_feats),
                                     name='w_vec',
                                     initializer='uniform', # <- use your own preferred initializer
                                     trainable=self.allow_training,)
        if self._w_vec is not None :
            # predefined w_vec
            assert self._w_vec.shape[1] == num_feats, \
                "ERROR: initial w_vec shape mismatches the input shape"
            # set it to the weight
            self.set_weights([self._w_vec])  # <- set weights to the supplied one
        super().build(input_shape)
        return
    def call(self, x) :
        # Given:
        #   x = [H21, H22, H23]
        #   w_vec = [w1, w2, w3]
        # Step 1: output elem_prod
        #   elem_prod = [H21*w1, H22*w2, H23*w3]
        elem_prod = x * self.w_vec
        # Step 2: output ret
        #   ret = (H21*w1) * (H22*w2) * (H23*w3)
        ret = K.prod(elem_prod, axis=-1, keepdims=True)
        return ret
    def compute_output_shape(self, input_shape) :
        return (input_shape[0], 1)

def make_test_cases(w_vec=None, allow_training=False):
    x = Input(shape=(75,))
    y = Dense(75, activation='tanh', name='fc1')(x)
    y = Dense(3, activation='tanh', name='fc2')(y)
    y = MyLayer(w_vec, allow_training, name='core')(y)
    y = Dense(1, name='fc3')(y)
    net = Model(inputs=x, outputs=y, name='{}-{}'.format( 'randomInit' if w_vec is None else 'assignInit',
                                                          'trainable' if allow_training else 'nontrainable'))
    print(net.name)
    print(net.layers[-2].get_weights()[0])
    net.summary()  # summary() prints the table itself and returns None
    return net
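As a cross-check on what MyLayer.call computes, the two steps can be reproduced in plain NumPy. This is only a sketch: reference_call is a hypothetical helper, not part of Keras, and the input values and w_a are made up for illustration.

```python
import numpy as np

def reference_call(x, w_vec):
    """NumPy version of the two steps in MyLayer.call (hypothetical helper)."""
    elem_prod = x * w_vec                              # [H21*w1, H22*w2, H23*w3]
    return np.prod(elem_prod, axis=-1, keepdims=True)  # (H21*w1)*(H22*w2)*(H23*w3)

h2 = np.array([[0.5, -0.2, 0.1]])   # made-up hidden layer 2 outputs, shape (1, 3)

w_a = np.array([[1.0, 2.0, 3.0]])   # made-up constant weights
out_a = reference_call(h2, w_a)     # (0.5*1) * (-0.2*2) * (0.1*3)

w_b = np.arange(3).reshape([1, 3])  # [[0, 1, 2]]
out_b = reference_call(h2, w_b)     # the zero weight forces the product to zero

print(out_a.shape, out_b.shape)     # (1, 1) (1, 1)
```

One thing this makes visible: a weight vector that contains a zero, such as np.arange(3), drives the product to zero for every input, so a non-zero w_vec is probably the better choice when checking the layer's output.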

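You can run the test cases below to see the difference (note the weights printed at the top and the final count at the bottom of the summary, which give you the initial values and the number of non-trainable parameters, respectively)

a. Constant weights, non-trainable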

m1 = make_test_cases(w_vec=np.arange(3).reshape([1,3]), allow_training=False)

gives you

assignInit-nontrainable
[[0. 1. 2.]]
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_4 (InputLayer)         (None, 75)                0         
_________________________________________________________________  
fc1 (Dense)                  (None, 75)                5700      
_________________________________________________________________  
fc2 (Dense)                  (None, 3)                 228       
_________________________________________________________________  
core (MyLayer)               (None, 1)                 3         
_________________________________________________________________  
fc3 (Dense)                  (None, 1)                 2         
=================================================================  
Total params: 5,933  
Trainable params: 5,930  
Non-trainable params: 3
_________________________________________________________________ 
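The parameter counts in the summary can be checked by hand. The sketch below assumes the usual count for a fully connected layer (one weight per input-output pair plus one bias per output unit); dense_params is a hypothetical helper written for this check, not a Keras function.

```python
def dense_params(n_in, n_out):
    """Parameters of a fully connected layer: weight matrix plus one bias per unit."""
    return n_in * n_out + n_out

fc1 = dense_params(75, 75)  # 75*75 + 75 = 5700
fc2 = dense_params(75, 3)   # 75*3 + 3  = 228
core = 3                    # w_vec of shape (1, 3), no bias
fc3 = dense_params(1, 1)    # 1*1 + 1   = 2

total = fc1 + fc2 + core + fc3
print(total)         # 5933
print(total - core)  # 5930 trainable when w_vec is frozen
```

The only difference between the frozen and trainable cases is whether the 3 entries of w_vec are counted as trainable.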

b. Constant initial weights, trainable

m2 = make_test_cases(w_vec=np.arange(3).reshape([1,3]), allow_training=True)

gives you

assignInit-trainable
[[0. 1. 2.]]
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_5 (InputLayer)         (None, 75)                0         
_________________________________________________________________
fc1 (Dense)                  (None, 75)                5700      
_________________________________________________________________
fc2 (Dense)                  (None, 3)                 228       
_________________________________________________________________
core (MyLayer)               (None, 1)                 3         
_________________________________________________________________
fc3 (Dense)                  (None, 1)                 2         
=================================================================
Total params: 5,933
Trainable params: 5,933
Non-trainable params: 0
_________________________________________________________________

c. Random weights, trainable

m3 = make_test_cases(w_vec=None, allow_training=True)

gives you

randomInit-trainable
[[ 0.02650297 -0.02010062 -0.03771694]]
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_6 (InputLayer)         (None, 75)                0         
_________________________________________________________________
fc1 (Dense)                  (None, 75)                5700      
_________________________________________________________________
fc2 (Dense)                  (None, 3)                 228       
_________________________________________________________________
core (MyLayer)               (None, 1)                 3         
_________________________________________________________________
fc3 (Dense)                  (None, 1)                 2         
=================================================================
Total params: 5,933
Trainable params: 5,933
Non-trainable params: 0
_________________________________________________________________

Final remark

I would say it is not yet clear which of these cases will solve your problem best, but trying all three sounds like a good plan.