Using tfa.layers.CRF on top of a BiLSTM

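I am trying to implement a CRF-based NER model using the tensorflow-addons library. The model takes each sentence both as a sequence of word indices and as character-level indices, concatenates the two representations, and feeds them to a BiLSTM layer. Here is the implementation: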

import tensorflow as tf
from tensorflow.keras import Model, Input
from tensorflow.keras.layers import LSTM, Embedding, Dense, TimeDistributed, Dropout, Conv1D
from tensorflow.keras.layers import Bidirectional, concatenate, SpatialDropout1D, GlobalMaxPooling1D
from tensorflow_addons.layers import CRF

word_input = Input(shape=(max_sent_len,))
word_emb = Embedding(input_dim=n_words + 2, output_dim=dim_word_emb,
                     input_length=max_sent_len, mask_zero=True)(word_input)

char_input = Input(shape=(max_sent_len, max_word_len,))
char_emb = TimeDistributed(Embedding(input_dim=n_chars + 2, output_dim=dim_char_emb,
                           input_length=max_word_len, mask_zero=True))(char_input)

char_emb = TimeDistributed(LSTM(units=20, return_sequences=False,
                                recurrent_dropout=0.5))(char_emb)

# main LSTM
main_input = concatenate([word_emb, char_emb])
main_input = SpatialDropout1D(0.3)(main_input)
main_lstm = Bidirectional(LSTM(units=50, return_sequences=True,
                               recurrent_dropout=0.6))(main_input)
kernel = TimeDistributed(Dense(50, activation="relu"))(main_lstm)  
crf = CRF(n_tags+1)  # CRF layer
decoded_sequence, potentials, sequence_length, chain_kernel = crf(kernel)  # output

model = Model([word_input, char_input], potentials)
model.add_loss(tf.abs(tf.reduce_mean(kernel)))
model.compile(optimizer="rmsprop", loss='categorical_crossentropy')

When I start fitting the model, I get the following warning:

WARNING:tensorflow:Gradients do not exist for variables ['chain_kernel:0'] when minimizing the loss.
WARNING:tensorflow:Gradients do not exist for variables ['chain_kernel:0'] when minimizing the loss.

The training output looks like this:

438/438 [==============================] - 80s 163ms/step - loss: nan - val_loss: nan
Epoch 2/10
438/438 [==============================] - 71s 163ms/step - loss: nan - val_loss: nan
Epoch 3/10
438/438 [==============================] - 71s 162ms/step - loss: nan - val_loss: nan
Epoch 4/10
438/438 [==============================] - 71s 161ms/step - loss: nan - val_loss: nan
Epoch 5/10
438/438 [==============================] - 71s 162ms/step - loss: nan - val_loss: nan
Epoch 6/10
438/438 [==============================] - 70s 160ms/step - loss: nan - val_loss: nan
Epoch 7/10
438/438 [==============================] - 70s 161ms/step - loss: nan - val_loss: nan
Epoch 8/10
438/438 [==============================] - 70s 160ms/step - loss: nan - val_loss: nan
Epoch 9/10
438/438 [==============================] - 71s 161ms/step - loss: nan - val_loss: nan
Epoch 10/10
438/438 [==============================] - 70s 159ms/step - loss: nan - val_loss: nan

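I am almost sure the problem is in the way I set up the loss functions, but I don't know how they should be set. I have searched for this problem and found no answer. Also, when I test the model, it does not predict the labels correctly and assigns the same label to every token. Can anyone explain how I can fix this?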

Change your loss function to tensorflow_addons.losses.SigmoidFocalCrossEntropy(). I don't think categorical cross-entropy is a good choice here.
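
To make the suggestion a bit more concrete: in the question's code it would presumably mean passing `loss=tfa.losses.SigmoidFocalCrossEntropy()` to `model.compile` (after `import tensorflow_addons as tfa`). The pure-Python sketch below only illustrates the formula such a loss is generally understood to compute for a single time step. `sigmoid_focal_loss` is a hypothetical helper written for this illustration, not a tfa function, and the defaults `alpha=0.25` and `gamma=2.0` are assumed to mirror the library's. It takes raw scores because the CRF layer's `potentials` look like unnormalized values, which with the real loss would probably call for `from_logits=True`.

```python
import math

def sigmoid_focal_loss(y_true, logits, alpha=0.25, gamma=2.0):
    """Illustrative sigmoid focal loss for one time step, summed over tags.

    y_true: one-hot list of 0/1 values; logits: raw per-tag scores.
    Hypothetical helper, not numerically hardened for very large scores.
    """
    total = 0.0
    for t, z in zip(y_true, logits):
        p = 1.0 / (1.0 + math.exp(-z))        # sigmoid probability for this tag
        p_t = p if t == 1 else 1.0 - p        # probability given to the correct outcome
        alpha_t = alpha if t == 1 else 1.0 - alpha
        ce = -math.log(max(p_t, 1e-7))        # binary cross-entropy, clipped to avoid log(0)
        # (1 - p_t) ** gamma shrinks the contribution of easy, well-classified tags
        total += alpha_t * (1.0 - p_t) ** gamma * ce
    return total

y_true = [0, 1, 0]
confident = sigmoid_focal_loss(y_true, [-4.0, 4.0, -4.0])  # correct and confident
wrong = sigmoid_focal_loss(y_true, [4.0, -4.0, -4.0])      # confident but wrong
print(confident, wrong)
```

The confident, correct prediction contributes far less than the wrong one, which is the point of the focal term: frequent, easy tags (such as the "O" tag in NER) are down-weighted so they do not dominate training.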