Problem connecting transformer output to CNN input in Keras

I need to build a transformer-based architecture in TensorFlow following the encoder-decoder approach, where the encoder is a pre-existing Huggingface DistilBERT model and the decoder is a CNN.

Input: a text made up of several consecutive phrases. Output: a code according to a classification standard. My data file has 7387 text-label pairs in TSV format:

text \t code
This is example text number one. It might contain some other phrases. \t C21
This is example text number two. It might contain some other phrases. \t J45.1
This is example text number three. It might contain some other phrases. \t A27

The rest of the code looks like this:

        import random
        import tensorflow as tf
        from transformers import TFDistilBertModel, DistilBertTokenizerFast

        text_file = "data/datafile.tsv"
        with open(text_file) as f:
                lines = f.read().split("\n")[:-1]
                text_and_code_pairs = []
                for line in lines:
                        text, code = line.split("\t")
                        text_and_code_pairs.append((text, code))


        random.shuffle(text_and_code_pairs)
        # 10% validation, 20% test, remaining ~70% training
        num_val_samples = int(0.10 * len(text_and_code_pairs))
        num_train_samples = len(text_and_code_pairs) - 3 * num_val_samples
        train_pairs = text_and_code_pairs[:num_train_samples]
        val_pairs = text_and_code_pairs[num_train_samples : num_train_samples + num_val_samples]
        test_pairs = text_and_code_pairs[num_train_samples + num_val_samples :]

        train_texts = [fst for (fst,snd) in train_pairs]
        train_labels = [snd for (fst,snd) in train_pairs]
        val_texts = [fst for (fst,snd) in val_pairs]
        val_labels = [snd for (fst,snd) in val_pairs]
        test_texts = [fst for (fst,snd) in test_pairs]
        test_labels = [snd for (fst,snd) in test_pairs]

        distilbert_encoder = TFDistilBertModel.from_pretrained("distilbert-base-multilingual-cased")
        tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-multilingual-cased")

        train_encodings = tokenizer(train_texts, truncation=True, padding=True)
        val_encodings = tokenizer(val_texts, truncation=True, padding=True)
        test_encodings = tokenizer(test_texts, truncation=True, padding=True)

        train_dataset = tf.data.Dataset.from_tensor_slices((
                dict(train_encodings),
                train_labels
        ))
        val_dataset = tf.data.Dataset.from_tensor_slices((
                dict(val_encodings),
                val_labels
        ))
        test_dataset = tf.data.Dataset.from_tensor_slices((
                dict(test_encodings),
                test_labels
        ))

        model = build_model(distilbert_encoder)
        model.fit(train_dataset.batch(64), validation_data=val_dataset, epochs=3, batch_size=64)
        model.predict(test_dataset, verbose=1)

Finally, the build_model function:

def build_model(transformer, max_len=512):
        model = tf.keras.models.Sequential()
        # Encoder
        inputs = layers.Input(shape=(max_len,), dtype=tf.int32)
        distilbert = transformer(inputs)
        # LAYER - something missing here?
        # Decoder
        conv1D = tf.keras.layers.Conv1D(filters=5, kernel_size=10)(distilbert)
        pooling = tf.keras.layers.MaxPooling1D(pool_size=2)(conv1D)
        flat = tf.keras.layers.Flatten()(pooling)
        fc = tf.keras.layers.Dense(1255, activation='relu')(flat)
        softmax = tf.keras.layers.Dense(1255, activation='softmax')(fc)
        model = tf.keras.models.Model(inputs = inputs, outputs = softmax)
        model.compile(tf.keras.optimizers.Adam(learning_rate=5e-5), loss="categorical_crossentropy", metrics=['accuracy'])
        print(model.summary())
        return model

I managed to narrow down where the problem might be. After changing from the Sequential to the functional Keras API, I get the following error:

Traceback (most recent call last):
  File "keras_transformer.py", line 99, in <module>
    main()
  File "keras_transformer.py", line 94, in main
    model = build_model(distilbert_encoder)
  File "keras_transformer.py", line 23, in build_model
    conv1D = tf.keras.layers.Conv1D(filters=5, kernel_size=10)(distilbert)
  File "/home/users/user/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 897, in __call__
    self._maybe_build(inputs)
  File "/home/users/user/.local/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 2416, in _maybe_build
    self.build(input_shapes)  # pylint:disable=not-callable
  File "/home/users/user/.local/lib/python3.6/site-packages/tensorflow/python/keras/layers/convolutional.py", line 152, in build
    input_shape = tensor_shape.TensorShape(input_shape)
  File "/home/users/user/.local/lib/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 771, in __init__
    self._dims = [as_dimension(d) for d in dims_iter]
  File "/home/users/user/.local/lib/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 771, in <listcomp>
    self._dims = [as_dimension(d) for d in dims_iter]
  File "/home/users/user/.local/lib/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 716, in as_dimension
    return Dimension(value)
  File "/home/users/user/.local/lib/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 200, in __init__
    None)
  File "<string>", line 3, in raise_from
TypeError: Dimension value must be integer or None or have an __index__ method, got 'last_hidden_state'

It seems the error lies in the connection between the transformer's output and the convolutional layer's input. Should I include another layer between them to adapt the transformer's output? If so, what would be the best option? I'm using tensorflow==2.2.0, transformers==4.5.1 and Python 3.6.9.

I think you're right. The problem seems to be with the input to the Conv1D layer.

According to the documentation, outputs.last_hidden_state has shape (batch_size, sequence_length, hidden_size).
Conv1D expects a plain 3D tensor of shape (batch_size, steps, channels) as its input.
Perhaps you can solve the problem by changing Conv1D to Conv2D, or by adding a Conv2D layer in between.
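
For reference, a quick way to confirm this is to call the encoder directly and inspect what it returns; a minimal sketch, assuming the same distilbert-base-multilingual-cased checkpoint as in the question:

import tensorflow as tf
from transformers import TFDistilBertModel, DistilBertTokenizerFast

encoder = TFDistilBertModel.from_pretrained("distilbert-base-multilingual-cased")
tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-multilingual-cased")

enc = tokenizer(["This is example text number one."], return_tensors="tf")
out = encoder(enc["input_ids"])

print(type(out))                    # a TFBaseModelOutput, not a plain tensor
print(out.last_hidden_state.shape)  # (1, sequence_length, 768)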

I think the problem is calling the correct tensor for the TensorFlow layers after the distilbert instance: distilbert = transformer(inputs) returns a model-output instance rather than a tensor, unlike regular TensorFlow layers, where e.g. in pooling = tf.keras.layers.MaxPooling1D(pool_size=2)(conv1D), pooling is the output tensor of the MaxPooling1D layer.

I solved your problem by accessing the last_hidden_state attribute of the distilbert instance (i.e. the output of the distilbert model), which is then your input to the next Conv1D layer.

import os
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3' # suppress Tensorflow messages

from transformers import TFDistilBertModel
import tensorflow as tf

distilbert_encoder = TFDistilBertModel.from_pretrained("distilbert-base-multilingual-cased")


def build_model(transformer, max_len=512):
        # model = tf.keras.models.Sequential()
        # Encoder
        inputs = tf.keras.layers.Input(shape=(max_len,), dtype=tf.int32)
        distilbert = transformer(inputs)
        # Decoder
        ###### !!!!!! #########
        conv1D = tf.keras.layers.Conv1D(filters=5, kernel_size=10)(distilbert.last_hidden_state) 
        ###### !!!!!! #########        
        pooling = tf.keras.layers.MaxPooling1D(pool_size=2)(conv1D)
        flat = tf.keras.layers.Flatten()(pooling)
        fc = tf.keras.layers.Dense(1255, activation='relu')(flat)
        softmax = tf.keras.layers.Dense(1255, activation='softmax')(fc)
        model = tf.keras.models.Model(inputs = inputs, outputs = softmax)
        model.compile(tf.keras.optimizers.Adam(learning_rate=5e-5), loss="categorical_crossentropy", metrics=['accuracy'])
        print(model.summary())
        return model


model = build_model(distilbert_encoder)

This returns:

Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         [(None, 512)]             0         
_________________________________________________________________
tf_distil_bert_model (TFDist TFBaseModelOutput(last_hi 134734080 
_________________________________________________________________
conv1d (Conv1D)              (None, 503, 5)            38405     
_________________________________________________________________
max_pooling1d (MaxPooling1D) (None, 251, 5)            0         
_________________________________________________________________
flatten (Flatten)            (None, 1255)              0         
_________________________________________________________________
dense (Dense)                (None, 1255)              1576280   
_________________________________________________________________
dense_1 (Dense)              (None, 1255)              1576280   
=================================================================
Total params: 137,925,045
Trainable params: 137,925,045
Non-trainable params: 0

Note: I assume that by layers.Input in your build_model function you meant tf.keras.layers.Input.
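
One more caveat before model.fit will run end to end: the codes in the TSV are strings (C21, J45.1, ...), so they have to be mapped to something numeric before they can be used as targets. A minimal sketch, assuming the 1255 classes implied by your Dense(1255) output layer (the names codes, code_to_id and train_label_ids are hypothetical):

# Hypothetical label encoding; assumes train_labels/val_labels/test_labels
# hold the string codes from the question.
codes = sorted(set(train_labels + val_labels + test_labels))
code_to_id = {c: i for i, c in enumerate(codes)}
train_label_ids = [code_to_id[c] for c in train_labels]

# With integer ids as targets, compile with sparse_categorical_crossentropy
# instead of categorical_crossentropy (which expects one-hot vectors):
model.compile(tf.keras.optimizers.Adam(learning_rate=5e-5),
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Also batch the validation set, and drop batch_size, which is not
# supported together with tf.data datasets:
model.fit(train_dataset.batch(64),
          validation_data=val_dataset.batch(64),
          epochs=3)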