ValueError: Shapes (32, 129) and (32, 1) are incompatible

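I have found many seemingly related Stack Overflow posts where the same error message appears when fitting a neural network model to data, but none of them seems to relate directly to my use case, which is fitting with a sparse categorical cross-entropy loss function (SparseCategoricalCrossentropy). I know I could use categorical_crossentropy by first one-hot encoding the target variable with to_categorical(), but because the number of target classes is large, I would run into memory problems with that approach, so the sparse approach is the only reasonable option.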
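To make the memory concern concrete, here is a minimal NumPy sketch (not part of the original post) that compares the size of integer labels with their one-hot encoding and checks that both forms give the same cross-entropy. The sample count, class count, and the two helper functions are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
n_samples, n_classes = 1000, 2000

rng = np.random.default_rng(0)

# Sparse labels: one integer class index per sample.
y_sparse = rng.integers(0, n_classes, size=n_samples).astype(np.int32)

# One-hot labels: n_classes floats per sample.
y_onehot = np.zeros((n_samples, n_classes), dtype=np.float32)
y_onehot[np.arange(n_samples), y_sparse] = 1.0

print(y_sparse.nbytes)  # 4000
print(y_onehot.nbytes)  # 8000000

# Random predicted probabilities; each row sums to 1 (softmax of random logits).
logits = rng.normal(size=(n_samples, n_classes))
probs = np.exp(logits - logits.max(axis=1, keepdims=True))
probs /= probs.sum(axis=1, keepdims=True)


def sparse_cross_entropy(y_true, y_prob):
    """Mean cross-entropy from integer labels: pick the true-class probability."""
    return -np.log(y_prob[np.arange(len(y_true)), y_true]).mean()


def onehot_cross_entropy(y_true, y_prob):
    """Mean cross-entropy from one-hot labels: sum over all classes."""
    return -(y_true * np.log(y_prob)).sum(axis=1).mean()


loss_sparse = sparse_cross_entropy(y_sparse, probs)
loss_onehot = onehot_cross_entropy(y_onehot, probs)
print(np.isclose(loss_sparse, loss_onehot))  # True
```

The two losses agree because the one-hot sum keeps only the true-class term, so the sparse form loses nothing while storing a single integer per sample.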

Below I provide sample data and a complete reproducible example. The line model.fit(X, y) raises the error, with the following message:

ValueError: in user code:

    File "...\.venv\lib\site-packages\keras\engine\training.py", line 1021, in train_function  *
        return step_function(self, iterator)
    File "...\.venv\lib\site-packages\keras\engine\training.py", line 1010, in step_function  **
        outputs = model.distribute_strategy.run(run_step, args=(data,))
    File "...\.venv\lib\site-packages\keras\engine\training.py", line 1000, in run_step  **
        outputs = model.train_step(data)
    File "...\.venv\lib\site-packages\keras\engine\training.py", line 864, in train_step
        return self.compute_metrics(x, y, y_pred, sample_weight)
    File "...\.venv\lib\site-packages\keras\engine\training.py", line 957, in compute_metrics
        self.compiled_metrics.update_state(y, y_pred, sample_weight)
    File "...\.venv\lib\site-packages\keras\engine\compile_utils.py", line 459, in update_state
        metric_obj.update_state(y_t, y_p, sample_weight=mask)
    File "...\.venv\lib\site-packages\keras\utils\metrics_utils.py", line 70, in decorated
        update_op = update_state_fn(*args, **kwargs)
    File "...\.venv\lib\site-packages\keras\metrics.py", line 178, in update_state_fn
        return ag_update_state(*args, **kwargs)
    File "...\.venv\lib\site-packages\keras\metrics.py", line 2364, in update_state  **
        label_weights=label_weights)
    File "...\.venv\lib\site-packages\keras\utils\metrics_utils.py", line 619, in update_confusion_matrix_variables
        y_pred.shape.assert_is_compatible_with(y_true.shape)

    ValueError: Shapes (32, 129) and (32, 1) are incompatible

Full code:

import numpy as np 
import tensorflow as tf
from keras.models import Sequential
from keras.layers import Dense, Dropout, LSTM, Flatten    
from keras.preprocessing.text import Tokenizer


# Commas separate the sentences; without them Python concatenates adjacent
# string literals into one string and merges the words at each boundary.
train_data = ['o by no means honest ventidius i gave it freely ever and theres none can truly say he gives if our betters play at that game we must not dare to imitate them faults that are rich are fair',
 'but was not this nigh shore',
 'impairing henry strengthening misproud york the common people swarm like summer flies and whither fly the gnats but to the sun',
 'what while you were there',
 'chill pick your teeth zir come no matter vor your foins',
 'thanks dear isabel', 'come prick me bullcalf till he roar again',
 'go some of you knock at the abbeygate and bid the lady abbess come to me',
 'an twere not as good deed as drink to break the pate on thee i am a very villain',
 'beaufort it is thy sovereign speaks to thee',
 'but say lucetta now we are alone wouldst thou then counsel me to fall in love',
 'for being a bawd for being a bawd',
 'all blest secrets all you unpublishd virtues of the earth spring with my tears',
 'what likelihood', 'o find him']

max_len = 100

# Tokenize
train_data_flattened = " ".join(train_data).split()
sequences = list() 
for i in range(max_len+1, len(train_data_flattened)):
    seq = train_data_flattened[i-max_len-1:i]
    sequences.append(seq)

# Encode
tokenizer = Tokenizer()
tokenizer.fit_on_texts(sequences)
vocab_size = len(tokenizer.word_index)
encoded_sequences = np.array(tokenizer.texts_to_sequences(sequences))
        
X = encoded_sequences[:,:-1]
y = encoded_sequences[:,-1]

def create_nn(input_shape=(100,1), output_shape=None):

    model = Sequential()
    model.add(LSTM(64, input_shape=input_shape, return_sequences=True))
    model.add(Dropout(0.3))
    model.add(Flatten())
    model.add(Dense(output_shape, activation='softmax'))
    
    metrics_list = [
        tf.keras.metrics.AUC(name='auc'),
        # tf.keras.metrics.BinaryAccuracy(name='accuracy'),
        tf.keras.metrics.SparseCategoricalAccuracy(name='accuracy'),
        tf.keras.metrics.Precision(name='precision'),
        tf.keras.metrics.Recall(name='recall'),
    ]

    sparse_cat_crossentropy = tf.losses.SparseCategoricalCrossentropy(from_logits=False)

    model.compile(optimizer = 'adam', loss = sparse_cat_crossentropy, metrics = metrics_list)
    return model

model = create_nn(output_shape=vocab_size)
model.fit(X, y)

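The error actually comes from the metrics you are using, not from the loss. The traceback ends in update_confusion_matrix_variables, which the AUC, Precision, and Recall metrics rely on: it asserts that y_pred and y_true have compatible shapes, but here y_pred is (32, 129) while the sparse integer labels are (32, 1). I don't think the AUC, Precision, and Recall metrics make much sense in your case when using the SparseCategoricalCrossentropy loss. Here is a working example: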

import tensorflow as tf

def create_nn(input_shape=(100,1), output_shape=None):

    model = tf.keras.Sequential()
    model.add(tf.keras.layers.LSTM(64, input_shape=input_shape, return_sequences=False))
    model.add(tf.keras.layers.Dropout(0.3))
    model.add(tf.keras.layers.Flatten())
    model.add(tf.keras.layers.Dense(output_shape, activation='softmax'))
    
    metrics_list = [
        tf.keras.metrics.SparseCategoricalAccuracy(name='accuracy'),
    ]

    sparse_cat_crossentropy = tf.losses.SparseCategoricalCrossentropy(from_logits=False)

    model.compile(optimizer = 'adam', loss = sparse_cat_crossentropy, metrics = metrics_list)
    return model

vocab_size = 129
model = create_nn(output_shape=vocab_size)

X = tf.random.uniform((500, 100, 1), maxval=vocab_size, dtype=tf.int32)
y = tf.random.uniform((500, 1), maxval=vocab_size, dtype=tf.int32)
model.fit(X, y, batch_size=64, epochs=2)

Epoch 1/2
8/8 [==============================] - 4s 23ms/step - loss: 4.9432 - accuracy: 0.0100
Epoch 2/2
8/8 [==============================] - 0s 22ms/step - loss: 4.9149 - accuracy: 0.0100
<keras.callbacks.History at 0x7fd59abb5510>
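
If precision and recall are still wanted alongside sparse integer labels, one option is to compute them after training from the predicted class indices instead of passing them to model.compile. The sketch below uses NumPy only; macro_precision_recall is a hypothetical helper, not a Keras API, and the small arrays stand in for y and the output of model.predict(X).

```python
import numpy as np


def macro_precision_recall(y_true, y_prob):
    """Macro-averaged precision and recall from integer labels and class probabilities."""
    y_true = np.asarray(y_true).reshape(-1)   # accept labels shaped (N,) or (N, 1)
    y_pred = np.argmax(y_prob, axis=-1)       # predicted class index per sample
    n_classes = y_prob.shape[-1]

    precisions, recalls = [], []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        # Assumption: a class with no predictions (or no true samples) scores 0.
        # Other conventions exist, such as skipping those classes.
        precisions.append(tp / (tp + fp) if tp + fp > 0 else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn > 0 else 0.0)

    return float(np.mean(precisions)), float(np.mean(recalls))


# Toy stand-ins for the sparse labels and the softmax output.
y_true = np.array([[0], [1], [2], [2]])
y_prob = np.array([
    [0.8, 0.1, 0.1],  # predicted 0, correct
    [0.2, 0.7, 0.1],  # predicted 1, correct
    [0.1, 0.2, 0.7],  # predicted 2, correct
    [0.6, 0.3, 0.1],  # predicted 0, but the true class is 2
])

precision, recall = macro_precision_recall(y_true, y_prob)
print(round(precision, 4), round(recall, 4))  # 0.8333 0.8333
```

Because this works on class indices, it never needs the one-hot labels, so it avoids the memory cost that motivated the sparse loss in the first place.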