Getting an error stating I need to specify steps_per_epoch
I'm trying to build a many-to-one RNN with LSTM cells to classify Twitter sentiment. After attempting to fit my model, I get a ValueError. My guess is that it's due to how I tokenized the input, but I'm not quite sure what a symbolic tensor is:

"If your data is in the form of symbolic tensors, you should specify the `steps_per_epoch` argument (instead of the `batch_size` argument, because symbolic tensors are expected to produce batches of input data)."

What does this mean, and how can I remedy it?
# Tokenize the input
# create the tokenizer
tokenizer = Tokenizer()
# fit the tokenizer on the text, i.e. more common words get indices closer to 0 and more obscure words get indices farther away
tokenizer.fit_on_texts(X_training)
# convert the input to token indices
X_training_tokens = tokenizer.texts_to_sequences(X_training)
# get the length of the longest input, in words
maxLen = max([len(s.split()) for s in X_data])
# pad so all inputs are the same size
X_train_pad = pad_sequences(X_training_tokens, maxlen=maxLen)

# Time to make the embedding matrix
# instantiate an embedding matrix of zeros
embedding_matrix = np.zeros((len(tokenizer.word_index) + 1, dims))
# go through each word in the token list
for word, i in tokenizer.word_index.items():
    # get the corresponding embedding vector (if it exists)
    embedding_vector = embeddings.get(word)
    # check that it's not None
    if embedding_vector is not None:
        # add it to the embedding matrix
        embedding_matrix[i] = embedding_vector

# Make the model
Model = Sequential()
Model.add(
    Embedding(
        input_dim=len(tokenizer.word_index) + 1,
        output_dim=dims,
        weights=[embedding_matrix],
        input_length=maxLen,
        trainable=False
    )
)
Model.add(
    LSTM(
        units=maxLen,
        return_sequences=False
        # possibly add dropout
    )
)
Model.add(
    Dense(
        maxLen,
        activation='relu'
    )
)
Model.add(
    Dense(
        3,
        activation='softmax'
    )
)
Model.compile(
    optimizer='Adam',
    loss='categorical_crossentropy',
    metrics=['accuracy']
)
costs = Model.fit(
    x=X_train_pad,
    y=Y_training,
    batch_size=2048,
    epochs=10
)
It turned out my Y was a symbolic tensor because I was using the TensorFlow one_hot function. I just used the Keras to_categorical function instead and was able to get a valid NumPy array.
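In case it helps anyone hitting the same error, here is a minimal sketch of that fix. It assumes the labels start as integer classes 0/1/2 (the Y_labels variable and the label values here are hypothetical, made up for illustration); Keras's to_categorical returns a plain NumPy array, whereas tf.one_hot returns a tensor, which is what Model.fit was complaining about:

import numpy as np
from keras.utils import to_categorical

# hypothetical integer sentiment labels (assumed to be 0, 1, or 2)
Y_labels = np.array([0, 2, 1, 1, 0])

# tf.one_hot(Y_labels, 3) would return a tensor, which fit() treats as
# symbolic data and therefore demands steps_per_epoch instead of batch_size.
# to_categorical does the same one-hot encoding but returns a NumPy array:
Y_training = to_categorical(Y_labels, num_classes=3)

print(type(Y_training))  # <class 'numpy.ndarray'>

With Y_training as a NumPy array, Model.fit accepts the batch_size argument as in the code above.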