Accuracy on test set does not increase

I'm working with an image dataset and, as a learning exercise, coding up a multinomial logistic regression from scratch.

I've tried several different batch sizes (50, 100, 200) and learning rates (.001, .05, .1, .5).

I still can't seem to get above 1% accuracy, so I'd like to know whether I'm missing something in my code, or whether this is the best I can expect from such a shallow approach. I tried logistic regression with sklearn (sketched below) and got around 7% (terrible, I know!), and I've been trying to recreate that in tensorflow.
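For reference, the sklearn baseline was essentially the following minimal sketch (the solver and iteration count shown are illustrative, not necessarily the exact settings I used; it runs on the same flattened grayscale arrays built in the code below):

from sklearn.linear_model import LogisticRegression

# multinomial logistic regression on the flattened grayscale images
clf = LogisticRegression(multi_class='multinomial', solver='lbfgs', max_iter=200)
clf.fit(x_train_data, y_train.ravel())
print(clf.score(x_test_data, y_test.ravel()))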

Is there any way to tell whether it is actually improving? Any help is greatly appreciated, thanks!

import numpy as np
import tensorflow as tf
from keras.datasets import cifar100
from skimage.color import rgb2gray  # rgb2gray is called below but was missing an import; skimage provides it

(x_train, y_train), (x_test, y_test) = cifar100.load_data()


# convert each RGB image to a single grayscale channel
x_train_data = np.zeros((50000, 32, 32))
for i, x in enumerate(x_train):
    x_train_data[i] = rgb2gray(x)

x_test_data = np.zeros((10000, 32, 32))
for i, x in enumerate(x_test):
    x_test_data[i] = rgb2gray(x)

# flatten each 32x32 image into a 1024-dimensional vector
x_train_data = x_train_data.reshape((50000, -1))
x_test_data = x_test_data.reshape((10000, -1))


NUM_CLASSES = 100
X_DIM = 32
Y_DIM = 32
PIXELS_PER_SAMPLE = X_DIM*Y_DIM

# create placeholders for the flattened images and the one-hot labels
X = tf.placeholder(tf.float32, [None, PIXELS_PER_SAMPLE])
Y = tf.placeholder(tf.float32, [None, NUM_CLASSES])


# create variables for the linear model: logits = X*W + b
with tf.variable_scope("multi_class_logistic_model", reuse=tf.AUTO_REUSE):
    W = tf.get_variable('Weight_matrix', initializer=tf.random_normal(shape=(X_DIM*Y_DIM, NUM_CLASSES)))
    W_o = tf.get_variable('bias', initializer=tf.random_normal(shape=[NUM_CLASSES]))
    Y_pred = tf.matmul(X, W) + W_o


#convert values to probability vector using softmax
Y_pred_prob = tf.nn.softmax(logits=Y_pred)

#create loss function (cross entropy)
loss = -tf.reduce_mean(Y * tf.log(Y_pred_prob))

#create accuracy measurement
accuracy = tf.reduce_mean(tf.cast(tf.equal(tf.argmax(Y_pred,1),tf.argmax(Y,1)),tf.float32))

#create optimizer
opt = tf.train.GradientDescentOptimizer(learning_rate=0.001).minimize(loss)


BATCH_SIZE = 100
NUM_EPOCHS = 10000

# function to batch the data; one-hot encodes the labels
def next_batch(num, data, labels):
    '''
    Return a total of `num` random samples and their one-hot labels.
    '''
    idx = np.arange(0, len(data))
    np.random.shuffle(idx)
    idx = idx[:num]
    data_shuffle = [data[i] for i in idx]
    labels_shuffle = [labels[i] for i in idx]

    onehot_encoded = list()
    for value in labels_shuffle:
        letter = [0 for _ in range(NUM_CLASSES)]
        letter[value] = 1
        onehot_encoded.append(letter)

    return np.asarray(data_shuffle), np.asarray(onehot_encoded)
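# note: the one-hot loop above could be a single vectorized call instead,
# e.g. np.eye(NUM_CLASSES)[labels_shuffle] (assuming integer labels 0..99)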


# one-hot encode the test-set labels
y_test_onehot_encoded = list()
for value in y_test.ravel():
    letter = [0 for _ in range(NUM_CLASSES)]
    letter[value] = 1
    y_test_onehot_encoded.append(letter)
y_test_onehot_encoded_array = np.array(y_test_onehot_encoded)


#run tf session
train_losses, val_losses = [], []
train_accuracies, val_accuracies = [], []
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for eidx in range(NUM_EPOCHS):
        epoch_acc, epoch_loss = [], []
        for bidx in range(x_train_data.shape[0]// BATCH_SIZE):
            xs, ys = next_batch(BATCH_SIZE, x_train_data, y_train.ravel())
            xs = xs.astype(np.float32)
            _, train_loss, train_acc = sess.run([opt, loss, accuracy], feed_dict={X: xs, Y: ys})
            if (bidx+1) % 100 == 0:  # print progress every 100 batches
                print('epoch {} training batch {} loss {} accu {}'.format(eidx+1, bidx+1, train_loss, train_acc))
            epoch_acc.append(train_acc)
            epoch_loss.append(train_loss)
        print('##################################')
        val_acc, val_loss = sess.run([accuracy, loss],
            feed_dict={X: x_test_data, Y: y_test_onehot_encoded_array})
        print('epoch {} # test accuracy {} $ test loss {}'.format(eidx+1, val_acc, val_loss))
        print('##################################')
        # keep epoch-level values for plotting (see the sketch after this block)
        train_losses.append(np.mean(epoch_loss))
        train_accuracies.append(np.mean(epoch_acc))
        val_losses.append(val_loss)
        val_accuracies.append(val_acc)
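Since the loop above already collects epoch-level losses and accuracies, here is a minimal plotting sketch (assuming matplotlib is available) that I use to see whether training is actually improving:

import matplotlib.pyplot as plt

# plot epoch-level training vs. test curves side by side
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(train_losses, label='train loss')
ax1.plot(val_losses, label='test loss')
ax1.set_xlabel('epoch')
ax1.legend()
ax2.plot(train_accuracies, label='train accuracy')
ax2.plot(val_accuracies, label='test accuracy')
ax2.set_xlabel('epoch')
ax2.legend()
plt.show()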

My output each epoch:

epoch 1679 training batch 100 loss nan accu 0.009999999776482582
epoch 1679 training batch 200 loss nan accu 0.0
epoch 1679 training batch 300 loss nan accu 0.019999999552965164
epoch 1679 training batch 400 loss nan accu 0.0
epoch 1679 training batch 500 loss nan accu 0.009999999776482582
##################################
epoch 1679 # test accuracy 0.009999999776482582 $ test loss nan

Your loss is going to nan because your loss function is not numerically robust: it hits -inf whenever Y_pred_prob is exactly zero, since tf.log(0.) evaluates to -inf. You can change it like this:

# create loss function (cross entropy), with a small epsilon so tf.log never sees an exact zero
epsilon = 1e-16
loss = -tf.reduce_mean(Y * tf.log(Y_pred_prob + epsilon))

That should do it!
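Alternatively, you can let tensorflow compute the softmax and the cross entropy together from the raw logits, which is numerically stable by construction. A sketch, assuming a TF 1.x version that has softmax_cross_entropy_with_logits_v2 (older versions have softmax_cross_entropy_with_logits with the same call shape):

# numerically stable cross entropy computed directly from the logits (Y_pred);
# this replaces the separate tf.nn.softmax and tf.log steps entirely
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits_v2(labels=Y, logits=Y_pred))

As a side note, your original tf.reduce_mean averages over all 100 class entries of each sample rather than over samples only, so the loss (and its gradients) come out 100x smaller than the usual cross entropy; the version above averages the per-sample losses instead.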