Why does the same neural architecture work in Keras but not in TensorFlow? (Leaf Classification)
I've recently been playing with the Leaf Classification problem on Kaggle. I came across the kernel Simple Keras 1D CNN + features split, but when I tried to build the same model in TensorFlow, it produced very low accuracy and the loss barely changed. Here is my code:
import tensorflow as tf
import numpy as np
import pandas as pd
from sklearn.preprocessing import scale,StandardScaler
#preparing data
# forward slashes avoid '\t' in the Windows path being parsed as a tab character
train=pd.read_csv('E:/DataAnalysis/Kaggle/leaf/train.csv',sep=',')
test=pd.read_csv('E:/DataAnalysis/Kaggle/leaf/test.csv',sep=',')
subexp=pd.read_csv('E:/DataAnalysis/Kaggle/leaf/sample_submission.csv')
x_train=np.asarray(train.drop(['species','id'],axis=1),dtype=np.float32)
x_train=scale(x_train).reshape([990,64,3])
ids=list(subexp)[1:]
spec=np.asarray(train['species'])
y_train=np.asarray([[int(x==ids[i]) for i in range(len(ids))] for x in spec],dtype=np.float32)
drop=0.75
batch_size=16
max_epoch=10
iter_per_epoch=int(990/batch_size)
max_iter=int(max_epoch*iter_per_epoch)
features=192
keep_prob=0.75
#inputs, weights, and biases
x=tf.placeholder(tf.float32,[None,64,3])
y=tf.placeholder(tf.float32,[None,99])
w={
    'w1':tf.Variable(tf.truncated_normal([1,3,512],dtype=tf.float32)),
    'wd1':tf.Variable(tf.truncated_normal([64*512,2048],dtype=tf.float32)),
    'wd2':tf.Variable(tf.truncated_normal([2048,1024],dtype=tf.float32)),
    'wd3':tf.Variable(tf.truncated_normal([1024,99],dtype=tf.float32))
}
b={
    'b1':tf.Variable(tf.truncated_normal([512],dtype=tf.float32)),
    'bd1':tf.Variable(tf.truncated_normal([2048],dtype=tf.float32)),
    'bd2':tf.Variable(tf.truncated_normal([1024],dtype=tf.float32)),
    'bd3':tf.Variable(tf.truncated_normal([99],dtype=tf.float32))
}
#model
def conv(x,we,bi):
    l1a=tf.nn.relu(tf.nn.conv1d(value=x,filters=we['w1'],stride=1,padding='SAME'))
    l1a=tf.reshape(tf.nn.bias_add(l1a,bi['b1']),[-1,64*512])
    l1=tf.nn.dropout(l1a,keep_prob=0.4)
    l2a=tf.nn.relu(tf.add(tf.matmul(l1,we['wd1']),bi['bd1']))
    l3a=tf.nn.relu(tf.add(tf.matmul(l2a,we['wd2']),bi['bd2']))
    out=tf.nn.softmax(tf.matmul(l3a,we['wd3']))
    return out
#optimizer and accuracy
out=conv(x,w,b)
cost = tf.reduce_mean(tf.nn.sigmoid_cross_entropy_with_logits(logits=out,targets=y))
train_op=tf.train.AdamOptimizer(learning_rate=0.001).minimize(cost)
correct_pred = tf.equal(tf.argmax(out, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
#train
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    step=0
    while step<max_iter:
        d=(step%iter_per_epoch)*batch_size
        batch_x=x_train[d:d+batch_size]
        batch_y=y_train[d:d+batch_size]
        sess.run(train_op,feed_dict={x: batch_x,y: batch_y})
        if step%10==0:
            loss, acc = sess.run([cost, accuracy], feed_dict={x: batch_x, y: batch_y})
            print("Iter: ", step," loss:",loss, " accuracy:",acc)
        step+=1
    print('Training finished!')
The output looks like this:
Iter: 0 loss: 0.69941 accuracy: 0.0
Iter: 10 loss: 0.69941 accuracy: 0.0
Iter: 20 loss: 0.69941 accuracy: 0.0
Iter: 30 loss: 0.69941 accuracy: 0.0
Iter: 40 loss: 0.69941 accuracy: 0.0
Iter: 50 loss: 0.698778 accuracy: 0.0625
Iter: 60 loss: 0.698778 accuracy: 0.0625
Iter: 70 loss: 0.69941 accuracy: 0.0
Iter: 80 loss: 0.69941 accuracy: 0.0
Iter: 90 loss: 0.69941 accuracy: 0.0
Iter: 100 loss: 0.69941 accuracy: 0.0
Iter: 110 loss: 0.69941 accuracy: 0.0
Iter: 120 loss: 0.69941 accuracy: 0.0
Iter: 130 loss: 0.69941 accuracy: 0.0
Iter: 140 loss: 0.69941 accuracy: 0.0
Iter: 150 loss: 0.69941 accuracy: 0.0
Iter: 160 loss: 0.69941 accuracy: 0.0
Iter: 170 loss: 0.698778 accuracy: 0.0625
......
But with the same data and the same model in Keras, it does produce good results.
Code:
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import StratifiedShuffleSplit
from keras.models import Sequential
from keras.layers import Dense, Activation, Flatten, Convolution1D, Dropout
from keras.optimizers import SGD
from keras.utils import np_utils
model = Sequential()
model.add(Convolution1D(nb_filter=512, filter_length=1, input_shape=(64, 3)))
model.add(Activation('relu'))
model.add(Flatten())
model.add(Dropout(0.4))
model.add(Dense(2048, activation='relu'))
model.add(Dense(1024, activation='relu'))
model.add(Dense(99))
model.add(Activation('softmax'))
sgd = SGD(lr=0.01, nesterov=True, decay=1e-6, momentum=0.9)
model.compile(loss='categorical_crossentropy',optimizer=sgd,metrics=['accuracy'])
model.fit(x_train, y_train, nb_epoch=5, batch_size=16)
Results:
Epoch 1/5
990/990 [==============================] - 78s - loss: 4.3229 - acc: 0.1404
Epoch 2/5
990/990 [==============================] - 76s - loss: 1.6020 - acc: 0.6384
Epoch 3/5
990/990 [==============================] - 74s - loss: 0.2723 - acc: 0.9384
Epoch 4/5
990/990 [==============================] - 73s - loss: 0.1061 - acc: 0.9758
By the way, Keras is running on the TensorFlow backend. Any suggestions?
There are quite a few differences between these two models. Your TF model uses Adam while your Keras model uses SGD, and the learning rates differ as well (0.001 vs. 0.01); the learning rate has a strong influence on whether a model converges.
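As a point of reference, here is a minimal sketch (same TF 1.x API as in the question) of an optimizer matching the Keras settings. The toy cost variable is just a stand-in for the question's cost tensor, and Keras's decay=1e-6 per-update learning-rate decay is not replicated here:
import tensorflow as tf

# Toy loss over one variable, standing in for the model's cost tensor.
weight = tf.Variable(1.0)
cost = tf.square(weight)

# SGD with lr=0.01, momentum=0.9 and Nesterov, matching the Keras model,
# instead of Adam at lr=0.001.
train_op = tf.train.MomentumOptimizer(
    learning_rate=0.01,
    momentum=0.9,
    use_nesterov=True
).minimize(cost)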
The loss functions do not match either: the Keras model uses categorical cross-entropy, while the TF model uses sigmoid cross-entropy with logits, which is normally used for multi-label classification. On top of that, sigmoid_cross_entropy_with_logits expects raw logits (arbitrary real numbers) as input, and you are feeding it the output of a softmax.
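Here is a minimal sketch of the single-label setup in TF 1.x; the placeholders stand in for the question's last linear layer and one-hot targets. The model should return raw logits, because softmax_cross_entropy_with_logits applies the softmax internally; in the question's code that means dropping tf.nn.softmax from conv() (note, incidentally, that bi['bd3'] is never added to the last layer there):
import tensorflow as tf

# Stand-ins for the model's last linear layer output and the one-hot targets.
logits = tf.placeholder(tf.float32, [None, 99])
labels = tf.placeholder(tf.float32, [None, 99])

# Softmax cross-entropy for single-label classification; the softmax is
# applied inside the loss, so the model must output raw logits.
cost = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels))

# Apply softmax only for predictions/accuracy, never before the loss.
probs = tf.nn.softmax(logits)
correct = tf.equal(tf.argmax(probs, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))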
Weight initialization differs too: the TF model draws every weight from a truncated normal with the default stddev of 1.0, while Keras defaults to glorot_uniform, a uniform distribution scaled to each layer's fan-in and fan-out.
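As a sketch of what Glorot-style initialization could look like with the question's variables (the fan computation below is simplified, but it is exact for the 1-wide convolution kernel used here):
import numpy as np
import tensorflow as tf

# Glorot/Xavier-style uniform initialization, roughly matching Keras's
# default. With tf.truncated_normal's default stddev of 1.0, the
# [64*512, 2048] matrix alone drives the first dense layer's
# pre-activations to the order of sqrt(32768) ~ 181, which can stall
# training by itself.
def glorot_uniform(shape):
    fan_in, fan_out = np.prod(shape[:-1]), shape[-1]
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return tf.Variable(
        tf.random_uniform(shape, minval=-limit, maxval=limit, dtype=tf.float32))

w = {
    'w1': glorot_uniform([1, 3, 512]),       # conv filter: fan_in = 1*3
    'wd1': glorot_uniform([64*512, 2048]),
    'wd2': glorot_uniform([2048, 1024]),
    'wd3': glorot_uniform([1024, 99])
}
# Keras also initializes biases to zero rather than sampling them.
b = {name: tf.Variable(tf.zeros([size]))
     for name, size in [('b1', 512), ('bd1', 2048), ('bd2', 1024), ('bd3', 99)]}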
Each of these differences on its own could explain why one model trains and the other does not.