Facing "No gradients for any variable" error while training a Siamese network
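I'm currently building a model on TensorFlow (version 1.8, OS: Ubuntu MATE 16.04).
The purpose of the model is to detect/match human body keypoints.
During training, the error "No gradients for any variable" occurs, and I'm having a hard time fixing it.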
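Background of the model:
Its basic idea comes from these two papers:
- Deep Learning of Binary Hash Codes for Fast Image Retrieval
- Learning Compact Binary Descriptors with Unsupervised Deep Neural Networks
They show that images can be matched by hash codes generated from a convolutional network.
The similarity of two images is determined by the Hamming distance between their corresponding hash codes.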
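As a minimal sketch of that similarity measure, assuming each hash code is stored as a plain Python integer (the function name `hamming_distance` and the two example codes are made up for illustration, not taken from the papers):

```python
def hamming_distance(code_a: int, code_b: int) -> int:
    """Count the bit positions in which two hash codes differ."""
    # XOR leaves a 1 exactly where the two codes disagree.
    return bin(code_a ^ code_b).count("1")


# Two hypothetical 32-bit codes that differ only in two of the low bits.
a = 0b10110000111100001010101000000001
b = 0b10110000111100001010101000000111

print(hamming_distance(a, b))
```

A smaller distance means the two codes, and therefore the two image patches, are considered more similar.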
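I think it may be possible to develop an extremely lightweight model that performs real-time human pose estimation on video with a "constant human subject" and a "fixed background".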
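Model structure
01. Data source:
3 images from one video, with the same human subject and a similar background.
Every human keypoint in each image is well labeled.
2 of the images are used as "hint sources", and the last image is the target for keypoint detection/matching.
02. Hints:
23x23-pixel ROIs are cropped from the "hint source" images according to the locations of the human keypoints.
The center of each ROI is a keypoint.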
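The cropping step could look roughly like the sketch below. The real helper in the post is `getPaddedROI` from `imageLoader`, whose implementation is not shown, so `crop_roi`, its parameters, and the zero-padding behaviour at the image border are all assumptions for illustration.

```python
def crop_roi(image, center_row, center_col, size=23, pad_value=0):
    """Crop a size x size window centered on a keypoint.

    `image` is a list of rows; pixels outside the image are filled
    with `pad_value` (an assumption, the post does not specify this).
    """
    half = size // 2
    rows, cols = len(image), len(image[0])
    roi = []
    for r in range(center_row - half, center_row + half + 1):
        row = []
        for c in range(center_col - half, center_col + half + 1):
            inside = 0 <= r < rows and 0 <= c < cols
            row.append(image[r][c] if inside else pad_value)
        roi.append(row)
    return roi


# Tiny 5x5 test image whose pixel value encodes its position.
image = [[r * 5 + c for c in range(5)] for r in range(5)]
small = crop_roi(image, 0, 0, size=3)
print(small)
```

With an odd window size such as 23, the keypoint lands exactly on the middle pixel of the ROI (index 11 in both directions).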
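03. Convolutional network "for hints":
A simple 3-layer structure.
The first two layers are 3x3 convolutions with a stride of [2,2].
The last layer is a 5x5 convolution on a 5x5 input with no padding (equivalent to a fully connected layer).
This turns one 23x23-pixel hint ROI into one 32-bit hash code.
One hint source image generates a set of 16 hash codes.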
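The spatial sizes in this network can be checked with the usual output-size formula for an unpadded ("valid") convolution, floor((size - kernel) / stride) + 1. The helper name `conv_out_valid` is only for illustration:

```python
def conv_out_valid(size, kernel, stride):
    """Output width of a convolution without padding."""
    return (size - kernel) // stride + 1


size = 23
sizes = [size]
# (kernel, stride) of the three layers described above.
for kernel, stride in [(3, 2), (3, 2), (5, 1)]:
    size = conv_out_valid(size, kernel, stride)
    sizes.append(size)

print(sizes)  # [23, 11, 5, 1]
```

So a 23x23 ROI shrinks to 11x11, then 5x5, and the final 5x5 convolution collapses it to a single 1x1 position with 32 channels, which is the 32-bit code.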
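04. Convolutional network "for target image":
The network shares the same weights with the hint network.
But in this case, every convolutional layer has padding.
The 301x301-pixel image is turned into a 76x76 "hash map".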
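The 76x76 figure follows from the output-size rule for 'same' padding, ceil(size / stride), applied to the three layers. A small check (the helper name `conv_out_same` is illustrative):

```python
import math


def conv_out_same(size, stride):
    """Output width of a convolution with 'same' padding."""
    return math.ceil(size / stride)


size = 301
same_sizes = [size]
# Strides of the three layers: 2, 2, 1.
for stride in (2, 2, 1):
    size = conv_out_same(size, stride)
    same_sizes.append(size)

print(same_sizes)  # [301, 151, 76, 76]
```

This also matches the `(16, 76, 76)` shape of the `labels` placeholder used in the code later in the post.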
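05. Hash matching:
I made a function called "locateMin_and_get_loss" to calculate the Hamming distance between a "hint hash" and the hash code at each point of the hash map.
This function creates a "distance map".
The location of the point with the lowest distance value is treated as the location of the keypoint.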
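The matching step can be sketched in plain Python as follows. This is not the author's `locateMin_and_get_loss`; the function names, the 4-bit codes, and the 2x3 map are hypothetical and only meant to show the idea on a small scale.

```python
def hamming(a, b):
    """Hamming distance between two equal-length bit tuples."""
    return sum(x != y for x, y in zip(a, b))


def distance_map(hint_code, hash_map):
    """Distance from the hint code to the code at every map position."""
    return [[hamming(hint_code, code) for code in row] for row in hash_map]


def locate_min(dist_map):
    """(row, col) of the smallest distance; ties go to the first hit."""
    best = min(
        (d, r, c)
        for r, row in enumerate(dist_map)
        for c, d in enumerate(row)
    )
    return best[1], best[2]


hint = (1, 0, 1, 1)
hash_map = [
    [(0, 0, 0, 0), (1, 1, 1, 1), (0, 1, 0, 0)],
    [(1, 0, 1, 0), (1, 0, 1, 1), (0, 0, 1, 1)],
]

dmap = distance_map(hint, hash_map)
print(dmap)              # [[3, 1, 4], [1, 0, 1]]
print(locate_min(dmap))  # (1, 1)
```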
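06. Loss calculation:
I made a function "get_total_loss_and_result" to calculate the total loss over the 16 keypoints.
The loss is the normalized Euclidean distance between the ground-truth label points and the points located by the model.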
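A rough sketch of such a loss is shown below. The post does not say what the distance is normalized by, so dividing by the side length of the map (`map_size`) is an assumption, as are the function names; this is not the author's `get_total_loss_and_result`.

```python
import math


def normalized_distance(pred, truth, map_size=76):
    """Euclidean distance between two (row, col) points, divided by
    the map side length (the normalization is an assumption)."""
    return math.hypot(pred[0] - truth[0], pred[1] - truth[1]) / map_size


def total_loss(preds, truths, map_size=76):
    """Sum of the normalized distances over all keypoints."""
    return sum(
        normalized_distance(p, t, map_size) for p, t in zip(preds, truths)
    )


print(normalized_distance((0, 0), (3, 4), map_size=10))  # 0.5
```

One thing worth keeping in mind: a location obtained with an argmin is piecewise constant in the network weights, so a loss built directly on located points provides no useful gradient by itself.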
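07. Proposed workflow:
Before initializing this model, the user takes two pictures of the target human subject from different angles.
The pictures are labeled by a state-of-the-art model such as OpenPose or DeepPose, and hint hashes are generated from them with the convolutional network described in 03.
Finally, the video stream is started and processed by the model.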
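08. Why "two" sets of hints?
A human joint/keypoint can look very different when observed from different angles.
Instead of increasing the dimensionality of the neural network, I want to "cheat the game" by collecting two hints instead of one.
I want to know whether this can increase the precision and generalization capacity of the model.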
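The post does not say how the two hint sets would be combined. One possible option, purely an assumption for illustration, is to build one distance map per hint and keep the element-wise minimum, so that a position counts as a good match if it resembles either view of the keypoint:

```python
def combine_min(map_a, map_b):
    """Element-wise minimum of two equally sized distance maps."""
    return [
        [min(a, b) for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(map_a, map_b)
    ]


# Hypothetical distance maps for the same keypoint from two hints.
map_from_hint_1 = [[3, 1], [2, 5]]
map_from_hint_2 = [[0, 4], [2, 1]]

combined = combine_min(map_from_hint_1, map_from_hint_2)
print(combined)  # [[0, 1], [2, 1]]
```

Other choices, such as averaging the two maps, would be equally easy to try; which one works better would have to be measured.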
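The problems I faced:
01. The "No gradients for any variable" error
(my main question in this post):
As mentioned above, I get this error while training the model.
I tried to learn from this post, and this, and this.
But so far I have no clue, even after checking the computational graph.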
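02. The "batch" problem:
Due to the model's unusual structure, it is hard to use a conventional placeholder to hold input data for multiple batches.
I worked around it by setting the batch size to 3 and manually combining the values of the loss functions.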
2018.10.28 Edit:
The simplified version with only one hint set:
import tensorflow as tf
import numpy as np
import time
from imageLoader import getPaddedROI,training_data_feeder
import math
'''
created by Cid Zhang
a sub-model for human pose estimation
'''
def truncated_normal_var(name,shape,dtype):
    return(tf.get_variable(name=name, shape=shape, dtype=dtype, initializer=tf.truncated_normal_initializer(stddev=0.01)))
def zero_var(name,shape,dtype):
    return(tf.get_variable(name=name, shape=shape, dtype=dtype, initializer=tf.constant_initializer(0.0)))
roi_size = 23
image_input_size = 301
#input placeholders
#batch1 hints
inputs_b1h1 = tf.placeholder(tf.float32, ( 16, roi_size, roi_size, 3), name='inputs_b1h1')
inputs_s = tf.placeholder(tf.float32, (None, image_input_size, image_input_size, 3), name='inputs_s')
labels = tf.placeholder(tf.float32,(16,76,76), name='labels')
#define the model
def paraNet(input):
    out_l1 = tf.layers.conv2d(input, 8, [3, 3],strides=(2, 2), padding ='valid' ,name='para_conv_1')
    out_l1 = tf.nn.relu6(out_l1)
    out_l2 = tf.layers.conv2d(out_l1, 16, [3, 3],strides=(2, 2), padding ='valid' ,name='para_conv_2')
    out_l2 = tf.nn.relu6(out_l2)
    out_l3 = tf.layers.conv2d(out_l2, 32, [5, 5],strides=(1, 1), padding ='valid' ,name='para_conv_3')
    return out_l3
#network pipeline to create the first Hint Hash Sets (Three batches)
with tf.variable_scope('conv'):
    out_b1h1_l3 = paraNet(inputs_b1h1)
    #flatten and binarize the hashes
    out_b1h1_l3 =tf.squeeze( tf.round(tf.nn.sigmoid(out_b1h1_l3)) )
with tf.variable_scope('conv', reuse=True):
    out_2_l1 = tf.layers.conv2d(inputs_s, 8, [3, 3],strides=(2, 2), padding ='same' ,name='para_conv_1')
    out_2_l1 = tf.nn.relu6(out_2_l1)
    out_2_l2 = tf.layers.conv2d(out_2_l1, 16, [3, 3],strides=(2, 2), padding ='same' ,name='para_conv_2')
    out_2_l2 = tf.nn.relu6(out_2_l2)
    out_2_l3 = tf.layers.conv2d(out_2_l2, 32, [5, 5],strides=(1, 1), padding ='same' ,name='para_conv_3')
    #binarize the value into a hash code
    out_2_l3 = tf.round(tf.nn.sigmoid(out_2_l3))
orig_feature_map_size = tf.shape(out_2_l3)[1]
#calculate Hamming distance maps
map0 = tf.reduce_sum ( tf.abs (tf.subtract( out_b1h1_l3[0] , out_2_l3 ) ) , axis=3 )
map1 = tf.reduce_sum ( tf.abs (tf.subtract( out_b1h1_l3[1] , out_2_l3 ) ) , axis=3 )
map2 = tf.reduce_sum ( tf.abs (tf.subtract( out_b1h1_l3[2] , out_2_l3 ) ) , axis=3 )
map3 = tf.reduce_sum ( tf.abs (tf.subtract( out_b1h1_l3[3] , out_2_l3 ) ) , axis=3 )
map4 = tf.reduce_sum ( tf.abs (tf.subtract( out_b1h1_l3[4] , out_2_l3 ) ) , axis=3 )
map5 = tf.reduce_sum ( tf.abs (tf.subtract( out_b1h1_l3[5] , out_2_l3 ) ) , axis=3 )
map6 = tf.reduce_sum ( tf.abs (tf.subtract( out_b1h1_l3[6] , out_2_l3 ) ) , axis=3 )
map7 = tf.reduce_sum ( tf.abs (tf.subtract( out_b1h1_l3[7] , out_2_l3 ) ) , axis=3 )
map8 = tf.reduce_sum ( tf.abs (tf.subtract( out_b1h1_l3[8] , out_2_l3 ) ) , axis=3 )
map9 = tf.reduce_sum ( tf.abs (tf.subtract( out_b1h1_l3[9] , out_2_l3 ) ) , axis=3 )
map10 = tf.reduce_sum ( tf.abs (tf.subtract( out_b1h1_l3[10] , out_2_l3 ) ) , axis=3 )
map11 = tf.reduce_sum ( tf.abs (tf.subtract( out_b1h1_l3[11] , out_2_l3 ) ) , axis=3 )
map12 = tf.reduce_sum ( tf.abs (tf.subtract( out_b1h1_l3[12] , out_2_l3 ) ) , axis=3 )
map13 = tf.reduce_sum ( tf.abs (tf.subtract( out_b1h1_l3[13] , out_2_l3 ) ) , axis=3 )
map14 = tf.reduce_sum ( tf.abs (tf.subtract( out_b1h1_l3[14] , out_2_l3 ) ) , axis=3 )
map15 = tf.reduce_sum ( tf.abs (tf.subtract( out_b1h1_l3[15] , out_2_l3 ) ) , axis=3 )
totoal_map =tf.div( tf.concat([map0, map1, map2, map3, map4, map5, map6, map7,
map8, map9, map10,map11,map12, map13, map14, map15], 0) , 32)
loss = tf.nn.l2_loss(totoal_map - labels , name = 'loss' )
#ValueError: No gradients provided for any variable, check your graph for ops that do not support gradients, between variables
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(loss )
init = tf.global_variables_initializer()
batchsize = 3
with tf.Session() as sess:
    #writer = tf.summary.FileWriter("./variable_graph",graph = sess.graph)
    sess.run(init)
    #load image from dataset(train set)
    joint_data_path = "./custom_data.json"
    train_val_path = "./train_val_indices.json"
    imgpath = "./000/"
    input_size = 301
    hint_roi_size = 23
    hintSet01_norm_batch = []
    hintSet02_norm_batch = []
    t_img_batch = []
    t_label_norm_batch = []
    #load data
    hintSet01,hintSet02,t_img,t_label_norm = training_data_feeder(joint_data_path, train_val_path, imgpath, input_size, hint_roi_size )
    #Normalize the image pixel values to 0~1
    hintSet01_norm = []
    hintSet02_norm = []
    t_img = np.float32(t_img /255.0)
    for rois in hintSet01:
        tmp = np.float32(rois / 255.0)
        hintSet01_norm.append(tmp.tolist())
    for rois in hintSet02:
        tmp = np.float32(rois / 255.0)
        hintSet02_norm.append(tmp.tolist())
    print(tf.trainable_variables())
    temp = sess.run(totoal_map , feed_dict={inputs_s: [t_img] ,
                                            inputs_b1h1: hintSet01_norm,
                                            labels: t_label_norm
                                            })
    print(temp)
    print(np.shape(temp))
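Code: https://github.com/gitpharm01/Parapose/blob/master/paraposeNetworkV3.py
Dataset:
It's a custom dataset generated from the MPII dataset.
It has 223 clusters of images.
Each cluster has one constant human subject in various poses, and the background remains the same.
A cluster has at least 3 pictures. It's about 627 MB, and I'll try to pack it and upload it later.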
2018.10.26 Edit:
You can download it from Google Drive; the whole dataset is divided into 9 parts. (I can't post more than 8 links in this post.) The links are in this file:
https://github.com/gitpharm01/Parapose/blob/master/000/readme.md
I used "eager execution", as described in https://www.tensorflow.org/guide/eager, to check the gradients.
In the end I found that "tf.round" and "tf.nn.relu6" erase the gradients or set them to zero.
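The effect can be reproduced without TensorFlow by estimating derivatives numerically. The sketch below uses Python's built-in `round` and a hand-written `relu6` as stand-ins for the TensorFlow ops, so it illustrates the mathematics only, not TensorFlow's internal gradient code:

```python
def numeric_grad(f, x, eps=1e-4):
    """Central finite-difference estimate of df/dx."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)


def relu6(x):
    """min(max(x, 0), 6), the same clipping as a ReLU6 activation."""
    return min(max(x, 0.0), 6.0)


for x in (-1.0, 0.7, 3.3, 7.0):
    print(x, numeric_grad(round, x), numeric_grad(relu6, x))
```

Rounding is flat almost everywhere, so its derivative is zero at every sampled point and nothing can flow back through it. ReLU6 is only flat outside the interval (0, 6); inside that interval its slope is 1, so it blocks gradients only for inputs that are negative or saturated above 6.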
I made some changes to the code, and now it can enter the training phase:
import tensorflow as tf
import numpy as np
import time
from imageLoader import getPaddedROI,training_data_feeder
import math
import cv2
'''
created by Cid Zhang
a sub-model for human pose estimation
'''
tf.reset_default_graph()
def truncated_normal_var(name,shape,dtype):
    return(tf.get_variable(name=name, shape=shape, dtype=dtype, initializer=tf.truncated_normal_initializer(stddev=0.01)))
def zero_var(name,shape,dtype):
    return(tf.get_variable(name=name, shape=shape, dtype=dtype, initializer=tf.constant_initializer(0.0)))
roi_size = 23
image_input_size = 301
#input placeholders
#batch1 hints
inputs_b1h1 = tf.placeholder(tf.float32, ( 16, roi_size, roi_size, 3), name='inputs_b1h1')
#inputs_b1h2 = tf.placeholder(tf.float32, ( 16, roi_size, roi_size, 3), name='inputs_b1h2')
inputs_s = tf.placeholder(tf.float32, (None, image_input_size, image_input_size, 3), name='inputs_s')
labels = tf.placeholder(tf.float32,(16,76,76), name='labels')
#define the model
def paraNet(inputs, inputs_s):
    with tf.variable_scope('conv'):
        out_l1 = tf.layers.conv2d(inputs, 16, [3, 3],strides=(2, 2), padding ='valid' ,name='para_conv_1')
        out_l1r = tf.nn.relu(out_l1)
        out_l2 = tf.layers.conv2d(out_l1r, 48, [3, 3],strides=(2, 2), padding ='valid' ,name='para_conv_2')
        out_l2r = tf.nn.relu(out_l2)
        out_l3 = tf.layers.conv2d(out_l2r, 96, [5, 5],strides=(1, 1), padding ='valid' ,name='para_conv_3')
        out_l3r = tf.nn.relu(out_l3)
        out_l4 = tf.layers.conv2d(out_l3r, 32, [1, 1],strides=(1, 1), padding ='valid' ,name='para_conv_4')
        out_l4r = tf.squeeze( tf.sign( tf.sigmoid(out_l4) ) )
    with tf.variable_scope('conv', reuse=True):
        out_2_l1 = tf.layers.conv2d(inputs_s, 16, [3, 3],strides=(2, 2), padding ='same' ,name='para_conv_1')
        out_2_l1r = tf.nn.relu(out_2_l1)
        out_2_l2 = tf.layers.conv2d(out_2_l1r, 48, [3, 3],strides=(2, 2), padding ='same' ,name='para_conv_2')
        out_2_l2r = tf.nn.relu(out_2_l2)
        out_2_l3 = tf.layers.conv2d(out_2_l2r, 96, [5, 5],strides=(1, 1), padding ='same' ,name='para_conv_3')
        out_2_l3r = tf.nn.relu(out_2_l3)
        out_2_l4 = tf.layers.conv2d(out_2_l3r, 32, [1, 1],strides=(1, 1), padding ='same' ,name='para_conv_4')
        out_2_l4r =tf.sign( tf.sigmoid(out_2_l4))
    return out_l4r , out_2_l4r
def lossFunc(inputs_hint, inputs_sample, labels):
    hint, sample = paraNet(inputs_hint, inputs_sample)
    map0 = tf.reduce_sum ( tf.abs (tf.subtract( hint[0] , sample ) ) , axis=3 )
    map1 = tf.reduce_sum ( tf.abs (tf.subtract( hint[1] , sample ) ) , axis=3 )
    map2 = tf.reduce_sum ( tf.abs (tf.subtract( hint[2] , sample ) ) , axis=3 )
    map3 = tf.reduce_sum ( tf.abs (tf.subtract( hint[3] , sample ) ) , axis=3 )
    map4 = tf.reduce_sum ( tf.abs (tf.subtract( hint[4] , sample ) ) , axis=3 )
    map5 = tf.reduce_sum ( tf.abs (tf.subtract( hint[5] , sample ) ) , axis=3 )
    map6 = tf.reduce_sum ( tf.abs (tf.subtract( hint[6] , sample ) ) , axis=3 )
    map7 = tf.reduce_sum ( tf.abs (tf.subtract( hint[7] , sample ) ) , axis=3 )
    map8 = tf.reduce_sum ( tf.abs (tf.subtract( hint[8] , sample ) ) , axis=3 )
    map9 = tf.reduce_sum ( tf.abs (tf.subtract( hint[9] , sample ) ) , axis=3 )
    map10 = tf.reduce_sum ( tf.abs (tf.subtract( hint[10] , sample ) ) , axis=3 )
    map11 = tf.reduce_sum ( tf.abs (tf.subtract( hint[11] , sample ) ) , axis=3 )
    map12 = tf.reduce_sum ( tf.abs (tf.subtract( hint[12] , sample ) ) , axis=3 )
    map13 = tf.reduce_sum ( tf.abs (tf.subtract( hint[13] , sample ) ) , axis=3 )
    map14 = tf.reduce_sum ( tf.abs (tf.subtract( hint[14] , sample ) ) , axis=3 )
    map15 = tf.reduce_sum ( tf.abs (tf.subtract( hint[15] , sample ) ) , axis=3 )
    totoal_map =tf.div( tf.concat([map0, map1, map2, map3, map4, map5, map6, map7,
                                   map8, map9, map10,map11,map12, map13, map14, map15], 0) , 64)
    loss = tf.nn.l2_loss( totoal_map - labels , name = 'loss' )
    return loss, totoal_map
loss, totoal_map = lossFunc(inputs_b1h1, inputs_s, labels)
train_step = tf.train.GradientDescentOptimizer(2.0).minimize(loss)
#init = tf.global_variables_initializer()
saver = tf.train.Saver()
with tf.Session() as sess:
    #writer = tf.summary.FileWriter("./variable_graph",graph = sess.graph)
    #sess.run(init)
    #load image from dataset(train set)
    joint_data_path = "./custom_data.json"
    train_val_path = "./train_val_indices.json"
    imgpath = "./000/"
    input_size = 301
    hint_roi_size = 23
    '''
    #load data
    hintSet01,hintSet02,t_img,t_label_norm = training_data_feeder(joint_data_path, train_val_path, imgpath, input_size, hint_roi_size )
    #Normalize the image pixel values to 0~1
    hintSet01_norm = []
    hintSet02_norm = []
    t_img =[ np.float32(t_img /255.0) ]
    #print(type(t_img))
    #print(np.shape(t_img))
    #print(type(t_label_norm))
    for rois in hintSet01:
        tmp = np.float32(rois / 255.0)
        hintSet01_norm.append(tmp.tolist())
    for rois in hintSet02:
        tmp = np.float32(rois / 255.0)
        hintSet02_norm.append(tmp.tolist())
    loss_value , total_map_value = sess.run ([loss, totoal_map], feed_dict = {inputs_s: t_img,
                                                                              inputs_b1h1: hintSet01_norm,
                                                                              labels: t_label_norm
                                                                              })
    print("-----loss value:",loss_value)
    print("-----total_map_value:", total_map_value[0,0] )
    print("-----label_value", t_label_norm[0,0] )
    #cv2.imshow("t_img",t_img[0])
    #for img in t_label_norm:
    #    print(img)
    #    cv2.imshow("hint", img)
    #    cv2.waitKey(0)
    #print(tf.trainable_variables())
    #print(hash_set01)
    #print(out_2_l3)
    '''
    saver.restore(sess, "./temp_model/model4.ckpt")
    for i in range(1000):
        #load data
        hintSet01,hintSet02,t_img,t_label_norm = training_data_feeder(joint_data_path, train_val_path, imgpath, input_size, hint_roi_size )
        #Normalize the image pixel values to 0~1
        hintSet01_norm = []
        hintSet02_norm = []
        t_img =[ np.float32(t_img /255.0) ]
        #print(type(t_img))
        #print(np.shape(t_img))
        #print(type(t_label_norm))
        for rois in hintSet01:
            tmp = np.float32(rois / 255.0)
            hintSet01_norm.append(tmp.tolist())
        for rois in hintSet02:
            tmp = np.float32(rois / 255.0)
            hintSet02_norm.append(tmp.tolist())
        loss_val, _ = sess.run([loss, train_step] ,
                               feed_dict = {inputs_s: t_img,
                                            inputs_b1h1: hintSet01_norm,
                                            labels: t_label_norm })
        if i % 50 == 0:
            print(loss_val)
    save_path = saver.save(sess, "./temp_model/model" + '5' + ".ckpt")
    #print(temp)
    #print(np.shape(temp))
But unfortunately, the loss value doesn't decrease during training.
I think there are still some bugs in the code.
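One thing that may contribute to the stuck loss is the `tf.sign(tf.sigmoid(...))` binarization in the revised code. A sigmoid output is always strictly positive, so its sign is always 1. The plain-Python sketch below (the helpers `sigmoid` and `sign` are stand-ins for the TensorFlow ops, and the logits are made up) shows the effect:

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sign(x):
    """Returns -1, 0 or 1, like a sign function."""
    return (x > 0) - (x < 0)


logits = [-5.0, -0.1, 0.0, 0.1, 5.0]

# sigmoid(x) lies strictly between 0 and 1, so its sign is always 1.
constant_bits = [sign(sigmoid(x)) for x in logits]

# Thresholding at 0.5 gives bits that actually depend on the logit.
thresholded_bits = [int(sigmoid(x) > 0.5) for x in logits]

print(constant_bits)     # [1, 1, 1, 1, 1]
print(thresholded_bits)  # [0, 0, 0, 1, 1]
```

If every bit of every code is 1, the distance maps are all zero regardless of the weights, so the loss cannot change. Note that a hard threshold still has zero slope almost everywhere; one option, offered here only as a suggestion, is to train on the smooth sigmoid outputs and binarize only at inference time.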
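No matter how many iterations I set, the saved checkpoint file is always named "XXXX.ckpt.data-00000-of-00001".
I'll write another post, since the main problem of this post has been solved.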