Facing "No gradients for any variable" error while training a Siamese network

Question

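I'm currently building a model on TensorFlow (ver: 1.8, OS: Ubuntu MATE 16.04). The model's purpose is to detect/match human body keypoints. While training, the error "No gradients for any variable" occurs, and I am having difficulty fixing it.

Background of the model: its basic idea comes from these two papers:

They showed that it is possible to match images according to hash codes generated by a convolutional network. The similarity of two pictures is determined by the Hamming distance between their corresponding hash codes.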
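As a minimal sketch of the matching criterion described above, the Hamming distance between two binary hash codes can be computed as follows. The function name `hamming_distance` and the 8-bit example codes are made up for illustration (the model in this post uses 32-bit codes).

```python
def hamming_distance(code_a, code_b):
    """Count the positions at which two equal-length binary codes differ."""
    if len(code_a) != len(code_b):
        raise ValueError("hash codes must have the same length")
    return sum(bit_a != bit_b for bit_a, bit_b in zip(code_a, code_b))


# Hypothetical 8-bit codes, chosen only for illustration.
hint_code = [1, 0, 1, 1, 0, 0, 1, 0]
same_patch = [1, 0, 1, 1, 0, 0, 1, 0]
other_patch = [0, 0, 1, 0, 0, 1, 1, 0]

print(hamming_distance(hint_code, same_patch))   # 0: identical codes
print(hamming_distance(hint_code, other_patch))  # 3: bits 0, 3 and 5 differ
```

A smaller distance means the two patches are considered more similar.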

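I think it is possible to develop an extremely lightweight model to perform real-time human pose estimation on videos with a "constant human subject" and a "fixed background".

Model structure

01. Data source:

Three images from one video, with the same human subject and a similar background. Every human keypoint in each image is well labeled. Two of the images are used as the "hint sources", and the last image is the target for keypoint detection/matching.

02. Hints:

23x23-pixel ROIs are cropped from the "hint source" images according to the locations of the human keypoints. The keypoints are at the centers of these ROIs.

03. Convolutional network "for hints":

A simple 3-layer structure. The first two layers are 3x3 convolutions with stride [2,2]. The last layer is a 5x5 convolution on a 5x5 input with no padding (equivalent to a fully connected layer).

This turns one 23x23-pixel hint ROI into one 32-bit hash code. One hint source image generates a set of 16 hash codes.

04. Convolutional network "for target image":

This network shares the same weights as the hint network, but in this case every convolutional layer uses padding. A 301x301-pixel image is turned into a 76x76 "hash map".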
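The spatial sizes quoted in 03 and 04 can be checked with the usual output-size rules for "valid" and "same" padding. This is only a sketch: the helper names `conv_output_size`, `trace_sizes` and the `LAYERS` list are hypothetical and not part of the model code.

```python
import math


def conv_output_size(input_size, kernel_size, stride, padding):
    """Spatial output size of one convolution along a single dimension."""
    if padding == "valid":
        # No padding: only positions where the kernel fits entirely.
        return (input_size - kernel_size) // stride + 1
    if padding == "same":
        # Padded so that the size only depends on the stride.
        return math.ceil(input_size / stride)
    raise ValueError("padding must be 'valid' or 'same'")


# (kernel_size, stride) of the three layers described in 03.
LAYERS = [(3, 2), (3, 2), (5, 1)]


def trace_sizes(input_size, padding):
    """Return the input size followed by the size after each layer."""
    sizes = [input_size]
    for kernel_size, stride in LAYERS:
        sizes.append(conv_output_size(sizes[-1], kernel_size, stride, padding))
    return sizes


print(trace_sizes(23, "valid"))  # [23, 11, 5, 1]: one hash code per hint ROI
print(trace_sizes(301, "same"))  # [301, 151, 76, 76]: the 76x76 hash map
```

The hint branch collapses each ROI to a single 1x1 position, while the target branch keeps a 76x76 grid of codes to compare against.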

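05. Hash matching:

I made a function called "locateMin_and_get_loss" to calculate the Hamming distance between a "hint hash" and the hash code at each point of the hash map. This function creates a "distance map". The location of the point with the lowest distance value is treated as the location of the keypoint.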
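The matching step can be sketched in plain Python as below. This is not the author's "locateMin_and_get_loss" function; `distance_map`, `locate_min` and the tiny 2x3 hash map with 4-bit codes are hypothetical and only illustrate the idea.

```python
def distance_map(hint_code, hash_map):
    """Hamming distance between hint_code and the code at every map position."""
    return [
        [sum(a != b for a, b in zip(hint_code, code)) for code in row]
        for row in hash_map
    ]


def locate_min(dist_map):
    """Return (row, col) of the smallest value; the first one wins on ties."""
    best = None
    for r, row in enumerate(dist_map):
        for c, dist in enumerate(row):
            if best is None or dist < best[0]:
                best = (dist, r, c)
    return best[1], best[2]


# Hypothetical 2x3 hash map holding one 4-bit code per position.
hash_map = [
    [[0, 0, 0, 0], [1, 1, 0, 0], [1, 1, 1, 1]],
    [[0, 1, 0, 1], [1, 0, 1, 1], [0, 0, 1, 1]],
]
hint = [1, 0, 1, 1]

dm = distance_map(hint, hash_map)
print(dm)              # [[3, 3, 1], [3, 0, 1]]
print(locate_min(dm))  # (1, 1): the position whose code equals the hint
```

Note that picking the minimum position is a discrete selection, so by itself it does not pass a gradient back to the network weights.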

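06. Loss calculation:

I made a function "get_total_loss_and_result" to calculate the total loss over the 16 keypoints. The loss is the normalized Euclidean distance between the ground-truth label points and the points located by the model.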
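A rough sketch of such a loss is shown below. The post does not say how the distance is normalized, so dividing by the diagonal of the 76x76 map is an assumption made here, and the names `normalized_distance` and `total_loss` are hypothetical rather than the author's "get_total_loss_and_result".

```python
import math


def normalized_distance(predicted, target, map_size):
    """Euclidean distance between two (row, col) points over the map diagonal.

    Normalizing by the diagonal is an assumption, not taken from the post.
    """
    diagonal = math.hypot(map_size - 1, map_size - 1)
    distance = math.hypot(predicted[0] - target[0], predicted[1] - target[1])
    return distance / diagonal


def total_loss(predicted_points, target_points, map_size=76):
    """Sum of the normalized distances over all keypoints."""
    return sum(
        normalized_distance(p, t, map_size)
        for p, t in zip(predicted_points, target_points)
    )


# Two hypothetical keypoints: one exact match, one that is 5 cells off.
predicted = [(10, 10), (20, 30)]
ground_truth = [(10, 10), (23, 34)]
print(total_loss(predicted, ground_truth))
```

With 16 keypoints the same sum would simply run over 16 point pairs.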

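07. Proposed workflow:

Before this model is initialized, the user takes two pictures of the target human subject from different angles. The pictures are labeled by a state-of-the-art model such as OpenPose or DeepPose, and the hint hashes are generated from them with the convolutional network mentioned in 03.

Finally, the video stream is started and processed by the model.

08. Why "two" sets of hints?

A human joint/keypoint looks very different when observed from different angles. Instead of increasing the dimensionality of the neural network, I want to "cheat the game" by gathering two hints instead of one. I want to know whether this can improve the precision and generalization capacity of the model.

The problems I faced:

01. The "No gradients for any variable" error (my main question in this post):

As mentioned above, I get this error while training the model. I tried to learn from this post, and this, and this. But so far I still have no clue, even after checking the computational graph.

02. The "batch" problem:

Because of the model's unusual structure, it is hard to use a conventional placeholder to hold multiple batches of input data. I worked around it by setting the batch number to 3 and manually combining the values of the loss function.

2018.10.28 Edit:

A simplified version with only one set of hints: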

import tensorflow as tf
import numpy as np
import time
from imageLoader import getPaddedROI,training_data_feeder
import math
'''
created by Cid Zhang 
a sub-model for human pose estimation
'''
def truncated_normal_var(name,shape,dtype):
    return(tf.get_variable(name=name, shape=shape, dtype=dtype, initializer=tf.truncated_normal_initializer(stddev=0.01)))
def zero_var(name,shape,dtype):
    return(tf.get_variable(name=name, shape=shape, dtype=dtype, initializer=tf.constant_initializer(0.0)))

roi_size = 23
image_input_size = 301

#input placeholders
#batch1 hints
inputs_b1h1 = tf.placeholder(tf.float32, ( 16, roi_size, roi_size, 3), name='inputs_b1h1')

inputs_s = tf.placeholder(tf.float32, (None, image_input_size, image_input_size, 3), name='inputs_s')
labels = tf.placeholder(tf.float32,(16,76,76), name='labels')

#define the model
def paraNet(input):
    out_l1 = tf.layers.conv2d(input, 8, [3, 3],strides=(2, 2), padding ='valid' ,name='para_conv_1')
    out_l1 = tf.nn.relu6(out_l1)
    out_l2 = tf.layers.conv2d(out_l1, 16, [3, 3],strides=(2, 2), padding ='valid' ,name='para_conv_2')
    out_l2 = tf.nn.relu6(out_l2)
    out_l3 = tf.layers.conv2d(out_l2, 32, [5, 5],strides=(1, 1), padding ='valid' ,name='para_conv_3')
    return out_l3

#network pipeline to create the first Hint Hash Sets (Three batches)
with tf.variable_scope('conv'):
    out_b1h1_l3 = paraNet(inputs_b1h1)
    #flatten and binarize the hashes
    out_b1h1_l3 =tf.squeeze(  tf.round(tf.nn.sigmoid(out_b1h1_l3)) )


with tf.variable_scope('conv', reuse=True):
    out_2_l1 = tf.layers.conv2d(inputs_s,  8, [3, 3],strides=(2, 2),     padding ='same' ,name='para_conv_1')
    out_2_l1 = tf.nn.relu6(out_2_l1)
    out_2_l2 = tf.layers.conv2d(out_2_l1, 16, [3, 3],strides=(2, 2), padding ='same' ,name='para_conv_2')
    out_2_l2 = tf.nn.relu6(out_2_l2)
    out_2_l3 = tf.layers.conv2d(out_2_l2, 32, [5, 5],strides=(1, 1), padding ='same' ,name='para_conv_3')
    #binarize the values into hash codes

    out_2_l3 = tf.round(tf.nn.sigmoid(out_2_l3))

    orig_feature_map_size = tf.shape(out_2_l3)[1]

    #calculate Hamming distance maps
    map0 = tf.reduce_sum ( tf.abs (tf.subtract( out_b1h1_l3[0] , out_2_l3 ) ) , axis=3 )  
    map1 = tf.reduce_sum ( tf.abs (tf.subtract( out_b1h1_l3[1] , out_2_l3 ) ) , axis=3 )  
    map2 = tf.reduce_sum ( tf.abs (tf.subtract( out_b1h1_l3[2] , out_2_l3 ) ) , axis=3 )  
    map3 = tf.reduce_sum ( tf.abs (tf.subtract( out_b1h1_l3[3] , out_2_l3 ) ) , axis=3 )  
    map4 = tf.reduce_sum ( tf.abs (tf.subtract( out_b1h1_l3[4] , out_2_l3 ) ) , axis=3 )  
    map5 = tf.reduce_sum ( tf.abs (tf.subtract( out_b1h1_l3[5] , out_2_l3 ) ) , axis=3 )  
    map6 = tf.reduce_sum ( tf.abs (tf.subtract( out_b1h1_l3[6] , out_2_l3 ) ) , axis=3 )  
    map7 = tf.reduce_sum ( tf.abs (tf.subtract( out_b1h1_l3[7] , out_2_l3 ) ) , axis=3 )  
    map8 = tf.reduce_sum ( tf.abs (tf.subtract( out_b1h1_l3[8] , out_2_l3 ) ) , axis=3 )  
    map9 = tf.reduce_sum ( tf.abs (tf.subtract( out_b1h1_l3[9] , out_2_l3 ) ) , axis=3 )  
    map10 = tf.reduce_sum ( tf.abs (tf.subtract( out_b1h1_l3[10] , out_2_l3 ) ) , axis=3 )  
    map11 = tf.reduce_sum ( tf.abs (tf.subtract( out_b1h1_l3[11] , out_2_l3 ) ) , axis=3 )  
    map12 = tf.reduce_sum ( tf.abs (tf.subtract( out_b1h1_l3[12] , out_2_l3 ) ) , axis=3 )  
    map13 = tf.reduce_sum ( tf.abs (tf.subtract( out_b1h1_l3[13] , out_2_l3 ) ) , axis=3 )  
    map14 = tf.reduce_sum ( tf.abs (tf.subtract( out_b1h1_l3[14] , out_2_l3 ) ) , axis=3 )  
    map15 = tf.reduce_sum ( tf.abs (tf.subtract( out_b1h1_l3[15] , out_2_l3 ) ) , axis=3 )  

    totoal_map =tf.div( tf.concat([map0, map1, map2, map3, map4, map5, map6, map7,
                               map8, map9, map10,map11,map12, map13, map14, map15], 0) , 32)
    loss = tf.nn.l2_loss(totoal_map - labels  , name = 'loss'  )

#ValueError: No gradients provided for any variable, check your graph     for ops that do not support gradients, between variables 
    train_step = tf.train.GradientDescentOptimizer(0.01).minimize(loss )


init =  tf.global_variables_initializer()
batchsize = 3

with tf.Session() as sess:
    #writer = tf.summary.FileWriter("./variable_graph",graph = sess.graph)
    sess.run(init)

    #load image from dataset(train set)
    joint_data_path = "./custom_data.json"
    train_val_path = "./train_val_indices.json"
    imgpath = "./000/"
    input_size = 301
    hint_roi_size = 23

    hintSet01_norm_batch = []
    hintSet02_norm_batch = []
    t_img_batch = []
    t_label_norm_batch = []
    #load data
    hintSet01,hintSet02,t_img,t_label_norm = training_data_feeder(joint_data_path, train_val_path, imgpath, input_size, hint_roi_size )
    #Normalize the image pixel values to 0~1
    hintSet01_norm = []
    hintSet02_norm = []

    t_img = np.float32(t_img /255.0)

    for rois in hintSet01:
        tmp = np.float32(rois / 255.0)
        hintSet01_norm.append(tmp.tolist())
    for rois in hintSet02:
        tmp = np.float32(rois / 255.0)
        hintSet02_norm.append(tmp.tolist())

    print(tf.trainable_variables())

    temp = sess.run(totoal_map , feed_dict={inputs_s:  [t_img]  ,
                                        inputs_b1h1: hintSet01_norm,
                                        labels: t_label_norm
                                                       })
    print(temp)
    print(np.shape(temp))

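Code: https://github.com/gitpharm01/Parapose/blob/master/paraposeNetworkV3.py

TensorFlow graph: https://github.com/gitpharm01/Parapose/blob/master/variable_graph/events.out.tfevents.1540296979.pharmboy-K30AD-M31AD-M51AD

Dataset:

It is a custom dataset generated from the MPII dataset. It has 223 clusters of images. Each cluster has one fixed human subject in various poses, and the background stays the same. A cluster has at least 3 pictures. It is about 627 MB; I will try to pack it and upload it later.

2018.10.26 Edit:

You can download it from Google Drive; the whole dataset is divided into 9 parts. (I can't post more than 8 links in this post.) The links are in this file: https://github.com/gitpharm01/Parapose/blob/master/000/readme.md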

Answer 1

I used the "eager execution" described in https://www.tensorflow.org/guide/eager to check the gradients.

In the end I found that "tf.round" and "tf.nn.relu6" erase the gradient or set it to zero.
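The effect of rounding on the gradient can be illustrated without TensorFlow by estimating derivatives numerically. This is only a sketch under stated assumptions: the function names are hypothetical, and finite differences stand in for what TensorFlow's automatic differentiation would report.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def numerical_derivative(f, x, eps=1e-4):
    """Central finite-difference estimate of df/dx at x."""
    return (f(x + eps) - f(x - eps)) / (2.0 * eps)


def soft_code(x):
    # Smooth activation: its value changes with x.
    return sigmoid(x)


def rounded_code(x):
    # Same idea as tf.round(tf.nn.sigmoid(x)): a step function of x.
    return float(round(sigmoid(x)))


def signed_code(x):
    # Same idea as tf.sign(tf.sigmoid(x)); sigmoid is always positive,
    # so this is the constant 1.0.
    return math.copysign(1.0, sigmoid(x))


x = 0.7
print(numerical_derivative(soft_code, x))     # nonzero slope
print(numerical_derivative(rounded_code, x))  # 0.0: flat between the steps
print(numerical_derivative(signed_code, x))   # 0.0: constant output
```

Because a step function is flat almost everywhere, no useful gradient flows back through it. The constant output of the sign-of-sigmoid variant may also be worth checking when a loss built on it does not move.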

I made some modifications to the code, and now I can get into the training phase:

import tensorflow as tf
import numpy as np
import time
from imageLoader import getPaddedROI,training_data_feeder
import math
import cv2
'''
created by Cid Zhang 
a sub-model for human pose estimation
'''
tf.reset_default_graph()

def truncated_normal_var(name,shape,dtype):
    return(tf.get_variable(name=name, shape=shape, dtype=dtype, initializer=tf.truncated_normal_initializer(stddev=0.01)))
def zero_var(name,shape,dtype):
    return(tf.get_variable(name=name, shape=shape, dtype=dtype, initializer=tf.constant_initializer(0.0)))

roi_size = 23
image_input_size = 301

#input placeholders
#batch1 hints
inputs_b1h1 = tf.placeholder(tf.float32, ( 16, roi_size, roi_size, 3), name='inputs_b1h1')
#inputs_b1h2 = tf.placeholder(tf.float32, ( 16, roi_size, roi_size, 3), name='inputs_b1h2')


inputs_s = tf.placeholder(tf.float32, (None, image_input_size, image_input_size, 3), name='inputs_s')
labels = tf.placeholder(tf.float32,(16,76,76), name='labels')

#define the model

def paraNet(inputs, inputs_s):
    with tf.variable_scope('conv'):
        out_l1 = tf.layers.conv2d(inputs, 16, [3, 3],strides=(2, 2), padding ='valid' ,name='para_conv_1')
        out_l1r = tf.nn.relu(out_l1)
        out_l2 = tf.layers.conv2d(out_l1r, 48, [3, 3],strides=(2, 2), padding ='valid' ,name='para_conv_2')
        out_l2r = tf.nn.relu(out_l2)
        out_l3 = tf.layers.conv2d(out_l2r, 96, [5, 5],strides=(1, 1), padding ='valid' ,name='para_conv_3')
        out_l3r = tf.nn.relu(out_l3)
        out_l4 = tf.layers.conv2d(out_l3r, 32, [1, 1],strides=(1, 1), padding ='valid' ,name='para_conv_4')
        out_l4r = tf.squeeze(  tf.sign( tf.sigmoid(out_l4) ) )

    with tf.variable_scope('conv', reuse=True):
        out_2_l1 = tf.layers.conv2d(inputs_s,  16, [3, 3],strides=(2, 2), padding ='same' ,name='para_conv_1')
        out_2_l1r = tf.nn.relu(out_2_l1)
        out_2_l2 = tf.layers.conv2d(out_2_l1r, 48, [3, 3],strides=(2, 2), padding ='same' ,name='para_conv_2')
        out_2_l2r = tf.nn.relu(out_2_l2)
        out_2_l3 = tf.layers.conv2d(out_2_l2r, 96, [5, 5],strides=(1, 1), padding ='same' ,name='para_conv_3')
        out_2_l3r = tf.nn.relu(out_2_l3)
        out_2_l4 = tf.layers.conv2d(out_2_l3r, 32, [1, 1],strides=(1, 1), padding ='same' ,name='para_conv_4')
        out_2_l4r =tf.sign( tf.sigmoid(out_2_l4))
    return out_l4r , out_2_l4r  

def lossFunc(inputs_hint, inputs_sample, labels):    
    hint, sample = paraNet(inputs_hint, inputs_sample)

    map0 = tf.reduce_sum ( tf.abs (tf.subtract( hint[0] , sample ) ) , axis=3 )  
    map1 = tf.reduce_sum ( tf.abs (tf.subtract( hint[1] , sample ) ) , axis=3 )  
    map2 = tf.reduce_sum ( tf.abs (tf.subtract( hint[2] , sample ) ) , axis=3 )  
    map3 = tf.reduce_sum ( tf.abs (tf.subtract( hint[3] , sample ) ) , axis=3 )  
    map4 = tf.reduce_sum ( tf.abs (tf.subtract( hint[4] , sample ) ) , axis=3 )  
    map5 = tf.reduce_sum ( tf.abs (tf.subtract( hint[5] , sample ) ) , axis=3 )  
    map6 = tf.reduce_sum ( tf.abs (tf.subtract( hint[6] , sample ) ) , axis=3 )  
    map7 = tf.reduce_sum ( tf.abs (tf.subtract( hint[7] , sample ) ) , axis=3 )  
    map8 = tf.reduce_sum ( tf.abs (tf.subtract( hint[8] , sample ) ) , axis=3 )  
    map9 = tf.reduce_sum ( tf.abs (tf.subtract( hint[9] , sample ) ) , axis=3 )  
    map10 = tf.reduce_sum ( tf.abs (tf.subtract( hint[10] , sample ) ) , axis=3 )  
    map11 = tf.reduce_sum ( tf.abs (tf.subtract( hint[11] , sample ) ) , axis=3 )  
    map12 = tf.reduce_sum ( tf.abs (tf.subtract( hint[12] , sample ) ) , axis=3 )  
    map13 = tf.reduce_sum ( tf.abs (tf.subtract( hint[13] , sample ) ) , axis=3 )  
    map14 = tf.reduce_sum ( tf.abs (tf.subtract( hint[14] , sample ) ) , axis=3 )  
    map15 = tf.reduce_sum ( tf.abs (tf.subtract( hint[15] , sample ) ) , axis=3 )  

    totoal_map =tf.div( tf.concat([map0, map1, map2, map3, map4, map5, map6, map7,
                               map8, map9, map10,map11,map12, map13, map14, map15], 0) , 64)
    loss = tf.nn.l2_loss( totoal_map -  labels , name = 'loss'  )
    return loss, totoal_map

loss, totoal_map = lossFunc(inputs_b1h1, inputs_s, labels)
train_step = tf.train.GradientDescentOptimizer(2.0).minimize(loss)

#init =  tf.global_variables_initializer()

saver = tf.train.Saver()

with tf.Session() as sess:
    #writer = tf.summary.FileWriter("./variable_graph",graph = sess.graph)
    #sess.run(init)

    #load image from dataset(train set)
    joint_data_path = "./custom_data.json"
    train_val_path = "./train_val_indices.json"
    imgpath = "./000/"
    input_size = 301
    hint_roi_size = 23
    '''
    #load data
    hintSet01,hintSet02,t_img,t_label_norm = training_data_feeder(joint_data_path,     train_val_path, imgpath, input_size, hint_roi_size )
    #Normalize the image pixel values to 0~1
    hintSet01_norm = []
    hintSet02_norm = []

    t_img =[ np.float32(t_img /255.0) ]
    #print(type(t_img))
    #print(np.shape(t_img))
    #print(type(t_label_norm))
    for rois in hintSet01:
        tmp = np.float32(rois / 255.0)
        hintSet01_norm.append(tmp.tolist())
    for rois in hintSet02:
        tmp = np.float32(rois / 255.0)
        hintSet02_norm.append(tmp.tolist())

    loss_value , total_map_value = sess.run ([loss, totoal_map], feed_dict = {inputs_s:  t_img, 
                                                                                                  inputs_b1h1: hintSet01_norm, 
                                                                          labels:     t_label_norm
                                                                          })
    print("-----loss value:",loss_value)
    print("-----total_map_value:", total_map_value[0,0] )
    print("-----label_value", t_label_norm[0,0] )
    #cv2.imshow("t_img",t_img[0])
    #for img in t_label_norm:
    #    print(img)
    #    cv2.imshow("hint", img)
    #    cv2.waitKey(0)

    #print(tf.trainable_variables())
    #print(hash_set01)
    #print(out_2_l3)
    '''
    saver.restore(sess, "./temp_model/model4.ckpt")


    for i in range(1000):

        #load data
        hintSet01,hintSet02,t_img,t_label_norm = training_data_feeder(joint_data_path, train_val_path, imgpath, input_size, hint_roi_size )
        #Normalize the image pixel values to 0~1
        hintSet01_norm = []
        hintSet02_norm = []

        t_img =[ np.float32(t_img /255.0) ]
        #print(type(t_img))
        #print(np.shape(t_img))
        #print(type(t_label_norm))
        for rois in hintSet01:
            tmp = np.float32(rois / 255.0)
            hintSet01_norm.append(tmp.tolist())
        for rois in hintSet02:
            tmp = np.float32(rois / 255.0)
            hintSet02_norm.append(tmp.tolist())
        loss_val, _ = sess.run([loss, train_step] , 
                      feed_dict = {inputs_s:  t_img, 
                                   inputs_b1h1: hintSet01_norm, 
                                   labels: t_label_norm })
        if i % 50 == 0:
            print(loss_val)

    save_path = saver.save(sess, "./temp_model/model" + '5' + ".ckpt")
    #print(temp)
    #print(np.shape(temp))

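But unfortunately, the loss value does not decrease during training.

I think there are still some bugs in the code. No matter how many iterations I set, the saved checkpoint file is always named "XXXX.ckpt.data-00000-of-00001".

I will write another post, since the main question of this post has been solved.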

python

computer-vision

pose-estimation

tensorflow