Vanishing/Exploding gradients in deeplearning4j

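How can we check whether we have vanishing/exploding gradients in deeplearning4j, specifically for recurrent neural networks? That is, which parameters should we look at, and which methods should we call to get their values?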

As suggested above, you should take a look at the GUI, introduced here.

DL4J GUI: Overview tab -> Update:Parameter ratio

The ratio of updates to parameters is specifically the ratio of mean magnitudes of these values (i.e., log10(mean(abs(updates))/mean(abs(parameters))))

Significantly high or low values of this ratio may therefore indicate exploding or vanishing gradients, respectively.
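
To make the quantity on that chart concrete, here is a minimal plain-Java sketch of the same formula applied to flat arrays. The class and method names (`UpdateRatio`, `meanAbs`, `log10UpdateRatio`) and the sample numbers are made up for illustration and are not part of the DL4J API; the GUI computes this for you from the real parameters and updates.

```java
import java.util.Arrays;

class UpdateRatio {
    // Mean of the absolute values of a flat array (NaN for an empty array).
    static double meanAbs(double[] values) {
        return Arrays.stream(values).map(Math::abs).average().orElse(Double.NaN);
    }

    // log10(mean(abs(updates)) / mean(abs(parameters)))
    static double log10UpdateRatio(double[] updates, double[] parameters) {
        return Math.log10(meanAbs(updates) / meanAbs(parameters));
    }

    public static void main(String[] args) {
        // Hypothetical values: mean |parameter| = 0.5, mean |update| = 5e-4.
        double[] parameters = {0.5, -0.25, 0.75, -0.5};
        double[] updates = {5e-4, -2.5e-4, 7.5e-4, -5e-4};
        // The ratio is about 1e-3, so the printed value is close to -3.
        System.out.println(log10UpdateRatio(updates, parameters));
    }
}
```

A commonly quoted rule of thumb is that a value around -3 is a reasonable starting point; values far below it suggest that updates are too small to change the parameters, and values far above it suggest instability. Treat those readings as heuristics rather than hard limits.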

Programmatically

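At the end of each iteration, the gradients are stored in the gradient field of ComputationGraph and MultiLayerNetwork. They can be accessed through the public gradient() method (it does not alter state; it is a plain getter), so you can analyze the gradients in your own code.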

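Here is a small code snippet that outputs, for each variable, the min, mean, and max of the absolute gradient values, plus the log10 (order of magnitude) of the min: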

    // Summarize the gradients of every variable; `net` is the trained network, `log` an SLF4J logger.
    StringBuilder gradSummary = new StringBuilder("--- Gradients ---\n");
    net.gradient().gradientForVariable().forEach((var, grad) -> {
        Number min = grad.aminNumber();   // smallest absolute value
        Number max = grad.amaxNumber();   // largest absolute value
        Number mean = grad.ameanNumber(); // mean absolute value
        // The cast truncates toward zero; a min of exactly 0 yields Integer.MIN_VALUE.
        int order = (int) Math.log10(min.doubleValue());
        gradSummary.append(var).append(": ")
            .append(min).append(",")
            .append(mean).append(",")
            .append(max).append(", magnitude: ")
            .append(order).append('\n');
    });
    gradSummary.append("-----------------");
    log.info(gradSummary.toString());

It produces output like the following (note that the variables are named after the layer names):

    2019-01-05 15:26:12 INFO  --- Gradients ---
    lstm-1_W: 4.1305625586574024E-11,2.102349571941886E-5,5.235217977315187E-4, magnitude: -10
    lstm-1_RW: 6.30961949354969E-11,1.7203132301801816E-5,1.335109118372202E-4, magnitude: -10
    lstm-1_b: 2.9782620813989524E-10,3.226526814614772E-6,3.882131932186894E-5, magnitude: -9
    lstm-2_W: 2.340811988688074E-10,2.496814886399079E-5,7.095998153090477E-4, magnitude: -9
    lstm-2_RW: 8.640199666842818E-11,4.6048542571952567E-5,0.0015051497612148523, magnitude: -10
    lstm-2_b: 6.85293555235944E-9,3.012867455254309E-5,4.262796137481928E-4, magnitude: -8
    lstm-3_W: 1.141415850725025E-10,5.7301283959532157E-5,0.0024848710745573044, magnitude: -9
    lstm-3_RW: 2.446540747769177E-10,3.4060700272675604E-5,0.002297096885740757, magnitude: -9
    lstm-3_b: 1.5003001507807312E-8,2.131067230948247E-5,2.356997865717858E-4, magnitude: -7
    norm-1_gamma: 4.6524661456714966E-8,2.8755117455148138E-5,1.543344114907086E-4, magnitude: -7
    norm-1_beta: 5.754080234510184E-7,1.0409040987724438E-4,3.460813604760915E-4, magnitude: -6
    norm-1_mean: 8.82148754044465E-7,0.0033756729681044817,0.048742543905973434, magnitude: -6
    norm-1_var: 3.0532873451782905E-10,2.6078732844325714E-6,1.6723810404073447E-4, magnitude: -9
    dense-1_W: 3.8744474295526743E-10,5.491946285474114E-5,6.59565266687423E-4, magnitude: -9
    dense-1_b: 4.4111070565122645E-6,1.4454024494625628E-4,4.0868428186513484E-4, magnitude: -5
    norm-2_gamma: 2.477656607879908E-6,9.73446512944065E-5,2.708708052523434E-4, magnitude: -5
    norm-2_beta: 3.106115855189273E-6,4.934889730066061E-4,0.0012065295595675707, magnitude: -5
    norm-2_mean: 2.7818930902867578E-5,0.004300051834434271,0.01411475520581007, magnitude: -4
    norm-2_var: 1.806318869057577E-5,0.007471780758351088,0.020012110471725464, magnitude: -4
    output_W: 7.830021786503494E-8,1.4970696065574884E-4,4.896917380392551E-4, magnitude: -7
    output_b: 3.1583107193000615E-4,6.765704602003098E-4,0.0011031415779143572, magnitude: -3
    -----------------
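
A note on reading the magnitude column: the `(int)` cast in the snippet truncates toward zero, so a minimum of 4.13E-11 is reported as -10 even though its scientific-notation exponent is -11. If you prefer the exponent itself, a floor-based helper along these lines could be used instead. This is a plain-Java sketch; the class name `GradientMagnitude`, the method name `orderOfMagnitude`, and the choice of `Integer.MIN_VALUE` as a sentinel are assumptions for illustration, not DL4J API.

```java
class GradientMagnitude {
    // floor(log10(|value|)), i.e. the exponent in scientific notation.
    // Returns Integer.MIN_VALUE as a sentinel for zero, NaN, or infinite input.
    static int orderOfMagnitude(double value) {
        double abs = Math.abs(value);
        if (abs == 0.0 || Double.isNaN(abs) || Double.isInfinite(abs)) {
            return Integer.MIN_VALUE;
        }
        return (int) Math.floor(Math.log10(abs));
    }

    public static void main(String[] args) {
        double min = 4.1305625586574024E-11; // smallest lstm-1_W gradient in the output above
        System.out.println((int) Math.log10(min)); // truncation toward zero: prints -10
        System.out.println(orderOfMagnitude(min)); // floor: prints -11
    }
}
```

Either convention works for spotting trends, as long as the same one is used for every variable and every iteration being compared.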

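You can even wrap the gradient summary code in an iteration listener and have it log once every N iterations, to help you babysit the training process.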