Recovering probability distribution from binary observations - what are the reasons for the defects of this implementation?

Question

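I am trying to recover a probability distribution (not a probability density: any function with range in [0,1], where f(x) encodes the probability that an observation at x is a success). I use one hidden layer with 10 neurons, followed by a softmax. Here is my code: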

import tensorflow as tf
import numpy as np
import random
import math

#Make binary observations encoded as one-hot vectors.
def makeObservations(probabilities):
    observations = np.zeros((len(probabilities),2), dtype='float32')
    for i in range(0, len(probabilities)):        
        if random.random() <= probabilities[i]:
            observations[i,0] = 1
            observations[i,1] = 0
        else:
            observations[i,0] = 0
            observations[i,1] = 1
    return observations

xTrain = np.linspace(0, 4*math.pi, 2001).reshape(1,-1)
# Build a real list so len() and indexing also work on Python 3, where map() returns an iterator.
distribution = [math.sin(v)**2 for v in xTrain[0]]
yTrain = makeObservations(distribution)

def weight_variable(shape):
  initial = tf.truncated_normal(shape, stddev=0.1)
  return tf.Variable(initial)

def bias_variable(shape):
  initial = tf.constant(0.1, shape=shape)
  return tf.Variable(initial)

x = tf.placeholder("float", [1,None])
hiddenDim = 10

b = bias_variable([hiddenDim,1])
W = weight_variable([hiddenDim, 1])

b2 = bias_variable([2,1])
W2 = weight_variable([2, hiddenDim])
hidden = tf.nn.sigmoid(tf.matmul(W, x) + b)
y = tf.transpose(tf.matmul(W2, hidden) + b2)

loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(y, yTrain))
step = tf.Variable(0, trainable=False)
rate = tf.train.exponential_decay(0.2, step, 1, 0.9999)
optimizer = tf.train.AdamOptimizer(rate)
train = optimizer.minimize(loss, global_step=step)

predict_op = tf.argmax(y, 1)

sess = tf.Session()
init = tf.initialize_all_variables()
sess.run(init)

for i in range(50001):
    sess.run(train, feed_dict={x: xTrain})
    if i%200 == 0:
        #proportion of correct predictions
        print(i, np.mean(np.argmax(yTrain, axis=1) ==
                         sess.run(predict_op, feed_dict={x: xTrain})))

import matplotlib.pyplot as plt
ys = tf.nn.softmax(y).eval({x:xTrain}, sess)
plt.plot(xTrain[0],ys[:,0])
plt.plot(xTrain[0],distribution)
plt.plot(xTrain[0], yTrain[:,0], 'ro')
plt.show()

Here are two typical results:

Questions:

What is the difference between using tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(y, yTrain)) and applying softmax manually and then minimizing the cross entropy?

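The model typically fails to capture the last period of the distribution. I have only had it succeed once. Perhaps more training iterations would fix it, but that seems unlikely, because the results have usually stopped changing over the last ~20k iterations. Which is most likely to improve it: a better choice of optimization algorithm, more hidden layers, or a higher-dimensional hidden layer? (Partly answered by the edit.)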

The aberrations near x=0 are typical. What causes them?

Edit: the fit improved a lot after setting

hiddenDim = 15
(...)
optimizer = tf.train.AdagradOptimizer(0.5)

and changing the activation from sigmoid to tanh.

Further questions:

Is it typical that a higher hidden dimension makes it easier to escape local minima?

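What is the approximate typical relationship between the optimal dimension of the hidden layer and the dimension of the input, dim(hidden) = f(dim(input))? Is it linear, weaker than linear, or stronger than linear?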

Answer 1

It is overfitting on the left and underfitting on the right.

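Because of the small random biases, your hidden units all have near-zero activation near x=0, and because of the asymmetry and the large range of the x values, most of your hidden units are saturated at larger x, toward the right of the range.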

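Gradients cannot flow through the saturated units, so the units all get used up overfitting the values they can still feel, the ones near zero.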
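To make the saturation argument concrete, here is a minimal sketch in plain Python (not the TensorFlow code from the question). The helper names `sigmoid` and `sigmoid_grad` are made up for illustration. It only shows how quickly the derivative of a sigmoid unit shrinks as its pre-activation moves away from zero.

```python
import math

def sigmoid(z):
    """Logistic sigmoid of a scalar pre-activation z."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_grad(z):
    """Derivative of the sigmoid with respect to z: s * (1 - s)."""
    s = sigmoid(z)
    return s * (1.0 - s)

# The derivative peaks at z = 0 and decays roughly exponentially with |z|,
# so a unit whose pre-activation is large passes back almost no gradient.
for z in (0.0, 2.0, 5.0, 10.0):
    print(z, sigmoid_grad(z))
```

With x running up to 4*pi, even a moderate weight pushes the pre-activation of a unit well into the flat part of the curve, which is consistent with the picture described above.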

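I think centering the data on x=0 would help. Try reducing the weight initialization variance and/or increasing the bias initialization variance (or, equivalently, shrinking the range of the data to a smaller region, such as [-1,1]).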
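A minimal sketch of the rescaling suggestion, assuming the inputs are the 2001 evenly spaced points on [0, 4*pi] from the question. The function name `rescale` is hypothetical, and plain Python lists are used so the snippet has no dependencies; the same arithmetic applies to a NumPy array such as xTrain.

```python
import math

def rescale(xs, lo=-1.0, hi=1.0):
    """Linearly map the values in xs onto the interval [lo, hi]."""
    x_min, x_max = min(xs), max(xs)
    span = x_max - x_min
    return [lo + (hi - lo) * (x - x_min) / span for x in xs]

# Same grid as in the question: 2001 points from 0 to 4*pi.
xs = [4 * math.pi * i / 2000 for i in range(2001)]

# Inputs fed to the network are now centered on 0 and confined to [-1, 1].
xs_scaled = rescale(xs)
print(xs_scaled[0], xs_scaled[1000], xs_scaled[-1])
```

The targets would still be generated from the original x values; only the inputs given to the network change.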

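You would get the same problem if you used RBFs and initialized them all near zero. With the linear-sigmoid units, the second layer is using pairs of linear-sigmoids to make RBFs.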
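The remark about pairs of sigmoid units forming RBF-like bumps can be illustrated with a small sketch. The function `bump` and its parameters `center`, `half_width`, and `slope` are hypothetical names chosen for this example. In the actual network, the second-layer weights would have to learn the +1 and -1 coefficients that are hard-coded here.

```python
import math

def sigmoid(z):
    """Logistic sigmoid of a scalar pre-activation z."""
    return 1.0 / (1.0 + math.exp(-z))

def bump(x, center, half_width, slope):
    """Difference of two shifted sigmoids: a localized, RBF-like bump.

    The first unit switches on just left of the center, the second just
    right of it; subtracting them leaves a response only in between.
    """
    rising = sigmoid(slope * (x - (center - half_width)))
    falling = sigmoid(slope * (x - (center + half_width)))
    return rising - falling

# Close to 1 at the center, close to 0 far away on either side.
for x in (-7.0, 2.0, 3.0, 4.0, 13.0):
    print(x, bump(x, center=3.0, half_width=1.0, slope=5.0))
```

Each bump uses up two hidden units, which may be one reason a larger hidden layer fitted more periods of sin(x)**2 in the question's edit.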


python

neural-network

tensorflow