Need help generating the cost for a gradient descent function

I am working through Week 1 of Andrew Ng's Natural Language Processing course and am trying to fill in the components of the function that computes gradient descent.

The gradientDescent function is given as follows:

# UNQ_C2 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
def gradientDescent(x, y, theta, alpha, num_iters):
    '''
    Input:
        x: matrix of features which is (m,n+1)
        y: corresponding labels of the input matrix x, dimensions (m,1)
        theta: weight vector of dimension (n+1,1)
        alpha: learning rate
        num_iters: number of iterations you want to train your model for
    Output:
        J: the final cost
        theta: your final weight vector
    Hint: you might want to print the cost to make sure that it is going down.
    '''
    ### START CODE HERE (REPLACE INSTANCES OF 'None' with your code) ###
    # get 'm', the number of rows in matrix x
    m = None
    
    for i in range(0, num_iters):
        
        # get z, the dot product of x and theta
        z = None
        
        # get the sigmoid of z
        h = None
        
        # calculate the cost function
        J = None

        # update the weights theta
        theta = None
        
    ### END CODE HERE ###
    J = float(J)
    return J, theta

To test the function's accuracy, the following test data is provided:

# Check the function
# Construct a synthetic test case using numpy PRNG functions
np.random.seed(1)
# X input is 10 x 3 with ones for the bias terms
tmp_X = np.append(np.ones((10, 1)), np.random.rand(10, 2) * 2000, axis=1)
# Y Labels are 10 x 1
tmp_Y = (np.random.rand(10, 1) > 0.35).astype(float)

# Apply gradient descent
tmp_J, tmp_theta = gradientDescent(tmp_X, tmp_Y, np.zeros((3, 1)), 1e-8, 700)
print(f"The cost after training is {tmp_J:.8f}.")
print(f"The resulting vector of weights is {[round(t, 8) for t in np.squeeze(tmp_theta)]}")

With all components of the underlying equations entered correctly, the expected output is:

The cost after training is 0.67094970.
The resulting vector of weights is [4.1e-07, 0.00035658, 7.309e-05]

Technically, the cost function is calculated by taking the dot product of the vectors 'y' and 'log(h)'. Since both 'y' and 'h' are column vectors of shape (m,1), transpose the vector on the left so that the matrix multiplication of a row vector with a column vector performs the dot product.

    J = -(1/m) * (y.T · log(h) + (1 - y).T · log(1 - h))
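As a quick sketch of the transpose trick: for column vectors `a` and `b` of shape (m,1), `a.T @ b` yields a 1x1 matrix containing the same dot product as `np.sum(a * b)`.

```python
import numpy as np

# two small column vectors, shape (3, 1)
a = np.array([[1.0], [2.0], [3.0]])
b = np.array([[0.5], [0.25], [0.1]])

# row vector times column vector -> (1, 1) matrix holding the dot product
dot_via_transpose = a.T @ b          # array([[1.3]])
dot_via_sum = np.sum(a * b)          # 1.3

print(float(dot_via_transpose), dot_via_sum)
```

This is why the cost formula above can be written with a single matrix multiplication instead of an explicit elementwise sum.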

Through my own efforts I was able to derive 99% of the equations, with the exception of the cost function, which produces a higher value than expected: my current cost function yields 0.81721852, while the expected value for the test variables is 0.67094970.

# UNQ_C2 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
def gradientDescent(x, y, theta, alpha, num_iters):
    '''
    Input:
        x: matrix of features which is (m,n+1)
        y: corresponding labels of the input matrix x, dimensions (m,1)
        theta: weight vector of dimension (n+1,1)
        alpha: learning rate
        num_iters: number of iterations you want to train your model for
    Output:
        J: the final cost
        theta: your final weight vector
    Hint: you might want to print the cost to make sure that it is going down.
    '''
    ### START CODE HERE (REPLACE INSTANCES OF 'None' with your code) ###
    # get 'm', the number of rows in matrix x
    m = len(x)
    xT = x.transpose()
    for i in range(0, num_iters):
        
        # get z, the dot product of x and theta
        z = np.dot(x, theta)
        
        # get the sigmoid of z
        h = sigmoid(z)
        
        # calculate the loss
        # loss = (h - y)
        
        # calculate the gradient
        # gradient = np.dot(xT, loss)
        
        # calculate the cost function
        J = np.sum((np.log(h) - y) ** 2) / (2 * m)     
        #print("Iters %d | J: %f" % (i, J))
    
        # update the weights theta
        theta = theta - ( (alpha/m) *  np.dot(xT, (h - y)) )
        
    ### END CODE HERE ###
    J = float(J)
    return J, theta

How can I modify my equation so that I get the correct expected value of 0.67094970 instead of the 0.81721852 I currently get?
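The discrepancy comes from the cost line: `np.sum((np.log(h) - y) ** 2) / (2 * m)` is a squared-error cost applied to log(h), not the cross-entropy cost the assignment expects. A minimal self-contained sketch (assuming the usual `sigmoid` built on `np.exp`) computes both costs on the same synthetic test data to show the difference:

```python
import numpy as np

def sigmoid(z):
    # standard logistic function; vectorized because np.exp is
    return 1 / (1 + np.exp(-z))

# reproduce the synthetic test case from the question
np.random.seed(1)
X = np.append(np.ones((10, 1)), np.random.rand(10, 2) * 2000, axis=1)
Y = (np.random.rand(10, 1) > 0.35).astype(float)

m = X.shape[0]
alpha = 1e-8
theta = np.zeros((3, 1))
for _ in range(700):
    h = sigmoid(X @ theta)                       # predictions for current theta
    # squared-error-on-logs cost (the buggy line)
    J_squared = np.sum((np.log(h) - Y) ** 2) / (2 * m)
    # cross-entropy cost (what the assignment expects)
    J_xent = float(-(Y.T @ np.log(h) + (1 - Y).T @ np.log(1 - h)) / m)
    # the gradient step is identical in both versions
    theta = theta - (alpha / m) * (X.T @ (h - Y))

print(f"squared-log cost:   {J_squared:.8f}")    # 0.81721852 per the question
print(f"cross-entropy cost: {J_xent:.8f}")       # 0.67094970, the expected value
```

Only the cost line differs; the weight update is the same in both versions, which is why the resulting theta matches the expected weights while the reported cost does not.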

# UNQ_C2 (UNIQUE CELL IDENTIFIER, DO NOT EDIT)
def gradientDescent(x, y, theta, alpha, num_iters):
    '''
    Input:
        x: matrix of features which is (m,n+1)
        y: corresponding labels of the input matrix x, dimensions (m,1)
        theta: weight vector of dimension (n+1,1)
        alpha: learning rate
        num_iters: number of iterations you want to train your model for
    Output:
        J: the final cost
        theta: your final weight vector
    Hint: you might want to print the cost to make sure that it is going down.
    '''
    ### START CODE HERE (REPLACE INSTANCES OF 'None' with your code) ###
    # get 'm', the number of rows in matrix x
    m = len(x)
    xT = x.transpose()
    yT = y.transpose()
    for i in range(0, num_iters):
        
        # get z, the dot product of x and theta
        z = np.dot(x, theta)
        
        # get the sigmoid of z
        h = sigmoid(z)        
        
        # calculate the cost function
        J = - (1/m) * (yT.dot(np.log(h)) + (1-yT).dot(np.log(1-h)))
    
        # update the weights theta
        theta = theta - ((alpha/m) *  xT.dot(h - y))
        
    ### END CODE HERE ###
    J = float(J)
    return J, theta

# UNQ_C2 GRADED FUNCTION: gradientDescent
def gradientDescent(x, y, theta, alpha, num_iters):
    '''
    Input:
        x: matrix of features which is (m,n+1)
        y: corresponding labels of the input matrix x, dimensions (m,1)
        theta: weight vector of dimension (n+1,1)
        alpha: learning rate
        num_iters: number of iterations you want to train your model for
    Output:
        J: the final cost
        theta: your final weight vector
    Hint: you might want to print the cost to make sure that it is going down.
    '''
    ### START CODE HERE ###
    # get 'm', the number of rows in matrix x
    m = x.shape[0]

    
    for i in range(0, num_iters):
        
        # get z, the dot product of x and theta
        z = x.dot(theta)
        
        # get the sigmoid of z
        sigmoid_v = np.vectorize(sigmoid)
        h = sigmoid_v(z)
        
        # calculate the cost function
        J = (-1/m)*( np.dot(y.T,np.log(h)) +  np.dot((1-y).T , np.log(1-h)))

        # update the weights theta
        theta = theta - ((alpha/m)*(np.dot(x.T, h-y)))
        
    ### END CODE HERE ###
    J = float(J)
    return J, theta
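One note on the last variant: `np.vectorize(sigmoid)` is only necessary if `sigmoid` is written with `math.exp`, which rejects array arguments. If `sigmoid` uses `np.exp` (as assumed in this sketch), it already broadcasts over arrays, and the wrapper only adds a slow Python-level loop:

```python
import numpy as np

def sigmoid(z):
    # np.exp broadcasts over arrays, so this accepts scalars and ndarrays alike
    return 1 / (1 + np.exp(-z))

z = np.array([[-1.0], [0.0], [1.0]])
direct = sigmoid(z)                  # works without np.vectorize
wrapped = np.vectorize(sigmoid)(z)   # same values, slower

print(np.allclose(direct, wrapped))  # True
```

Dropping the `np.vectorize` call does not change the result, only the speed.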