Cost function for logistic regression: weird/oscillating cost history

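Background and my thought process:

I wanted to see whether I could use logistic regression to create a hypothesis function that predicts recessions in the US economy by looking at a date and its corresponding leading economic indicators (LEI). Leading economic indicators are known to be good predictors of the economy.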

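To do this, I obtained data from the OECD on the composite leading (economic) indicator from January 1970 to July 2021, and I also looked up when recessions occurred between 1970 and 2021. The formatted data I use for training can be found further below.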

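Knowing that the relationship between recessions and Date/LEI would not be a simple linear one, I decided to give each data point more parameters so that I could fit a polynomial equation to the data. Each data point therefore has the following parameters: Date, LEI, LEI^2, LEI^3, LEI^4, and LEI^5.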

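The problem:

When I try to train my hypothesis function, I get a very strange cost history, which seems to indicate that either my cost function or my gradient descent is implemented incorrectly. Here is a plot of my cost history:

[Figure: plot of the cost history]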

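I have already tried implementing the suggestions in this post to fix my cost history, since I originally had the same NaN and Inf problems described there. While those suggestions resolved the NaN and Inf problems, I could not find anything to help me fix my cost function once it started oscillating. Other fixes I have tried include adjusting the learning rate, double-checking my cost and gradient descent, and introducing more parameters per data point (to see whether a higher-degree polynomial would help).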

My code:

The main file is predictor.m.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Program: Predictor.m
% Author: Hasec Rainn
% Desc: Predictor.m uses logistic regression
% to predict when economic recessions will occur
% in the United States. The data it uses is from the past 50 years.
%
% In particular, it uses dates and their corresponding economic leading
% indicators to learn a non-linear hypothesis function to fit to the data.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


LI_Data = dlmread("leading_indicators_formatted.csv"); %Get LI data
RD_Data = dlmread("recession_dates_formatted.csv"); %Get RD data

%our datapoints of interest: Dates and their corresponding
%leading Indicator values.
%We are going to increase the number of parameters per datapoint to allow
%for a non-linear hypothesis function. Specifically, let the 3rd, 4th
%5th, and 6th columns represent LI^2, LI^3, LI^4, and LI^5 respectively

X = LI_Data;        %datapoints of interest (row = 1 datapoint)
X = [X, X(:,2).^2];  %Adding LI^2
X = [X, X(:,2).^3];  %Adding LI^3
X = [X, X(:,2).^4];  %Adding LI^4
X = [X, X(:,2).^5];  %Adding LI^5

%normalize data
X(:,1) = normalize( X(:,1) );
X(:,2) = normalize( X(:,2) );
X(:,3) = normalize( X(:,3) );
X(:,4) = normalize( X(:,4) );
X(:,5) = normalize( X(:,5) );
X(:,6) = normalize( X(:,6) );


%What we want to predict: if a recession happens or doesn't happen
%for a corresponding year
Y = RD_Data(:,2); %row = 1 datapoint

%defining a few useful variables:
nIter = 4000;       %how many iterations we want to run gradient descent for
ndp = size(X, 1);   %number of data points we have to work with
nPara = size(X,2);  %number of parameters per data point
alpha = 1;           %set the learning rate to 1

%Defining Theta
Theta = ones(1, nPara); %initialize the weights of Theta to 1

%Make a cost history so we can see if gradient descent is implemented
%correctly
costHist = zeros(nIter, 1);


for i = 1:nIter
  costHist(i, 1) = cost(Theta, Y, X);
  Theta = Theta - (sum((sigmoid(X * Theta') - Y) .* X));
end

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Function: Cost
% Author: Hasec Rainn
% Parameters: Theta (vector), Y (vector), X (matrix)
% Desc: Uses Theta, Y, and X to determine the cost of our current
%       hypothesis function H_theta(X). Uses manual loop approach to
%       avoid errors that arise from log(0).
%       Additionally, limits the range of H_Theta to prevent Inf
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


function expense = cost(Theta, Y, X)
  m = size(X, 1); %number of data points
  hTheta = sigmoid(X*Theta'); %hypothesis function
  
  %limit the range of hTheta to [10^-50, 0.9999999999999]
  for i=1:size(hTheta, 1)
    if (hTheta(i) <= 10^(-50))
      hTheta(i) = 10^(-50);
    endif
    
    if (hTheta(i) >= 0.9999999999999)
      hTheta(i) = 0.9999999999999;
    endif
  endfor
  
  expense = 0;
  
  for i = 1:m
    
    if Y(i) == 1
      expense = expense + -log(hTheta(i));
    endif
    
    if Y(i) == 0
      expense = expense + -log(1-hTheta(i));
    endif
    
  endfor
  
endfunction


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Function: normalize
% Author: Hasec Rainn
% Parameters: vector
% Desc: Takes in an input and normalizes its value(s)
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

function n = normalize(data)
   dMean = mean(data);
   dStd = std(data);
   n = (data - dMean) ./ dStd;
endfunction 


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Function: Sigmoid
% Author: Hasec Rainn
% Parameters: scalar, vector, or matrix
% Desc: Takes an input and forces its value(s) to be between
%       0 and 1. If a matrix or vector, sigmoid is applied to
%       each element.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

function result = sigmoid(z)
  result = 1 ./ ( 1 + e .^(-z) );
endfunction

The data I use for training can be found here: formatted LI data and recession dates data

The problem you are running into is your gradient descent function.

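In particular, while you compute the cost portion correctly (that is, hTheta - Y, or sigmoid(X * Theta') - Y), you do not compute the derivative of the cost correctly; in Theta = Theta - (sum((sigmoid(X * Theta') - Y) .* X)), the .* X is not correct.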

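For every parameter j, the derivative is the cost of each data point (found in the vector hTheta - Y) multiplied by that data point's value for parameter j, summed over all data points. For more details, check out this article.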
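
As a rough sketch of the derivative described above (not necessarily what the linked article shows), the update can be written in matrix form as err' * X, where err = sigmoid(X * Theta') - Y. Note that the loop in the question defines alpha but never uses it in the update, and never divides by the number of data points, so the sketch adds both; that scaling is an assumption on my part. With Octave's broadcasting, sum(err .* X) should give the same row vector as err' * X, so the scaling is likely the main practical difference here. The names gradientStep and meanCost, the value alpha = 0.1, and the toy X and Y are hypothetical and for illustration only.

```octave
1;  % marks this file as a script so the functions below can be defined inline

function result = sigmoid(z)
  result = 1 ./ (1 + exp(-z));
endfunction

% One gradient descent step (hypothetical helper, not from the question).
% Theta: 1 x n row vector, X: m x n matrix, Y: m x 1 vector.
function Theta = gradientStep(Theta, X, Y, alpha)
  m = size(X, 1);                  % number of data points
  err = sigmoid(X * Theta') - Y;   % m x 1, hTheta - Y for each data point
  grad = (err' * X) / m;           % 1 x n, grad(j) = mean(err .* X(:, j))
  Theta = Theta - alpha * grad;    % scaled step (assumed, see above)
endfunction

% Mean cross-entropy cost, clamped to avoid log(0) (hypothetical helper).
function J = meanCost(Theta, X, Y)
  h = sigmoid(X * Theta');
  h = min(max(h, 1e-12), 1 - 1e-12);
  J = -mean(Y .* log(h) + (1 - Y) .* log(1 - h));
endfunction

% Toy data for illustration only, NOT the OECD data from the question.
x = [-2; -1; -0.5; 0.5; 1; 2];
X = [x, x .^ 2];
Y = [0; 0; 1; 0; 1; 1];

Theta = ones(1, size(X, 2));       % same initialization as the question
alpha = 0.1;                       % assumed learning rate
nIter = 50;
costHist = zeros(nIter, 1);

for i = 1:nIter
  costHist(i) = meanCost(Theta, X, Y);
  Theta = gradientStep(Theta, X, Y, alpha);
endfor

printf("cost went from %.4f to %.4f\n", costHist(1), costHist(end));
```

If the cost history still oscillates on the real data with an update like this, lowering alpha further is probably the first thing to try.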