Cost function for logistic regression: weird/oscillating cost history


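I wanted to see whether I could use logistic regression to build a hypothesis function that predicts recessions in the US economy from dates and their corresponding leading economic indicators (LEI). Leading economic indicators are widely regarded as good predictors of the economy.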

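To do this, I obtained composite leading (economic) indicator data from the OECD for January 1970 through July 2021, and also looked up when recessions occurred between 1970 and 2021. The formatted data I use for training can be found further below.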

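Knowing that the relationship between recessions and Date/LEI would not be a simple linear one, I decided to give each data point more parameters so that I could fit a polynomial equation to the data. Each data point therefore has the following parameters: Date, LEI, LEI^2, LEI^3, LEI^4, and LEI^5.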



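I have already tried implementing the suggestions from a related post to fix my cost history, since I initially ran into the same NaN and Inf issues described in that post. Those suggestions resolved the NaN and Inf issues, but I could not find anything that helps fix my cost function once it starts oscillating. Other fixes I have tried are adjusting the learning rate, double-checking my cost and gradient descent, and introducing more parameters per data point (to see whether a higher-degree polynomial equation would help).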

My code: the main file is predictor.m.

% Program: Predictor.m
% Author: Hasec Rainn
% Desc: Predictor.m uses logistic regression
% to predict when economic recessions will occur
% in the United States. The data it uses is from the past 50 years.
% In particular, it uses dates and their corresponding economic leading
% indicators to learn a non-linear hypothesis function to fit to the data.

LI_Data = dlmread("leading_indicators_formatted.csv"); %Get LI data
RD_Data = dlmread("recession_dates_formatted.csv"); %Get RD data

%our datapoints of interest: Dates and their corresponding
%leading Indicator values.
%We are going to increase the number of parameters per datapoint to allow
%for a non-linear hypothesis function. Specifically, let the 3rd, 4th
%5th, and 6th columns represent LI^2, LI^3, LI^4, and LI^5 respectively

X = LI_Data;        %datapoints of interest (row = 1 datapoint)
X = [X, X(:,2).^2];  %Adding LI^2
X = [X, X(:,2).^3];  %Adding LI^3
X = [X, X(:,2).^4];  %Adding LI^4
X = [X, X(:,2).^5];  %Adding LI^5

%normalize data
X(:,1) = normalize( X(:,1) );
X(:,2) = normalize( X(:,2) );
X(:,3) = normalize( X(:,3) );
X(:,4) = normalize( X(:,4) );
X(:,5) = normalize( X(:,5) );
X(:,6) = normalize( X(:,6) );

%What we want to predict: if a recession happens or doesn't happen
%for a corresponding year
Y = RD_Data(:,2); %row = 1 datapoint

%defining a few useful variables:
nIter = 4000;       %how many iterations we want to run gradient descent for
ndp = size(X, 1);   %number of data points we have to work with
nPara = size(X,2);  %number of parameters per data point
alpha = 1;           %set the learning rate to 1

%Defining Theta
Theta = ones(1, nPara); %initialize the weights of Theta to 1

%Make a cost history so we can see if gradient descent is implemented
costHist = zeros(nIter, 1);

for i = 1:nIter
  costHist(i, 1) = cost(Theta, Y, X);
  Theta = Theta - (sum((sigmoid(X * Theta') - Y) .* X));
end

% Function: Cost
% Author: Hasec Rainn
% Parameters: Theta (vector), Y (vector), X (matrix)
% Desc: Uses Theta, Y, and X to determine the cost of our current
%       hypothesis function H_theta(X). Uses manual loop approach to
%       avoid errors that arise from log(0).
%       Additionally, limits the range of H_Theta to prevent Inf

function expense = cost(Theta, Y, X)
  m = size(X, 1); %number of data points
  hTheta = sigmoid(X*Theta'); %hypothesis function
  %limit the range of hTheta to [10^-50, 0.9999999999999]
  for i=1:size(hTheta, 1)
    if (hTheta(i) <= 10^(-50))
      hTheta(i) = 10^(-50);
    end
    if (hTheta(i) >= 0.9999999999999)
      hTheta(i) = 0.9999999999999;
    end
  end
  expense = 0;
  for i = 1:m
    if Y(i) == 1
      expense = expense + -log(hTheta(i));
    end
    if Y(i) == 0
      expense = expense + -log(1-hTheta(i));
    end
  end
end

% Function: normalization
% Author: Hasec Rainn
% Parameters: vector
% Desc: Takes in an input and normalizes its value(s)

function n = normalize(data)
   dMean = mean(data);
   dStd = std(data);
   n = (data - dMean) ./ dStd;
end

% Function: Sigmoid
% Author: Hasec Rainn
% Parameters: scalar, vector, or matrix
% Desc: Takes an input and forces its value(s) to be between
%       0 and 1. If a matrix or vector, sigmoid is applied to
%       each element.

function result = sigmoid(z)
  result = 1 ./ ( 1 + e .^(-z) );
end

The data I used for training can be found here: formatted LI data and recession dates data

The problem you're running into is in your gradient descent function.

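In particular, while you correctly calculate the cost portion (that is, (hTheta - Y), or (sigmoid(X * Theta') - Y)), you do not calculate the derivative of the cost correctly; in Theta = Theta - (sum((sigmoid(X * Theta') - Y) .* X)), the .*X is not correct.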
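For reference, the textbook form of the logistic regression cost and its partial derivative is written out below. The notation is an assumption introduced here rather than taken from the post: m is the number of data points, x^{(i)} is the i-th row of X, y^{(i)} is the i-th entry of Y, and h_\theta(x) is the sigmoid hypothesis. The cost function in the question omits the 1/m factor, which only rescales the cost.

```latex
J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\Big[\, y^{(i)} \log h_\theta\big(x^{(i)}\big) + \big(1 - y^{(i)}\big) \log\big(1 - h_\theta\big(x^{(i)}\big)\big) \Big],
\qquad
h_\theta(x) = \frac{1}{1 + e^{-\theta^{T} x}}

\frac{\partial J}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\Big( h_\theta\big(x^{(i)}\big) - y^{(i)} \Big)\, x_j^{(i)}
```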

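For every parameter j, the derivative is the cost of each data point (found in the vector hTheta - Y) multiplied by that data point's value for parameter j, summed over all data points. For more information, check out this article.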
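The per-parameter derivative described above can be sketched in Octave as follows. This is a minimal illustration under stated assumptions, not the poster's final code: the toy data, the helper name gradientStep, the 1/m averaging, and alpha = 0.5 are all introduced here for the example. It is written as an Octave script file (hence the leading `1;`).

```octave
% Minimal sketch of batch gradient descent for logistic regression.
% Assumptions (not from the original post): the toy data below, the helper
% name gradientStep, the 1/m averaging, and the learning rate alpha = 0.5.
1;  % tells Octave this is a script file, so functions can be defined inline

function result = sigmoid(z)
  result = 1 ./ (1 + exp(-z));         % element-wise logistic function
end

function J = cost(Theta, X, Y)
  h = sigmoid(X * Theta');             % m x 1 predictions
  J = -mean(Y .* log(h) + (1 - Y) .* log(1 - h));
end

function Theta = gradientStep(Theta, X, Y, alpha)
  m = size(X, 1);                      % number of data points
  err = sigmoid(X * Theta') - Y;       % m x 1 vector, hTheta - Y
  grad = (err' * X) / m;               % 1 x n: entry j is mean_i err(i) * X(i,j)
  Theta = Theta - alpha * grad;        % step against the gradient
end

% Hypothetical toy data: 4 data points, 2 parameters each
X = [-1.5  0.5;
     -0.5 -1.0;
      0.5  1.0;
      1.5 -0.5];
Y = [0; 0; 1; 1];

alpha = 0.5;                           % assumed learning rate
nIter = 200;
Theta = zeros(1, size(X, 2));
costHist = zeros(nIter, 1);

for k = 1:nIter
  costHist(k) = cost(Theta, X, Y);     % record cost before each update
  Theta = gradientStep(Theta, X, Y, alpha);
end

disp(costHist([1 end])');              % first and last recorded cost
```

Plotting costHist from a run like this is a quick way to check whether the cost decreases steadily or oscillates. Note that in Octave versions with broadcasting, sum(err .* X, 1) yields the same row vector as err' * X, so in this sketch it is the explicit alpha and the 1/m averaging that keep each step small.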