Is this a good result for the normal equation? If not, how do I know whether it is good for the data set?

Question

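I'm trying to learn the normal equation for ML, but I'm not sure whether this result is correct. The parameters are so high, and I can't work out the hypothesis from these parameters. Here is what I mean: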

Data set:

2104,5,1,45,460
1416,3,2,40,232
1534,3,2,30,315
852,2,1,36,178

When I run the code, I get these parameters (theta):

[4.74289062e+02  1.65405273e-01  -4.68750000e+00  -1.16445312e+02
 -2.13281250e+00]

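But these numbers are so high. Is this normal? Also, in the last line of my code I try to print the hypothesis just for the first element of my data set (2104,5,1,45,460), but it gives me an error like this: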

print (q[0]*x[0][0])+(q[1]*x[0][1])+(q[2]*x[0][2])+(q[3]*x[0][3])+(q[4]*x[0][4])
  IndexError: index out of bounds

My code:

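import numpy as np

data = np.loadtxt('bib', delimiter=',')      # 4 rows x 5 columns
x = data[:, 0:4]                             # the 4 feature columns
y = data[:, 4]                               # the target column
a = np.ones(shape=(y.size, x[0].size + 1))
a[:, 1:5] = x                                # column 0 stays 1 (bias term)
A = np.linalg.inv(a.transpose().dot(a))
B = np.dot(a.transpose(), y)                 # computed but not used below
q = A.dot(a.transpose()).dot(y)              # theta = inv(a'a) a'y
##print (q[0]*x[0][0])+(q[1]*x[0][1])+(q[2]*x[0][2])+(q[3]*x[0][3])+(q[4]*x[0][4])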

I'm not happy with these results. How can I make sure they are correct, and how should I compute my hypothesis with these parameters?

Answer 1

but these numbers are so high. Is this normal?

Your data is not normalized/scaled. Your values are large (~1e2), and so are the regression coefficients.
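A minimal sketch of scaling the features before solving, assuming z-score standardization (one common choice; min-max scaling would also work) and with the question's four rows inlined instead of loading the 'bib' file. Separately from scaling: the question has only 4 samples but 5 parameters (bias + 4 weights), so a.T @ a has rank at most 4 and is singular, which makes np.linalg.inv on it unreliable. For that reason this sketch swaps in np.linalg.lstsq (minimum-norm least squares) instead of the explicit inverse. Names such as x_scaled are my own.

```python
import numpy as np

# The four rows from the question (columns: 4 features, then the target y).
data = np.array([
    [2104, 5, 1, 45, 460],
    [1416, 3, 2, 40, 232],
    [1534, 3, 2, 30, 315],
    [852, 2, 1, 36, 178],
], dtype=float)
x, y = data[:, :4], data[:, 4]

# Assumption: z-score scaling, i.e. each feature column gets zero mean and unit variance.
mu, sigma = x.mean(axis=0), x.std(axis=0)
x_scaled = (x - mu) / sigma

# Design matrix with a leading column of ones for the bias term.
a = np.column_stack([np.ones(len(y)), x_scaled])

# 4 samples but 5 parameters: a.T @ a cannot have full rank, so inv() is unreliable here.
print("rank of a.T @ a:", np.linalg.matrix_rank(a.T @ a), "of", a.shape[1])

# Minimum-norm least-squares solution instead of inv(a.T @ a) @ a.T @ y.
q, *_ = np.linalg.lstsq(a, y, rcond=None)
print("theta (scaled features):", q)
```

With 4 points and 5 parameters the fit passes exactly through every training point, so zero training error here says nothing about how good the model is. Judging that needs more data, ideally some held out from the fit.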

I tried to print the hypothesis just for my first data set element (2104,5,1,45,460), but it's giving me an error like this:

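The error makes sense: your data has 4 dimensions (x has only columns 0 to 3), but you try to index 5 values. Since you use the column of ones in "a" as the bias term, your hypothesis has the form: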

h(x) = <q, [1 x]> = q0 + q1*x0 + q2*x1 + q3*x2 + q4*x3

So in code (using your notation and conventions):

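import numpy as np

def h(x):
    # q is the parameter vector from the normal equation above
    x = np.atleast_2d(x)                       # also accept a single sample such as x[0]
    a = np.ones((x.shape[0], x.shape[1] + 1))  # the shape must be passed as one tuple
    a[:, 1:] = x                               # column 0 stays 1 (bias term)
    return a.dot(q)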
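A short usage sketch with the question's four rows inlined. It shows why x[0][4] fails (x has shape (4, 4)) and what the corrected manual version of the question's print line looks like, checked against h. So that the comparison with y[0] is meaningful, it solves for q with np.linalg.lstsq rather than inv(), because with 4 rows and 5 parameters a.T @ a is singular; that substitution is my assumption, not part of the original code.

```python
import numpy as np

# The question's four rows, inlined instead of loading the 'bib' file.
data = np.array([
    [2104, 5, 1, 45, 460],
    [1416, 3, 2, 40, 232],
    [1534, 3, 2, 30, 315],
    [852, 2, 1, 36, 178],
], dtype=float)
x, y = data[:, 0:4], data[:, 4]
print(x.shape)  # (4, 4): valid column indices are 0..3, so x[0][4] is out of bounds

# Design matrix as in the question: column 0 is all ones (bias term).
a = np.ones((y.size, x.shape[1] + 1))
a[:, 1:] = x
# Assumption: least squares instead of inv(a.T @ a), which is singular for 4 rows x 5 params.
q, *_ = np.linalg.lstsq(a, y, rcond=None)

def h(x):
    x = np.atleast_2d(x)                       # accept a single sample such as x[0]
    a = np.ones((x.shape[0], x.shape[1] + 1))  # bias column of ones
    a[:, 1:] = x
    return a.dot(q)

# The question's print line, fixed: add the bias q[0] and shift the x indices down by one.
manual = q[0] + q[1] * x[0][0] + q[2] * x[0][1] + q[3] * x[0][2] + q[4] * x[0][3]
print(manual, h(x[0])[0], y[0])
```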

How can I make sure these results are true

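You can compare them with any of the many existing implementations. You can also create a dummy test set in which y really is a linear combination of x, and check that you get zero error.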
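A minimal sketch of the dummy-test-set idea, assuming random Gaussian features and an arbitrary, made-up parameter vector true_q (bias first, then 4 weights). It solves the normal equation the same way as the question and compares the result with np.linalg.lstsq as the "existing implementation". In floating point the error comes out tiny rather than exactly 0.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dummy data: y is an exact linear combination of the features plus a bias.
m, n = 50, 4                                   # more samples than parameters
x = rng.normal(size=(m, n))
true_q = np.array([3.0, 1.5, -2.0, 0.5, 4.0])  # hypothetical "true" parameters
a = np.column_stack([np.ones(m), x])           # column of ones for the bias term
y = a @ true_q

# Normal equation as in the question (a.T @ a is invertible here because m > n + 1).
q = np.linalg.inv(a.T @ a) @ a.T @ y

# Compare with an existing implementation and with the known answer.
q_ref, *_ = np.linalg.lstsq(a, y, rcond=None)
print("max |q - true_q|:", np.abs(q - true_q).max())
print("max |q - q_ref|: ", np.abs(q - q_ref).max())
print("training error:  ", np.abs(a @ q - y).max())
```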

python

machine-learning

normalization

linear-regression