Python Function to Compute a Beta Matrix
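I'm looking for an efficient function that, given a dependent variable and a set of predictors as a DataFrame in Python, automatically generates the betas for every possible multiple regression model.
For example, given this set of data:
https://i.stack.imgur.com/YuPuv.jpg
The dependent variable is 'Cases per Capita', and the columns after it are the predictors.
In a simpler example: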
Student    Grade    Hours Slept    Hours Studied    ...
---------  -------  -------------  ---------------  -----
A          90       9              1                ...
B          85       7              2                ...
C          100      4              5                ...
...        ...      ...            ...              ...
The beta matrix output would look something like this:
Regression    Hours Slept    Hours Studied
------------  -------------  ---------------
1             #              N/A
2             N/A            #
3             #              #
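The table would have [2^n - 1] rows, where n is the number of predictor variables, so with 5 predictors and 1 dependent variable there would be 31 regressions, each computing betas for a different combination of predictors.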
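To make the 2^n - 1 count concrete, here is a minimal sketch that enumerates every non-empty subset of predictors with itertools.combinations. The helper name predictor_subsets and the placeholder names x1 to x5 are assumptions for illustration only, not part of the original data.

```python
from itertools import combinations


def predictor_subsets(names):
    """Return every non-empty subset of `names`, smallest subsets first."""
    subsets = []
    for k in range(1, len(names) + 1):
        subsets.extend(combinations(names, k))
    return subsets


# Hypothetical predictor names, used only to check the count.
names = ["x1", "x2", "x3", "x4", "x5"]
subsets = predictor_subsets(names)

# One regression per subset: the count equals 2**n - 1.
print(len(subsets), 2 ** len(names) - 1)
```

Because the count doubles with every added predictor, this brute-force enumeration is only practical for a modest number of columns.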
The process is described in more detail here, and an actual solution written in R is posted here.
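I'm not aware of any package that already does this. But you can create all of those combinations (2^n-1), where n is the number of columns in X (the independent variables), fit a linear regression model for each combination, and then read off the coefficients/betas of each model.
Here is how I would do it; I hope it helps: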
from itertools import combinations

import numpy as np
from sklearn import datasets, linear_model

# Test dataset. Note: load_boston was removed in scikit-learn 1.2, so this
# line needs an older release; with other data the numbers below will differ.
X, y = datasets.load_boston(return_X_y=True)
X = X[:, :3]  # the original X has 13 columns; keep only the first n=3

# Create all 2^n-1 combinations of column indices (7 here, because n=3),
# where n is the number of features/independent variables
all_combs = []
for i in range(X.shape[1]):
    all_combs.extend(combinations(range(X.shape[1]), i + 1))

# Print the 2^n-1 combinations
print('2^n-1 combinations are:')
print(all_combs)

# Create the betas/coefficients matrix, filled with NaN, with 2^n-1 rows
# and one column per column of X (np.NaN no longer exists in NumPy 2.0)
betas = np.full((len(all_combs), X.shape[1]), np.nan)

# Fit a model for each combination of columns and store its coefficients
# in the matching row of the betas matrix
lr = linear_model.LinearRegression()
for regression_no, comb in enumerate(all_combs):
    cols = list(comb)  # column indices used by this regression
    lr.fit(X[:, cols], y)
    betas[regression_no, cols] = lr.coef_

# Print the coefficients of each model
print('Regression No'.center(15) + " ".join('column {}'.format(i).center(10) for i in range(X.shape[1])))
print('_' * 50)
for index, beta in enumerate(betas):
    print('{}'.format(index + 1).center(15), " ".join('{:.4f}'.format(b).center(10) for b in beta))
Result:
2^n-1 combinations are:
[(0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]
 Regression No   column 0   column 1   column 2
__________________________________________________
       1         -0.4152      nan        nan
       2           nan       0.1421      nan
       3           nan        nan      -0.6485
       4         -0.3521     0.1161      nan
       5         -0.2455      nan      -0.5234
       6           nan       0.0564    -0.5462
       7         -0.2486     0.0585    -0.4156
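Since the question asks for DataFrame input, the same idea can be sketched as a function that takes a DataFrame and the name of the dependent column and returns the beta matrix as a DataFrame. This is only a sketch under stated assumptions: the function name beta_matrix is hypothetical, it swaps scikit-learn for ordinary least squares via numpy.linalg.lstsq with an explicit intercept column, and the small table below is made-up data, not the data from the question.

```python
from itertools import combinations

import numpy as np
import pandas as pd


def beta_matrix(df, target):
    """Fit OLS (with intercept) on every non-empty subset of predictors.

    Returns one row per regression and one column per predictor;
    predictors left out of a regression are NaN.
    """
    predictors = [c for c in df.columns if c != target]
    y = df[target].to_numpy(dtype=float)
    rows = []
    for k in range(1, len(predictors) + 1):
        for subset in combinations(predictors, k):
            # Design matrix: a column of ones (intercept), then the subset.
            A = np.column_stack(
                [np.ones(len(df)), df[list(subset)].to_numpy(dtype=float)]
            )
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            row = dict.fromkeys(predictors, np.nan)
            row.update(zip(subset, coef[1:]))  # coef[0] is the intercept
            rows.append(row)
    out = pd.DataFrame(rows, columns=predictors)
    out.index = pd.RangeIndex(1, len(out) + 1, name="Regression")
    return out


# Made-up example data: Grade = 50 + 2 * Hours Slept + 10 * Hours Studied
df = pd.DataFrame(
    {
        "Grade": [78, 84, 108, 92, 86],
        "Hours Slept": [9, 7, 4, 6, 8],
        "Hours Studied": [1, 2, 5, 3, 2],
    }
)
betas = beta_matrix(df, "Grade")
print(betas)
```

Returning a DataFrame keeps the predictor names as column labels, so each row can be read directly as "regression k used these predictors". The intercept is fitted but not stored; adding it as an extra column would be a small change.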