Run univariate regression between each variable python

Question

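I have a dataframe with 49 columns, and I want to see whether there is some relationship between the columns, i.e. run a simple linear regression between every pair of columns. The expected output is a matrix whose rows and columns carry the same column names and whose cells are filled with the regression coefficients.

For example, df: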

bar foo too ten
1   2   3   4
4   5   6   5
7   8   9   6

Output:

     bar             foo             too              ten
bar  r_coef(bar,bar) r_coef(bar,foo) r_coef(bar,too)  r_coef(bar,ten)
foo  r_coef(foo,bar) r_coef(foo,foo) r_coef(foo,too)  r_coef(foo,ten)
too  r_coef(too,bar) r_coef(too,foo) r_coef(too,too)  r_coef(too,ten)
ten  r_coef(ten,bar) r_coef(ten,foo) r_coef(ten,too)  r_coef(ten,ten)

Answer 1

It looks like you just want to use corr:

df.corr()

Output:

     bar  foo  too  ten
bar  1.0  1.0  1.0  1.0
foo  1.0  1.0  1.0  1.0
too  1.0  1.0  1.0  1.0
ten  1.0  1.0  1.0  1.0

A less ambiguous example:

import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.random(size=(4,4)),
                  columns=['bar', 'foo', 'too', 'ten'])

df.corr()
          bar       foo       too       ten
bar  1.000000 -0.701808  0.595832 -0.211943
foo -0.701808  1.000000 -0.911949 -0.547439
too  0.595832 -0.911949  1.000000  0.551369
ten -0.211943 -0.547439  0.551369  1.000000
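
Note that corr returns Pearson correlation coefficients, which are not the same thing as regression slopes. For a simple linear regression of y on x, the slope equals r(x, y) * std(y) / std(x), so the correlation matrix can be rescaled into a slope matrix. The following is a minimal sketch under that assumption; the name `slopes` and its row/column convention are illustrative choices, not part of the original answer.

```python
import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.random(size=(4, 4)),
                  columns=['bar', 'foo', 'too', 'ten'])

corr = df.corr()   # Pearson correlation, symmetric
std = df.std()     # per-column standard deviation

# slopes.loc[y, x] is the slope from regressing column y on column x:
# multiply row y by std[y], then divide column x by std[x].
slopes = corr.mul(std, axis=0).div(std, axis=1)
print(slopes)
```

Unlike the correlation matrix, this matrix is generally not symmetric, because regressing y on x and x on y give different slopes.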

Answer 2

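IIUC, you can use np.polyfit. You have a first-degree polynomial (y = mx + b), so set the degree to 1. The code below takes the intercept value (b); use index [0] instead if you want the slope (m).

As @mozway suggested, use corr, but with a custom method: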

# [1] is the intercept value, [0] is the slope
r_coef = lambda x, y: np.polyfit(x, y, deg=1)[1]

out = df.corr(method=r_coef)
print(out)

# Output
          bar       foo  too       ten
bar  1.000000  1.000000  2.0  3.666667
foo  1.000000  1.000000  1.0  3.333333
too  2.000000  1.000000  1.0  3.000000
ten  3.666667  3.333333  3.0  1.000000
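
One caveat with the corr-based approach: with a callable method, pandas returns a symmetric matrix with 1 on the diagonal regardless of what the callable returns, so it cannot represent values that differ between "y on x" and "x on y". If the slope for every ordered pair is wanted, an explicit loop is one option. The sketch below is an assumption about what the question is after; `slope_matrix` is a hypothetical helper name, and the data is the example df from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'bar': [1, 4, 7], 'foo': [2, 5, 8],
                   'too': [3, 6, 9], 'ten': [4, 5, 6]})

def slope_matrix(frame):
    """Return a matrix where cell [y, x] is the slope of y regressed on x."""
    cols = frame.columns
    out = pd.DataFrame(np.nan, index=cols, columns=cols, dtype=float)
    for y in cols:
        for x in cols:
            # np.polyfit(x, y, 1) returns [slope, intercept]
            out.loc[y, x] = np.polyfit(frame[x], frame[y], deg=1)[0]
    return out

slopes = slope_matrix(df)
print(slopes)
```

With 49 columns this runs 49 * 49 small fits, which should be fast enough in practice; swapping `[0]` for `[1]` would give the intercepts instead.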
