How to optimise the Pearson correlation coefficient by adjusting the weights?
I want to adjust the weights w so as to maximise the r-squared of the Pearson correlation coefficient.
import numpy as np
from scipy import stats

x1_raw = np.array([277, 115, 196])
x2_raw = np.array([263, 118, 191])
x3_raw = np.array([270, 114, 191])

# w1, w2, w3 are the unknown weights I want to find
w = np.array([w1, w2, w3])

# each x value is the weighted sum of the raw values with w
x1 = np.prod([w, x1_raw], axis=0).sum()
x2 = np.prod([w, x2_raw], axis=0).sum()
x3 = np.prod([w, x3_raw], axis=0).sum()

x = np.array([x1, x2, x3])
y = np.array([71.86, 71.14, 70.76])

slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
r_squared = r_value**2
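As an aside, each np.prod(...).sum() line above is just a dot product, so the whole x vector can also be computed with a single matrix-vector product; a minimal equivalent sketch (the name X_raw is introduced here only for illustration):
import numpy as np
# stack the three raw rows into a 3x3 matrix
X_raw = np.array([[277, 115, 196],
                  [263, 118, 191],
                  [270, 114, 191]])
w = np.array([1.0, 2.0, 3.0])  # any trial weights
# x[i] = sum_j w[j] * X_raw[i, j], identical to the three np.prod(...).sum() lines
x = X_raw @ w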
What code should I use to adjust [w1, w2, w3] so that r_squared is maximised?
Thanks @mathew gunther. The result I get from print(res) is:
final_simplex: (array([[ 0.41998763, 2.66314965, 3.34462572],
[ 0.4199877 , 2.66314968, 3.34462654],
[ 0.41998749, 2.66314983, 3.34462649],
[ 0.41998765, 2.66314917, 3.34462607]]), array([-1., -1., -1., -1.]))
fun: -0.99999999999999822
message: 'Optimization terminated successfully.'
nfev: 130
nit: 65
status: 0
success: True
x: array([ 0.41998763, 2.66314965, 3.34462572])
I understand that x: array([ 0.41998763, 2.66314965, 3.34462572]) is w; nfev is the number of function evaluations; and nit is the number of iterations.
But what are the fields below?
array([[ 0.41998763, 2.66314965, 3.34462572],
[ 0.4199877 , 2.66314968, 3.34462654],
[ 0.41998749, 2.66314983, 3.34462649],
[ 0.41998765, 2.66314917, 3.34462607]])
array([-1., -1., -1., -1.]))
status: 0
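For what it's worth: in scipy's Nelder-Mead result, final_simplex is a tuple (sim, fsim), where sim is the (N+1) x N array of the vertices of the final simplex (here, four candidate weight vectors for the three parameters) and fsim holds the objective value at each vertex; since the objective returns -1 * r_squared, those values are all roughly -1. status is the solver's integer termination code, 0 meaning a successful exit, i.e. the same information as success: True and the message. A minimal sketch of unpacking it:
# res is the OptimizeResult returned by optimize.minimize(..., method='Nelder-Mead')
vertices, fvals = res.final_simplex  # vertices has shape (4, 3), fvals has shape (4,)
print(vertices[0])  # the best vertex, the same point as res.x
print(fvals[0])     # the objective value there, the same as res.fun (i.e. -r_squared)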
I bet there is some closed-form solution, but if hacky code is enough, see below.
(This solution is based on the scipy.optimize package: https://docs.scipy.org/doc/scipy/reference/tutorial/optimize.html)
(Returning -1 times r_squared turns the minimisation into a maximisation of r_squared.)
import numpy as np
from scipy import stats
from scipy import optimize
# import IPython  # only needed if you want to drop into a shell with IPython.embed()

def get_linregress(*args):
    # args[0] is the current guess for the weights [w1, w2, w3]
    w1, w2, w3 = args[0]
    x1_raw = np.array([277, 115, 196])
    x2_raw = np.array([263, 118, 191])
    x3_raw = np.array([270, 114, 191])
    w = np.array([w1, w2, w3])
    # weighted sums of the raw values
    x1 = np.prod([w, x1_raw], axis=0).sum()
    x2 = np.prod([w, x2_raw], axis=0).sum()
    x3 = np.prod([w, x3_raw], axis=0).sum()
    x = np.array([x1, x2, x3])
    y = np.array([71.86, 71.14, 70.76])
    slope, intercept, r_value, p_value, std_err = stats.linregress(x, y)
    r_squared = r_value**2
    # negate so that minimising this function maximises r_squared
    return -1 * r_squared

res = optimize.minimize(get_linregress, [1, 2, 3], method='Nelder-Mead', tol=1e-6)
res.x
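To sanity-check the optimiser's output, you can evaluate the objective at res.x yourself; a short usage sketch (best_w and best_r_squared are names used here just for illustration):
best_w = res.x                             # the optimised weights [w1, w2, w3]
best_r_squared = -get_linregress(best_w)   # undo the -1 factor used inside the objective
print(best_w, best_r_squared)              # r_squared comes out very close to 1 here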