Find out the relative contribution of base data sets through global fitting using Matlab?
I obtained 2 data curves (y1 and y2) from my experiment. Both curves share the same x. Each curve should be described by its own set of base data:
y1 = a* bd1_y1 + b* bd1_y2 + (1-a-b)* bd1_y3
y2 = a* bd2_y1 + b* bd2_y2 + (1-a-b)* bd2_y3
where bd1_y1, bd1_y2 and bd1_y3 are the base data sets for y1; bd2_y1, bd2_y2 and bd2_y3 are the base data sets for y2; and a and b are the relative contributions of the base data sets (0 < a, b < 1), shared between the two curves.
I have already tried the approach posted here: https://www.mathworks.com/matlabcentral/answers/1056-nonlinear-fit-to-multiple-data-sets-with-shared-parameters#comment_542407 but after several days of effort I still cannot figure out how to incorporate the base sets into the function (f0), and therefore cannot extract a and b.
How can I fit y1 and y2 to obtain a, b and their confidence intervals? Any help is greatly appreciated.
Data:
x = [1 2 3 4 5 6 7 8 9 ];
y1 = [0.4304 0.2249 0.1283 0.0794 0.0484 0.0326 0.0203 0.0125 0.0072];
y2 = [0.2179 0.1699 0.1410 0.1101 0.0871 0.0679 0.0515 0.0385 0.0296];
bd1_y1 = [0.5587 0.2244 0.1023 0.0520 0.0276 0.0155 0.0089 0.0053 0.0033];
bd1_y2 = [0.4580 0.2788 0.1198 0.0642 0.0342 0.0197 0.0115 0.0069 0.0043];
bd1_y3 = [0.3584 0.3102 0.1540 0.0755 0.0440 0.0248 0.0148 0.0091 0.0056];
bd2_y1 = [0.3266 0.1778 0.1255 0.0975 0.0777 0.0612 0.0478 0.0367 0.0281];
bd2_y2 = [0.2985 0.2086 0.1268 0.0939 0.0722 0.0580 0.0470 0.0383 0.0313];
bd2_y3 = [0.2451 0.2221 0.1434 0.0999 0.0775 0.0609 0.0494 0.0406 0.0335];
So you want to solve (careful, I have renamed your variables):
b = x(1)*v1 + x(2)*v2 + (1-x(1)-x(2))*v3 %(1)
which is equivalent to
b = x(1)*(v1-v3) + x(2)*(v2-v3) + x(3)*v3 %(2)
We have a standard linear algebra problem A*x = b, but with the constraints x(1), x(2) ∈ [0,1] and x(3) = 1. This case can be handled with lsqlin, the constrained linear least-squares solver:
% We take your first example
b = [0.4304 0.2249 0.1283 0.0794 0.0484 0.0326 0.0203 0.0125 0.0072].';
% Our 3 variables (v1 = bd1_y1, v2 = bd1_y2, v3 = bd1_y3)
v1 = [0.5587 0.2244 0.1023 0.0520 0.0276 0.0155 0.0089 0.0053 0.0033].';
v2 = [0.4580 0.2788 0.1198 0.0642 0.0342 0.0197 0.0115 0.0069 0.0043].';
v3 = [0.3584 0.3102 0.1540 0.0755 0.0440 0.0248 0.0148 0.0091 0.0056].';
% According to (2) we can write our system of equations:
A = [v1-v3, v2-v3, v3];
% Now we fix the lower and upper bound for each parameter x(1), x(2), x(3):
%     x(1) x(2) x(3)
lb = [0    0    1].';
ub = [1    1    1].';
% And finally we solve the system (b is already a column vector):
x = lsqlin(A,b,[],[],[],[],lb,ub)
% We find a minimum at:
% x(1) = 0.44 = a
% x(2) = 0    = b
% x(3) = 1
If we plot the result we obtain:
plot(1:9,b,1:9,A*x)
legend('Experimental','Estimation')
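For reference, the same bounded fit can be sketched in Python with SciPy's `lsq_linear` (my addition, not part of the original answer). Since `lsq_linear` does not accept equal lower and upper bounds, the fixed x(3) = 1 is substituted out first, leaving only the two bounded unknowns:

```python
import numpy as np
from scipy.optimize import lsq_linear

# First data set from the question
b  = np.array([0.4304, 0.2249, 0.1283, 0.0794, 0.0484, 0.0326, 0.0203, 0.0125, 0.0072])
v1 = np.array([0.5587, 0.2244, 0.1023, 0.0520, 0.0276, 0.0155, 0.0089, 0.0053, 0.0033])
v2 = np.array([0.4580, 0.2788, 0.1198, 0.0642, 0.0342, 0.0197, 0.0115, 0.0069, 0.0043])
v3 = np.array([0.3584, 0.3102, 0.1540, 0.0755, 0.0440, 0.0248, 0.0148, 0.0091, 0.0056])

# Substitute x3 = 1:  b - v3 = x1*(v1 - v3) + x2*(v2 - v3)
A = np.column_stack([v1 - v3, v2 - v3])
res = lsq_linear(A, b - v3, bounds=(0.0, 1.0))
a_fit, b_fit = res.x
print(a_fit, b_fit)   # the Matlab answer reports roughly a = 0.44, b = 0
```

Eliminating the fixed parameter is also slightly tidier, since the solver only ever sees the two genuinely free variables.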
Usually, I try to avoid constraints. So the first step should be: check whether the true solution lies within the constraints anyway. I will program this in Python, but it is very generic, so translating it to Matlab should be easy.
import numpy as np
xl = np.arange(9)+1
y1 = np.array([
0.4304, 0.2249, 0.1283, 0.0794, 0.0484, 0.0326, 0.0203, 0.0125, 0.0072
])
y2 = np.array([
0.2179, 0.1699, 0.1410, 0.1101, 0.0871, 0.0679, 0.0515, 0.0385, 0.0296
])
bd1_y1 = np.array([
0.5587, 0.2244, 0.1023, 0.0520, 0.0276, 0.0155, 0.0089, 0.0053, 0.0033
])
bd1_y2 = np.array([
0.4580, 0.2788, 0.1198, 0.0642, 0.0342, 0.0197, 0.0115, 0.0069, 0.0043
])
bd1_y3 = np.array([
0.3584, 0.3102, 0.1540, 0.0755, 0.0440, 0.0248, 0.0148, 0.0091, 0.0056
])
bd2_y1 = np.array([
0.3266, 0.1778, 0.1255, 0.0975, 0.0777, 0.0612, 0.0478, 0.0367, 0.0281
])
bd2_y2 = np.array([
0.2985, 0.2086, 0.1268, 0.0939, 0.0722, 0.0580, 0.0470, 0.0383, 0.0313
])
bd2_y3 = np.array([
0.2451, 0.2221, 0.1434, 0.0999, 0.0775, 0.0609, 0.0494, 0.0406, 0.0335
])
"""
if y = a x1 + b x2 + (1- a - b ) x3 then
y - x3 = a ( x1 - x3 ) + b ( x2 -x3 )
i.e. simple transformation to get a homogeneous linear system
"""
y1k = y1 - bd1_y3
y11k = bd1_y1 - bd1_y3
y12k = bd1_y2 - bd1_y3
y2k = y2 - bd2_y3
y21k = bd2_y1 - bd2_y3
y22k = bd2_y2 - bd2_y3
"""
So we just want to solve a simple linear system
y = a x1 + b x2
with y = (y1,...., yn ) and x1 = ( x11, x12, ..., x1n) etc
handling the two measurements with shared parameters is done by just
adding the according vectors so yp and yq go into
y_tot = ( yp1, ..., ypn, yq1, ...yqn ) and same for the x
The x for the matrix A such we can write the thing as y = A.(a,b)
This is standard linear optimization!
this is done in short in the following (first joining the data sets)
"""
y = np.append( y1k, y2k )
z1 = np.append( y11k, y21k )
z2 = np.append( y12k, y22k )
AT = np.array( [ z1, z2 ] )
A = np.transpose( AT )
U = np.dot( AT, A )
UI = np.linalg.inv( U )
K = np.dot( UI, AT )
v = np.dot( K, y )
a = v[0]
b = v[1]
print( a, b )
"""
in detail this should be done more carefully to avoid singular matrices etc,
e.g. via single value decomposition etc
in a next step it is usually assumed that all measurements share the
same base error s which we calculate
"""
diff = y - a * z1 - b * z2
s2 = np.sum( diff**2 ) / ( len( diff ) - 1 )
"""
The covariance matrix now is calculated via error propagation,
namely like s_a = s * sqrt( sum_i ( da / dyi )^2 ), which gives the classical
s^2 * K KT
"""
cov = s2 * np.dot( K, np.transpose( K ) )
print( cov )
"""
the error of a is (and be accordingly)
"""
sa = np.sqrt( cov[0,0] )
sb = np.sqrt( cov[1,1] )
print( sa, sb )
This code already joins the two data sets and can easily be extended to n data sets. The final solution looks like this:
[Figure: top, the raw data: the target in blue and the three base data sets in yellow, green and red; bottom, the target in blue and the unconstrained best fit in yellow.]
With the solution
>> 1.9712440231598143 -3.163257723318949
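This unconstrained result can be cross-checked with NumPy's built-in `np.linalg.lstsq` (a sketch I am adding; `lstsq` solves the system via the SVD, which also addresses the singular-matrix caveat mentioned in the code above):

```python
import numpy as np

# Data from the question
y1 = np.array([0.4304, 0.2249, 0.1283, 0.0794, 0.0484, 0.0326, 0.0203, 0.0125, 0.0072])
y2 = np.array([0.2179, 0.1699, 0.1410, 0.1101, 0.0871, 0.0679, 0.0515, 0.0385, 0.0296])
bd1_y1 = np.array([0.5587, 0.2244, 0.1023, 0.0520, 0.0276, 0.0155, 0.0089, 0.0053, 0.0033])
bd1_y2 = np.array([0.4580, 0.2788, 0.1198, 0.0642, 0.0342, 0.0197, 0.0115, 0.0069, 0.0043])
bd1_y3 = np.array([0.3584, 0.3102, 0.1540, 0.0755, 0.0440, 0.0248, 0.0148, 0.0091, 0.0056])
bd2_y1 = np.array([0.3266, 0.1778, 0.1255, 0.0975, 0.0777, 0.0612, 0.0478, 0.0367, 0.0281])
bd2_y2 = np.array([0.2985, 0.2086, 0.1268, 0.0939, 0.0722, 0.0580, 0.0470, 0.0383, 0.0313])
bd2_y3 = np.array([0.2451, 0.2221, 0.1434, 0.0999, 0.0775, 0.0609, 0.0494, 0.0406, 0.0335])

# Same transformed, concatenated system as in the answer's code
y  = np.concatenate([y1 - bd1_y3, y2 - bd2_y3])
z1 = np.concatenate([bd1_y1 - bd1_y3, bd2_y1 - bd2_y3])
z2 = np.concatenate([bd1_y2 - bd1_y3, bd2_y2 - bd2_y3])
A  = np.column_stack([z1, z2])

# SVD-based least squares, no explicit matrix inversion
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
print(a, b)   # should reproduce the values above: 1.9712..., -3.1632...
```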
So this does not lie within the bounds: in fact a > 1 and b < 0. We therefore know that the solution lies on the boundary of the (a, b)-domain, either on the right edge or on the bottom edge. Least-squares minimization of a truly linear problem gives a true paraboloid, so there is exactly one minimum and no local minima.
Now, instead of attacking the constrained problem directly, we can to some extent force the solution onto the boundary.
"""
We just set on parameter to the according boundary value
and just solve as before. Here a matrix formalism is a small
overkill, but for consistency a good way of writing it.
"""
"""
case 1: b = 0
"""
AT = np.array( [ z1 ] )
A = np.transpose( AT )
U = np.dot( AT, A )
UI = np.linalg.inv( U )
K = np.dot( UI, AT )
v = np.dot( K, y )
aopt1 = v[0]
diff = y - aopt1 * z1 - 0 * z2
s2 = np.sum( diff**2 ) / ( len( diff ) - 1 )
cov = s2 * np.dot( K, np.transpose( K ) )
print( aopt1 )
print( cov, np.sqrt(cov[0,0]) )
"""
case 2: a=1
"""
ymod = y - z1
AT = np.array( [ z2 ] )
A = np.transpose( AT )
U = np.dot( AT, A )
UI = np.linalg.inv( U )
K = np.dot( UI, AT )
v = np.dot( K, ymod )
bopt1 = v[0]
diff = ymod - bopt1 * z2
s2 = np.sum( diff**2 ) / ( len( diff ) - 1 )
cov = s2 * np.dot( K, np.transpose( K ) )
print( bopt1 )
print( cov, np.sqrt(cov[0,0]) )
For a:
>> 0.3815249997379142
>> [[0.00807797]] 0.08987750010601071
and for b:
>> -1.2997646861507886
>> [[0.01736744]] 0.13178558294218914
So on the a = 1 line the optimal b again falls outside the constraints, while b = 0 gives a valid minimum for a.
Graphically, this looks like:
[Figure: the chi-square surface as a density plot. The boundaries are shown in red; the true minimum is visible via the elliptic contour lines; the local minima on a = 1 and b = 0 are shown as blue dots.]
Here one can see the parabolic minima of the chi-square surface along the boundaries. If this has to be applied to many different data sets, automating the decision should be easy. The computation is fast, since everything is linear and can be calculated in one shot. If the true solution lies outside the bounds, a boundary that has a local minimum will actually have a second one, but that lies on the far side and is therefore irrelevant. Finally, it can happen that for every boundary the minimum does not fulfill the constraints; the solution is then the corresponding corner.
In a way, this simple case allows one to perform by hand the steps that an inequality-constrained algorithm (suited for more complex cases) would otherwise carry out.
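As a sketch of that automation (my own addition, under the assumption that the feasible region is the unit square in (a, b)): solve unconstrained first; if the result violates the bounds, minimize along each boundary with the other parameter fixed, keep only feasible boundary minima, and fall back to the corner (a, b) = (1, 0) if neither is feasible:

```python
import numpy as np

# Data from the question; rows are (y, bd_y1, bd_y2, bd_y3) per curve
d1 = np.array([
    [0.4304, 0.2249, 0.1283, 0.0794, 0.0484, 0.0326, 0.0203, 0.0125, 0.0072],
    [0.5587, 0.2244, 0.1023, 0.0520, 0.0276, 0.0155, 0.0089, 0.0053, 0.0033],
    [0.4580, 0.2788, 0.1198, 0.0642, 0.0342, 0.0197, 0.0115, 0.0069, 0.0043],
    [0.3584, 0.3102, 0.1540, 0.0755, 0.0440, 0.0248, 0.0148, 0.0091, 0.0056],
])
d2 = np.array([
    [0.2179, 0.1699, 0.1410, 0.1101, 0.0871, 0.0679, 0.0515, 0.0385, 0.0296],
    [0.3266, 0.1778, 0.1255, 0.0975, 0.0777, 0.0612, 0.0478, 0.0367, 0.0281],
    [0.2985, 0.2086, 0.1268, 0.0939, 0.0722, 0.0580, 0.0470, 0.0383, 0.0313],
    [0.2451, 0.2221, 0.1434, 0.0999, 0.0775, 0.0609, 0.0494, 0.0406, 0.0335],
])

def transformed(d):
    """Remove the constant term: y - x3 = a (x1 - x3) + b (x2 - x3)."""
    y, x1, x2, x3 = d
    return y - x3, x1 - x3, x2 - x3

y1k, z11, z12 = transformed(d1)
y2k, z21, z22 = transformed(d2)
y  = np.concatenate([y1k, y2k])
z1 = np.concatenate([z11, z21])
z2 = np.concatenate([z12, z22])

def lsq1(u, v):
    """1-D least squares: the c minimizing |u - c*v|^2."""
    return np.dot(u, v) / np.dot(v, v)

# Unconstrained 2-D solution
sol, *_ = np.linalg.lstsq(np.column_stack([z1, z2]), y, rcond=None)
a, b = sol
if not (0 <= a <= 1 and 0 <= b <= 1):
    cand = []
    a0 = lsq1(y, z1)        # boundary b = 0
    if 0 <= a0 <= 1:
        cand.append((a0, 0.0))
    b0 = lsq1(y - z1, z2)   # boundary a = 1
    if 0 <= b0 <= 1:
        cand.append((1.0, b0))
    # For this data only one boundary is feasible; a fuller version
    # would compare chi-square values when both are.
    a, b = cand[0] if cand else (1.0, 0.0)
print(a, b)   # the answer above reports a ≈ 0.3815, b = 0 for this data
```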
Eventually, the best fit under the constraints looks like this:
[Figure: as above, but with the constraints applied.]