How to find the relative contributions of base data sets through global fitting in MATLAB?

I obtained two data curves (y1 and y2) from my experiment. Both curves share the same x. Each curve should be described by its own base data sets:

 y1 = a* bd1_y1 + b* bd1_y2 + (1-a-b)* bd1_y3 
 y2 = a* bd2_y1 + b* bd2_y2 + (1-a-b)* bd2_y3 

where

- bd1_y1, bd1_y2, bd1_y3 are the base data sets for y1,
- bd2_y1, bd2_y2, bd2_y3 are the base data sets for y2,
- a and b are the relative contributions of the base data sets (0 < a, b < 1) and are shared between the two curves.

I have already tried the approach posted here: https://www.mathworks.com/matlabcentral/answers/1056-nonlinear-fit-to-multiple-data-sets-with-shared-parameters#comment_542407 but after several days of effort I still do not know how to incorporate the base sets into the function (f0), and therefore cannot extract a and b.

How can I fit y1 and y2 to obtain a, b, and their confidence intervals? Thanks a lot for your help.

Data:
x = [1     2     3     4     5     6     7     8     9  ]; 
y1 = [0.4304    0.2249    0.1283    0.0794    0.0484    0.0326    0.0203    0.0125    0.0072]; 
y2 = [0.2179    0.1699    0.1410    0.1101    0.0871    0.0679    0.0515    0.0385    0.0296]; 
bd1_y1 = [0.5587    0.2244    0.1023    0.0520    0.0276    0.0155    0.0089    0.0053    0.0033]; 
bd1_y2 = [0.4580    0.2788    0.1198    0.0642    0.0342    0.0197    0.0115    0.0069    0.0043]; 
bd1_y3 = [0.3584    0.3102    0.1540    0.0755    0.0440    0.0248    0.0148    0.0091    0.0056]; 
bd2_y1 = [0.3266    0.1778    0.1255    0.0975    0.0777    0.0612    0.0478    0.0367    0.0281]; 
bd2_y2 = [0.2985    0.2086    0.1268    0.0939    0.0722    0.0580    0.0470    0.0383    0.0313]; 
bd2_y3 = [0.2451    0.2221    0.1434    0.0999    0.0775    0.0609    0.0494    0.0406    0.0335]; 

So you want to solve (careful: I have renamed your variables):

b = x(1)*v1 + x(2)*v2 + (1-x(1)-x(2))*v3          %(1)

which is equivalent to

b = x(1)*(v1-v3) + x(2)*(v2-v3) + x(3)*v3         %(2)

We have a standard linear-algebra problem A*x = b, but with the constraints x(1), x(2) ∈ [0,1] and x(3) = 1.

This case can be handled with lsqlin, the constrained linear least-squares solver:

% We take your first example
b  = [0.4304    0.2249    0.1283    0.0794    0.0484    0.0326    0.0203    0.0125    0.0072].'; 

% Our 3 variables (v1 = bd1_y1, v2 = bd1_y2...)
v1 = [0.5587    0.2244    0.1023    0.0520    0.0276    0.0155    0.0089    0.0053    0.0033].'; 
v2 = [0.4580    0.2788    0.1198    0.0642    0.0342    0.0197    0.0115    0.0069    0.0043].'; 
v3 = [0.3584    0.3102    0.1540    0.0755    0.0440    0.0248    0.0148    0.0091    0.0056].'; 

% According to (2) we can write our system of equation:
A = [v1-v3,v2-v3,v3];

% Now we fix the lower and upper bounds for each parameter x(1), x(2), x(3)
% (x(3) is pinned to 1 by setting its lower and upper bounds both to 1):
%                x(1) x(2) x(3)
lb = [0 0 1].';%   0   0   1
%                  to  to  to
lu = [1 1 1].';%   1   1   1

% And finally we solve the system of equations (b is already a column vector):
x = lsqlin(A,b,[],[],[],[],lb,lu)
% And we find a minimum at:
% x(1) = 0.44 = a
% x(2) = 0    = b
% x(3) = 1

If we plot the result we obtain:

plot(1:9,b,1:9,A*x)
legend('Experimental','Estimation')
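For reference, the same bounded solve can be sketched in Python with scipy.optimize.lsq_linear, this time stacking both curves so that a and b are shared, as the question asks (a sketch assuming SciPy is available; after eliminating the third weight via y − bd_y3 = a·(bd_y1 − bd_y3) + b·(bd_y2 − bd_y3), only the box bounds 0 ≤ a, b ≤ 1 remain — like lsqlin's lb/ub, this does not enforce a + b ≤ 1, which happens to be inactive here):

```python
import numpy as np
from scipy.optimize import lsq_linear

y1 = np.array([0.4304, 0.2249, 0.1283, 0.0794, 0.0484, 0.0326, 0.0203, 0.0125, 0.0072])
y2 = np.array([0.2179, 0.1699, 0.1410, 0.1101, 0.0871, 0.0679, 0.0515, 0.0385, 0.0296])
bd1_y1 = np.array([0.5587, 0.2244, 0.1023, 0.0520, 0.0276, 0.0155, 0.0089, 0.0053, 0.0033])
bd1_y2 = np.array([0.4580, 0.2788, 0.1198, 0.0642, 0.0342, 0.0197, 0.0115, 0.0069, 0.0043])
bd1_y3 = np.array([0.3584, 0.3102, 0.1540, 0.0755, 0.0440, 0.0248, 0.0148, 0.0091, 0.0056])
bd2_y1 = np.array([0.3266, 0.1778, 0.1255, 0.0975, 0.0777, 0.0612, 0.0478, 0.0367, 0.0281])
bd2_y2 = np.array([0.2985, 0.2086, 0.1268, 0.0939, 0.0722, 0.0580, 0.0470, 0.0383, 0.0313])
bd2_y3 = np.array([0.2451, 0.2221, 0.1434, 0.0999, 0.0775, 0.0609, 0.0494, 0.0406, 0.0335])

# Eliminate the third weight: y - bd_y3 = a*(bd_y1 - bd_y3) + b*(bd_y2 - bd_y3).
# Stacking the two curves on top of each other makes a and b shared.
rhs = np.concatenate([y1 - bd1_y3, y2 - bd2_y3])
col_a = np.concatenate([bd1_y1 - bd1_y3, bd2_y1 - bd2_y3])
col_b = np.concatenate([bd1_y2 - bd1_y3, bd2_y2 - bd2_y3])
A = np.column_stack([col_a, col_b])

# Box bounds 0 <= a, b <= 1; method='bvls' returns an exact active-set solution.
res = lsq_linear(A, rhs, bounds=(0.0, 1.0), method="bvls")
a, b = res.x
print(a, b)
```

With these data the bounded joint fit lands on the boundary b = 0 with a ≈ 0.38, consistent with the constrained analysis in the second answer below.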

In general, I try to avoid constraints. So the first step should be: check whether the unconstrained solution lies within the bounds. I will program this in Python, but it is quite generic, so translating it to MATLAB should be easy.

import numpy as np

xl = np.arange(9)+1

y1 = np.array([
    0.4304, 0.2249, 0.1283, 0.0794, 0.0484, 0.0326, 0.0203, 0.0125, 0.0072
]) 
y2 = np.array([
    0.2179, 0.1699, 0.1410, 0.1101, 0.0871, 0.0679, 0.0515, 0.0385, 0.0296
])
bd1_y1 = np.array([
    0.5587, 0.2244, 0.1023, 0.0520, 0.0276, 0.0155, 0.0089, 0.0053, 0.0033
])
bd1_y2 = np.array([
    0.4580, 0.2788, 0.1198, 0.0642, 0.0342, 0.0197, 0.0115, 0.0069, 0.0043
])
bd1_y3 = np.array([
    0.3584, 0.3102, 0.1540, 0.0755, 0.0440, 0.0248, 0.0148, 0.0091, 0.0056
])
bd2_y1 = np.array([
    0.3266, 0.1778, 0.1255, 0.0975, 0.0777, 0.0612, 0.0478, 0.0367, 0.0281
])
bd2_y2 = np.array([
    0.2985, 0.2086, 0.1268, 0.0939, 0.0722, 0.0580, 0.0470, 0.0383, 0.0313
])
bd2_y3 = np.array([
    0.2451, 0.2221, 0.1434, 0.0999, 0.0775, 0.0609, 0.0494, 0.0406, 0.0335
])

"""
if y = a x1 + b x2 + (1- a - b ) x3 then
y - x3 = a ( x1 - x3  ) + b ( x2 -x3 ) 
i.e. simple transformation to get a homogeneous linear system
"""

y1k = y1 - bd1_y3
y11k = bd1_y1 - bd1_y3
y12k = bd1_y2 - bd1_y3

y2k = y2 - bd2_y3
y21k = bd2_y1 - bd2_y3
y22k = bd2_y2 - bd2_y3

"""
So we just want to solve a simple linear system
    y = a x1 + b x2
with y = (y1, ..., yn) and x1 = (x11, x12, ..., x1n), etc.
Handling the two measurements with shared parameters is done by simply
appending the corresponding vectors, so yp and yq go into
    y_tot = (yp1, ..., ypn, yq1, ..., yqn)
and the same for the x. The x form the matrix A such that we can write
the system as y = A.(a, b). This is standard linear optimization!
This is done in short in the following (first joining the data sets).
"""

y = np.append( y1k, y2k )
z1 = np.append( y11k, y21k )
z2 = np.append( y12k, y22k )
AT = np.array( [ z1, z2 ] )
A = np.transpose( AT )

U = np.dot( AT, A )
UI = np.linalg.inv( U )
K = np.dot( UI, AT )

v = np.dot( K, y )
a = v[0]
b = v[1]
print( a, b )

"""
In detail this should be done more carefully to avoid singular matrices,
e.g. via singular value decomposition.

In a next step it is usually assumed that all measurements share the
same base error s, which we calculate.
"""

diff = y - a * z1 - b * z2
s2 = np.sum( diff**2 ) / ( len( diff ) - 1 )

"""
The covariance matrix is now calculated via error propagation,
namely s_a = s * sqrt( sum_i ( da / dyi )^2 ), which gives the classical

    s^2 * K.KT
"""

cov = s2 * np.dot( K, np.transpose( K ) )
print( cov )

"""
the error of a (and of b accordingly) is
"""

sa = np.sqrt( cov[0,0] )
sb = np.sqrt( cov[1,1] )

print( sa, sb )
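As the comment in the code notes, explicitly inverting the normal-equations matrix U can fail for near-singular systems. The same unconstrained solution can be obtained through numpy's SVD-based `np.linalg.lstsq` — a minimal cross-check of the numbers computed above:

```python
import numpy as np

y1 = np.array([0.4304, 0.2249, 0.1283, 0.0794, 0.0484, 0.0326, 0.0203, 0.0125, 0.0072])
y2 = np.array([0.2179, 0.1699, 0.1410, 0.1101, 0.0871, 0.0679, 0.0515, 0.0385, 0.0296])
bd1_y1 = np.array([0.5587, 0.2244, 0.1023, 0.0520, 0.0276, 0.0155, 0.0089, 0.0053, 0.0033])
bd1_y2 = np.array([0.4580, 0.2788, 0.1198, 0.0642, 0.0342, 0.0197, 0.0115, 0.0069, 0.0043])
bd1_y3 = np.array([0.3584, 0.3102, 0.1540, 0.0755, 0.0440, 0.0248, 0.0148, 0.0091, 0.0056])
bd2_y1 = np.array([0.3266, 0.1778, 0.1255, 0.0975, 0.0777, 0.0612, 0.0478, 0.0367, 0.0281])
bd2_y2 = np.array([0.2985, 0.2086, 0.1268, 0.0939, 0.0722, 0.0580, 0.0470, 0.0383, 0.0313])
bd2_y3 = np.array([0.2451, 0.2221, 0.1434, 0.0999, 0.0775, 0.0609, 0.0494, 0.0406, 0.0335])

# same homogeneous stacked system as above
y = np.append(y1 - bd1_y3, y2 - bd2_y3)
z1 = np.append(bd1_y1 - bd1_y3, bd2_y1 - bd2_y3)
z2 = np.append(bd1_y2 - bd1_y3, bd2_y2 - bd2_y3)

# lstsq solves min ||A.(a,b) - y|| via an SVD, avoiding the explicit inverse of AT.A
A = np.column_stack([z1, z2])
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)
print(a, b)  # ~1.97, ~-3.16, matching the normal-equations result
```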

This code already concatenates the two data sets and can easily be extended to n data sets. The resulting fit looks like this:

[Figure: top plot shows the original data — the target in blue and the three base data sets in yellow, green, and red; bottom plot shows the blue target and the yellow unconstrained best fit.]

This yields the solution

>> 1.9712440231598143 -3.163257723318949

so this does not lie within the bounds. In fact, a > 1 and b < 0, so we know that the solution lies on the boundary — in (a,b)-space either on the right edge or at the bottom. Least-squares minimization of a truly linear problem gives a true parabolic shape, so there is only one solution and no local minima.

Instead of attempting a constrained solve, we can now more or less force the solution onto the boundary by hand.

"""
We just set one parameter to the corresponding boundary value
and solve as before. Here the matrix formalism is slight
overkill, but for consistency it is a good way of writing it.
"""

"""
case 1: b = 0
"""
AT = np.array( [ z1 ] )
A = np.transpose( AT )
U = np.dot( AT, A )
UI = np.linalg.inv( U )
K = np.dot( UI, AT )
v = np.dot( K, y )
aopt1 = v[0]

diff = y - aopt1 * z1 - 0 * z2
s2 = np.sum( diff**2 ) / ( len( diff ) - 1 )
cov = s2 * np.dot( K, np.transpose( K ) )
print( aopt1 )
print( cov, np.sqrt(cov[0,0]) )

"""
case 2: a=1
"""
ymod = y - z1
AT = np.array( [ z2 ] )
A = np.transpose( AT )
U = np.dot( AT, A )
UI = np.linalg.inv( U )
K = np.dot( UI, AT )
v = np.dot( K, ymod )
bopt1 = v[0]

diff = ymod - bopt1 * z2
s2 = np.sum( diff**2 ) / ( len( diff ) - 1 )
cov = s2 * np.dot( K, np.transpose( K ) )
print( bopt1 )
print( cov, np.sqrt(cov[0,0]) )

a

>> 0.3815249997379142
>> [[0.00807797]] 0.08987750010601071

b

>> -1.2997646861507886
>> [[0.01736744]] 0.13178558294218914

So on the line a = 1 the best value of b is again outside the constraints, while on b = 0 there is a valid minimum for a.

Graphically, this looks like:

[Figure: the chi-square surface as a density plot; the boundaries are shown in red, the true minimum is visible through the elliptic contour lines, and the local minima on a = 1 and b = 0 are shown as blue dots.]

Here one can see the parabolic minima of the chi-square surface. If this has to be applied to many different data sets, automating the decision should be easy. The computation is fast, as everything is linear and can be calculated in one shot. If the true solution lies outside the bounds and one boundary has a local minimum, there is a second one, but that lies on the opposite side and is therefore not relevant. Finally, it may happen that for each boundary the minimum does not fulfill the constraints; the solution is then the corresponding corner. In a way, this simple case allows performing by hand the steps that an inequality-constraint algorithm (suited for more complex cases) would perform.
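The decision logic just described — take the unconstrained minimum if it is feasible, otherwise check each boundary's 1D minimum, falling back to the corners — can be automated in a few lines. A minimal sketch for the box 0 ≤ a, b ≤ 1, recomputing the stacked vectors y, z1, z2 from the data (for a convex quadratic, the box minimum is always among these candidates):

```python
import numpy as np

y1 = np.array([0.4304, 0.2249, 0.1283, 0.0794, 0.0484, 0.0326, 0.0203, 0.0125, 0.0072])
y2 = np.array([0.2179, 0.1699, 0.1410, 0.1101, 0.0871, 0.0679, 0.0515, 0.0385, 0.0296])
bd1_y1 = np.array([0.5587, 0.2244, 0.1023, 0.0520, 0.0276, 0.0155, 0.0089, 0.0053, 0.0033])
bd1_y2 = np.array([0.4580, 0.2788, 0.1198, 0.0642, 0.0342, 0.0197, 0.0115, 0.0069, 0.0043])
bd1_y3 = np.array([0.3584, 0.3102, 0.1540, 0.0755, 0.0440, 0.0248, 0.0148, 0.0091, 0.0056])
bd2_y1 = np.array([0.3266, 0.1778, 0.1255, 0.0975, 0.0777, 0.0612, 0.0478, 0.0367, 0.0281])
bd2_y2 = np.array([0.2985, 0.2086, 0.1268, 0.0939, 0.0722, 0.0580, 0.0470, 0.0383, 0.0313])
bd2_y3 = np.array([0.2451, 0.2221, 0.1434, 0.0999, 0.0775, 0.0609, 0.0494, 0.0406, 0.0335])

# stacked homogeneous form: y = a*z1 + b*z2
y = np.append(y1 - bd1_y3, y2 - bd2_y3)
z1 = np.append(bd1_y1 - bd1_y3, bd2_y1 - bd2_y3)
z2 = np.append(bd1_y2 - bd1_y3, bd2_y2 - bd2_y3)

def chi2(a, b):
    return np.sum((y - a * z1 - b * z2) ** 2)

def fit_1d(rhs, basis):
    # best c in the one-parameter least-squares problem rhs ~ c * basis
    return np.dot(basis, rhs) / np.dot(basis, basis)

# candidates: unconstrained 2D minimum, the four edge minima, the four corners
A = np.column_stack([z1, z2])
cand = [tuple(np.linalg.lstsq(A, y, rcond=None)[0])]
cand.append((fit_1d(y, z1), 0.0))        # edge b = 0
cand.append((fit_1d(y - z2, z1), 1.0))   # edge b = 1
cand.append((0.0, fit_1d(y, z2)))        # edge a = 0
cand.append((1.0, fit_1d(y - z1, z2)))   # edge a = 1
cand += [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]

# keep the feasible candidate with the smallest chi-square
feasible = [c for c in cand if 0.0 <= c[0] <= 1.0 and 0.0 <= c[1] <= 1.0]
a_opt, b_opt = min(feasible, key=lambda c: chi2(*c))
print(a_opt, b_opt)
```

With these data the unconstrained minimum and the a = 1 edge minimum are filtered out as infeasible, and the winner is the b = 0 edge minimum at a ≈ 0.38, reproducing the manual analysis above.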

Finally, the best fit under the constraints looks like this:

[Figure: the same plot as above, but with the constraints applied.]