Fitting a group of sigmoids with different models with least_squares()?
For a series of experiments measured at different time points, I would like to use an information criterion (such as the Akaike IC) to compare a fit with one fixed set of parameters across all experiments against a fit where each experiment has its own set of parameters. Can this be done with scipy.optimize.least_squares?
Assume the measurements roughly follow a sigmoid with parameters base (the level before the step), height (the level increase after the step), stretch (the duration of the step) and t50 (the time at which the level is base + height/2). I model these in the code below, which produces a numpy ndarray measurements of size [t, c], with t time points (experiments) and c values per time point:
import numpy as np
import matplotlib.pyplot as plt

# lifted, scaled, stretched and shifted sigmoid
def lssig(t, base, height, stretch, t50):
    return base + height / (1 + np.exp(-1 / stretch * (t - t50)))

# number of curves per experiment
numcurves = 10
# number of experiments (time pts)
num_experiments = 20

# timepoints (with simulated variability in timing)
time_jitter = 2
tstart = np.linspace(-10, 30, num=num_experiments)
tstart += time_jitter * np.random.rand(num_experiments)

# measurements + added noise (this is added noise, not the variability in the estimates)
measurements = np.empty(shape=(num_experiments, numcurves))
measure_noise = np.random.normal(0, .2, (num_experiments, numcurves))

# Variability of model parameters between curves:
# par_spreading[0] -> variability in base    (start value)
# par_spreading[1] -> variability in height  (end - start)
# par_spreading[2] -> variability in stretch (duration of step)
# par_spreading[3] -> variability in t50     (middle of step)
par_spreading = np.ones(4)

# parameters of the model:
# every curve can have its own parameters, variability between curves
# is given by the par_spreading value (see above) for that parameter
base = 2 * np.ones(numcurves) + par_spreading[0] * np.random.rand(numcurves)
height = 8 * np.ones(numcurves) + par_spreading[1] * np.random.rand(numcurves)
stretch = 4 * np.ones(numcurves) + par_spreading[2] * np.random.rand(numcurves)
t50 = 9 * np.ones(numcurves) + par_spreading[3] * np.random.rand(numcurves)

# fill the measurement array
for t in range(num_experiments):
    for c in range(numcurves):
        measurements[t, c] = lssig(tstart[t], base[c], height[c], stretch[c], t50[c])
measurements += measure_noise

# plot curves per subject
plt.plot(tstart, measurements)
plt.show()
There is a nice page with an example of using least_squares in which a single sigmoid is fitted, with the sigmoid's parameters passed in an array theta. Using that recipe I can fit the first of the curves, measurements[:,0], over all time points:
from scipy.optimize import least_squares

# point on the sigmoid, using array theta for parameters
def y(theta, t):
    return theta[0] + theta[1] / (1 + np.exp(-1 / theta[2] * (t - theta[3])))

# objective function for least_squares: difference between noisy data & model
def fun(theta):
    return y(theta, tstart) - measurements[:, 0]

# start with values for base, height, stretch and t50
theta0 = [2, 8, 4, 9]
res1 = least_squares(fun, theta0)

# show the parameters & plot the functions
print('real: {}, estimated: {}'.format([base[0], height[0], stretch[0], t50[0]], res1.x))

plt.plot(tstart, measurements[:, 0], tstart, y(res1.x, tstart))
plt.legend(('noisy', 'lsq'))
plt.show()
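Not part of the original question, but for reference: the same recipe can be repeated for every column of measurements to obtain a separate parameter set per curve. A minimal sketch:

# sketch: repeat the single-curve fit for every curve and collect the estimates
theta_per_curve = np.zeros((numcurves, 4))
for c in range(numcurves):
    res = least_squares(lambda th, c=c: y(th, tstart) - measurements[:, c], theta0)
    theta_per_curve[c] = res.x
print(theta_per_curve)   # one row of [base, height, stretch, t50] per curve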
The reason I use 4 separate parameters in the code above is that base can be either a single scalar (the same for all curves of an experiment) or an array (a separate base value for every curve). Can least_squares be used this way, or do all parameters have to be packed into one 1-D array?
It would be possible to flatten the parameter arrays (like theta), but that becomes extremely messy: in a model where every experiment has its own base parameter, the second element of the array would be a base value, whereas in a model with a single base parameter for all experiments, the second element would be the height value, and so on.
Ideally the parameters would stay separate, so that their lengths can vary independently.
Is there a manageable / elegant way of doing this?
Perhaps you could drastically shorten the question and improve readability by providing a minimal reproducible example! ;)
If I understand your question correctly:
To avoid dealing with "messy" parameter sets, I often create two small functions, e.g. param2minimizer and its inverse minimizer2param. These functions take care of organizing all the parameters for the minimizer, which keeps the code easier to follow.
For example, suppose you want to optimize an arbitrary number of [base, height] pairs:
def param2minimizer(bases, heights):
    # pack the separate parameter arrays into one 1-D vector
    return np.concatenate((bases, heights))

def minimizer2param(m):
    # unpack the 1-D vector back into the separate parameter arrays
    L = len(m)
    if L % 2:
        raise ValueError("Incompatible length")
    bases = m[0:L // 2]
    heights = m[L // 2:]
    return bases, heights
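As a rough sketch of how these helpers could be wired into least_squares (my own illustration, not part of the original answer; it reuses lssig, tstart, measurements and numcurves from the question, and for simplicity assumes a fixed, shared stretch and t50 while base and height vary per curve):

from scipy.optimize import least_squares

shared_stretch, shared_t50 = 4.0, 9.0   # assumed fixed here, purely for illustration

def residuals(m):
    # unpack the 1-D minimizer vector into the separate parameter arrays
    bases, heights = minimizer2param(m)
    # model value for every (time point, curve) pair via broadcasting
    model = lssig(tstart[:, None], bases[None, :], heights[None, :],
                  shared_stretch, shared_t50)
    # least_squares expects a 1-D residual vector
    return (model - measurements).ravel()

m0 = param2minimizer(2 * np.ones(numcurves), 8 * np.ones(numcurves))
res = least_squares(residuals, m0)
fitted_bases, fitted_heights = minimizer2param(res.x)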
You can adapt these two functions to do exactly what you need.
Regarding the dimensionality of the minimization variables, have a look at the documentation of least_squares. I think it is a better habit to systematically pass a vector (1-D) rather than a matrix (2-D), even if a particular minimizer accepts 2-D input. Some minimizers accept only 1-D, and optimization algorithms are usually described numerically in terms of a 1-D vector of variables. Using 1-D is safer and more in line with common usage.
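To tie this back to the model comparison in the question: once both the "shared parameters" fit and the "per-curve parameters" fit have been run through least_squares, an Akaike IC value can be computed from the residual sum of squares. A minimal sketch, assuming Gaussian noise and the standard formula AIC = n * ln(RSS / n) + 2k (up to an additive constant); the helper name aic_from_fit and the result variables in the comments are placeholders, not part of the original answer:

def aic_from_fit(res):
    # AIC (up to an additive constant) from a scipy least_squares result
    rss = np.sum(res.fun ** 2)   # residual sum of squares at the solution
    n = res.fun.size             # number of residuals (data points)
    k = res.x.size               # number of fitted parameters
    return n * np.log(rss / n) + 2 * k

# aic_shared = aic_from_fit(res_shared)         # model with one shared parameter set
# aic_per_curve = aic_from_fit(res_per_curve)   # model with per-curve parameters
# the model with the lower AIC is preferred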