分段线性拟合
Piecewise linear fit
我正在尝试自动拟合功率谱密度。一个典型的情节看起来像这样(loglog scale):
红色是我尝试安装它的一种尝试。
我想有一种方法可以将最多 3 个线性段拟合到功率谱密度图中。
我已经尝试了几种方法,例如通过查看 1 使用 curve_fit :
def logfit(x, a1, a2, b, cutoff):
cutoff = int(params[3])
out = np.empty_like(x)
out[:cutoff] = x[:cutoff]*a1 + b
out[cutoff:] = x[cutoff]*a1 + b + (x[cutoff:] - x[cutoff])*a2
return out
#the fit is only linear on loglog scale, need to apply np.log10 to x and y.
popt, pcov = curve_fit(logfit, np.log10(x), np.log10(y), bounds = ([-2,-2,-np.inf,99],[0,0,np.inf,100]))
这通常提供平底版型。
我还查看了 [2],使用了 lmfit 包:
def logfit(params, x, data):
a1, a2, b = params['a1'], params['a2'], params['b']
c = int(params['cutoff'])
out = np.empty_like(x)
out[:c] = x[:c]*a1 + b
out[c:] = x[c]*a1 + b + (x[c:] - x[c])*a2
return data - out
params = Parameters()
params.add('a1', value = -0.3, min = -2, max = 0)
params.add('a2', value = -2, min = -2, max = -0.5)
params.add('b', min = 0, max = 5)
params.add('cutoff', min = 150, max = 500)
params = minimize(logfit, params, args=(f, pxx))
但这也会产生完全偏离的拟合(如上图中的红色拟合)。
这是因为参数太多了吗?我原以为这个问题很简单,可以通过优化来解决...
1
[2]Fit a curve for data made up of two distinct regimes
包含一个能说明问题的完整示例总是有帮助的。
当然,您制定问题的方式的部分问题是 cutoff
变量不会在拟合中得到优化。 scipy.optimize
或 lmfit
可用的所有求解器都严格适用于连续变量,而不是整数或离散变量。求解器通过对变量进行小的更改(例如在 1.e-7 级别)并查看其如何改变拟合来工作。使用 c = int(params['cutoff'])
时,c
的值根本不会改变,并且拟合将决定对 cutoff
的小改动不会改变拟合。
克服这个问题的一种方法是,通常效果很好的方法是不使用离散截止,而是使用 step-like 函数——逻辑函数可能是最常见的选择——它从 0 开始一步在一个小间隔内到 1,比如 1 或 2 x
点。有了这样一个在几个x
个数据点上连续扩展的函数,cutoff
的一个小变化就会使计算模型发生微小的变化,这样拟合就能找到一个好的解。
这是一个这样做的例子。 FWIW,lmfit
包含一个 logistic
函数,但它只是一个 one-liner,所以我将使用它并以困难的方式进行操作:
import numpy as np
import matplotlib.pyplot as plt
from lmfit import Model
from lmfit.lineshapes import logistic
# build some fake data comprised of two line segments
x = np.linspace(0, 50, 101)
y = 37.0 - 0.62*x
y[62:] = y[62] - 5.41*(x[62:] - x[62])
y = y + np.random.normal(size=len(y), scale=0.5)
def two_lines(x, offset1, slope1, offset2, slope2, cutoff, width=1.0):
"""two line segments, joined at a cutoff point by a logistic step"""
# step = 1 - 1./(1. + np.exp((x - cutoff)/max(1.-5, width)))
step = logistic(x, center=cutoff, sigma=width)
return (offset1 + slope1*x)*(1-step) + (offset2 + slope2*x)*(step)
lmodel = Model(two_lines)
params = lmodel.make_params(offset1=1.8, slope1=0, offset2=10, slope2=-4,
cutoff=25.0, width=1.0)
# the width could be varied, or just set to ~1 `x` interval or so.
# here, we fix it at 'half a step'
params['width'].value = (x[1] - x[0]) /2.0
params['width'].vary = False
# do the fit, print fit report
result = lmodel.fit(y, params, x=x)
print(result.fit_report())
# plot the results
plt.plot(x, y, label='data')
plt.plot(x, result.best_fit, label='fit')
plt.legend()
plt.show()
这将打印出类似
的报告
[[Model]]
Model(two_lines)
[[Fit Statistics]]
# fitting method = leastsq
# function evals = 115
# data points = 101
# variables = 5
chi-square = 27.2552501
reduced chi-square = 0.28390885
Akaike info crit = -122.297309
Bayesian info crit = -109.221707
[[Variables]]
offset1: 36.8909381 +/- 0.13294218 (0.36%) (init = 1.8)
slope1: -0.62144415 +/- 0.00742987 (1.20%) (init = 0)
offset2: 185.617291 +/- 0.71277881 (0.38%) (init = 10)
slope2: -5.41427487 +/- 0.01713805 (0.32%) (init = -4)
cutoff: 31.3853560 +/- 0.22330118 (0.71%) (init = 25)
width: 0.25 (fixed)
[[Correlations]] (unreported correlations are < 0.100)
C(offset2, slope2) = -0.992
C(offset1, slope1) = -0.862
C(offset2, cutoff) = -0.338
C(slope2, cutoff) = 0.318
还有这样的情节
我希望这足以让你继续前进。
感谢@M Newville,并使用 [1] 的答案,我想出了以下可行的解决方案:
def continuousFit(x, y, weight = True):
"""
fits data using two lines, forcing continuity between the fits.
if `weights` = true, applies a weights to datapoints, to cope with unevenly distributed data.
"""
lmodel = Model(continuousLines)
params = Parameters()
#Typical values and boundaries for power-spectral densities :
params.add('a1', value = -1, min = -2, max = 0)
params.add('a2', value = -1, min = -2, max = -0.5)
params.add('b', min = 1, max = 10)
params.add('cutoff', min = x[10], max = x[-10])
if weight:
#weights are based on the density of datapoints on the x-axis
#since there are fewer points on the low-frequencies they get heavier weights.
weights = np.abs(x[1:] - x[:-1])
weights = weights/np.sum(weights)
weights = np.append(weights, weights[-1])
result = lmodel.fit(y, params, weights = weights, x=x)
return result.best_fit
else:
result = lmodel.fit(y, params, x=x)
return result.best_fit
def continuousLines(x, a1, a2, b, cutoff):
"""two line segments, joined at a cutoff point in a continuous manner"""
return a1*x + b + a2*np.maximum(0, x - cutoff)
[1] : Fit a curve for data made up of two distinct regimes
我正在尝试自动拟合功率谱密度。一个典型的情节看起来像这样(loglog scale):
红色是我尝试安装它的一种尝试。 我想有一种方法可以将最多 3 个线性段拟合到功率谱密度图中。 我已经尝试了几种方法,例如通过查看 1 使用 curve_fit :
def logfit(x, a1, a2, b, cutoff):
cutoff = int(params[3])
out = np.empty_like(x)
out[:cutoff] = x[:cutoff]*a1 + b
out[cutoff:] = x[cutoff]*a1 + b + (x[cutoff:] - x[cutoff])*a2
return out
#the fit is only linear on loglog scale, need to apply np.log10 to x and y.
popt, pcov = curve_fit(logfit, np.log10(x), np.log10(y), bounds = ([-2,-2,-np.inf,99],[0,0,np.inf,100]))
这通常提供平底版型。 我还查看了 [2],使用了 lmfit 包:
def logfit(params, x, data):
a1, a2, b = params['a1'], params['a2'], params['b']
c = int(params['cutoff'])
out = np.empty_like(x)
out[:c] = x[:c]*a1 + b
out[c:] = x[c]*a1 + b + (x[c:] - x[c])*a2
return data - out
params = Parameters()
params.add('a1', value = -0.3, min = -2, max = 0)
params.add('a2', value = -2, min = -2, max = -0.5)
params.add('b', min = 0, max = 5)
params.add('cutoff', min = 150, max = 500)
params = minimize(logfit, params, args=(f, pxx))
但这也会产生完全偏离的拟合(如上图中的红色拟合)。 这是因为参数太多了吗?我原以为这个问题很简单,可以通过优化来解决...
1
[2]Fit a curve for data made up of two distinct regimes
包含一个能说明问题的完整示例总是有帮助的。
当然,您制定问题的方式的部分问题是 cutoff
变量不会在拟合中得到优化。 scipy.optimize
或 lmfit
可用的所有求解器都严格适用于连续变量,而不是整数或离散变量。求解器通过对变量进行小的更改(例如在 1.e-7 级别)并查看其如何改变拟合来工作。使用 c = int(params['cutoff'])
时,c
的值根本不会改变,并且拟合将决定对 cutoff
的小改动不会改变拟合。
克服这个问题的一种方法是,通常效果很好的方法是不使用离散截止,而是使用 step-like 函数——逻辑函数可能是最常见的选择——它从 0 开始一步在一个小间隔内到 1,比如 1 或 2 x
点。有了这样一个在几个x
个数据点上连续扩展的函数,cutoff
的一个小变化就会使计算模型发生微小的变化,这样拟合就能找到一个好的解。
这是一个这样做的例子。 FWIW,lmfit
包含一个 logistic
函数,但它只是一个 one-liner,所以我将使用它并以困难的方式进行操作:
import numpy as np
import matplotlib.pyplot as plt
from lmfit import Model
from lmfit.lineshapes import logistic
# build some fake data comprised of two line segments
x = np.linspace(0, 50, 101)
y = 37.0 - 0.62*x
y[62:] = y[62] - 5.41*(x[62:] - x[62])
y = y + np.random.normal(size=len(y), scale=0.5)
def two_lines(x, offset1, slope1, offset2, slope2, cutoff, width=1.0):
"""two line segments, joined at a cutoff point by a logistic step"""
# step = 1 - 1./(1. + np.exp((x - cutoff)/max(1.-5, width)))
step = logistic(x, center=cutoff, sigma=width)
return (offset1 + slope1*x)*(1-step) + (offset2 + slope2*x)*(step)
lmodel = Model(two_lines)
params = lmodel.make_params(offset1=1.8, slope1=0, offset2=10, slope2=-4,
cutoff=25.0, width=1.0)
# the width could be varied, or just set to ~1 `x` interval or so.
# here, we fix it at 'half a step'
params['width'].value = (x[1] - x[0]) /2.0
params['width'].vary = False
# do the fit, print fit report
result = lmodel.fit(y, params, x=x)
print(result.fit_report())
# plot the results
plt.plot(x, y, label='data')
plt.plot(x, result.best_fit, label='fit')
plt.legend()
plt.show()
这将打印出类似
的报告[[Model]]
Model(two_lines)
[[Fit Statistics]]
# fitting method = leastsq
# function evals = 115
# data points = 101
# variables = 5
chi-square = 27.2552501
reduced chi-square = 0.28390885
Akaike info crit = -122.297309
Bayesian info crit = -109.221707
[[Variables]]
offset1: 36.8909381 +/- 0.13294218 (0.36%) (init = 1.8)
slope1: -0.62144415 +/- 0.00742987 (1.20%) (init = 0)
offset2: 185.617291 +/- 0.71277881 (0.38%) (init = 10)
slope2: -5.41427487 +/- 0.01713805 (0.32%) (init = -4)
cutoff: 31.3853560 +/- 0.22330118 (0.71%) (init = 25)
width: 0.25 (fixed)
[[Correlations]] (unreported correlations are < 0.100)
C(offset2, slope2) = -0.992
C(offset1, slope1) = -0.862
C(offset2, cutoff) = -0.338
C(slope2, cutoff) = 0.318
还有这样的情节
我希望这足以让你继续前进。
感谢@M Newville,并使用 [1] 的答案,我想出了以下可行的解决方案:
def continuousFit(x, y, weight = True):
"""
fits data using two lines, forcing continuity between the fits.
if `weights` = true, applies a weights to datapoints, to cope with unevenly distributed data.
"""
lmodel = Model(continuousLines)
params = Parameters()
#Typical values and boundaries for power-spectral densities :
params.add('a1', value = -1, min = -2, max = 0)
params.add('a2', value = -1, min = -2, max = -0.5)
params.add('b', min = 1, max = 10)
params.add('cutoff', min = x[10], max = x[-10])
if weight:
#weights are based on the density of datapoints on the x-axis
#since there are fewer points on the low-frequencies they get heavier weights.
weights = np.abs(x[1:] - x[:-1])
weights = weights/np.sum(weights)
weights = np.append(weights, weights[-1])
result = lmodel.fit(y, params, weights = weights, x=x)
return result.best_fit
else:
result = lmodel.fit(y, params, x=x)
return result.best_fit
def continuousLines(x, a1, a2, b, cutoff):
"""two line segments, joined at a cutoff point in a continuous manner"""
return a1*x + b + a2*np.maximum(0, x - cutoff)
[1] : Fit a curve for data made up of two distinct regimes