Which scipy.optimize solver should I use for this data? (finding a module in a series)

Question

My data is a sequence of integers (with repeats) multiplied by an unknown constant c, which I need to find. The data is also noisy:

import pandas as pd
import numpy as np

#data
mySize=[1000,1]

#Unknown constant c to find with a solver
c=np.random.uniform(0.5,10)

#example data: df=c*[integer list]+noise
df = pd.DataFrame(c* np.random.randint(-500,500,size=mySize) +np.random.uniform(-0.8,0.5,mySize))\
                 .sort_values(by=0).reset_index(drop=True)
##Export to excel and open
#if True:
#    import os
#    myPath=os.path.join(os.environ['temp'], 'myNumbers.xlsx')
#    df.to_excel(myPath,sheet_name="Python data",engine='xlsxwriter')
#    os.startfile(myPath)
myNumbers=df[0].tolist()

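#auxiliary calculation: absolute difference between every pair of values,
#stored as a NumPy array so Loss() does not have to convert a list on every call
absx_y=np.abs(np.subtract.outer(myNumbers,myNumbers)).ravel()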

Screenshot of data and Δdata
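(The red points are the deltas (differences) between consecutive numbers. When two values share the same c*"integer", their difference is just noise, which shows up at the bottom of the plot as small differences.)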

My idea is that, since c is the modulus that divides most of the data, mod(data,c)≈0

Screenshot data % c

So I need to minimize this loss function:

def Loss(trial_c):
    answer=np.sum( ((absx_y/trial_c-0.5) % 1 -0.5)**2 )
    #print("c="+str(c)+"; trial_c="+str(trial_c)+"; loss(trial_c)="+str(answer))
    return answer

The loss reaches its minimum at the value of c that turns the data into integers
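To make the expression inside Loss more concrete, here is a minimal sketch. The helper name `wrapped_residual` is hypothetical and not part of the original code. It illustrates that `(u - 0.5) % 1 - 0.5` is the signed distance from `u` to its nearest integer, so the loss is small only when every `|x-y|/trial_c` is close to a whole number.

```python
import numpy as np

def wrapped_residual(u):
    """Signed distance from u to the nearest integer, in [-0.5, 0.5)."""
    u = np.asarray(u, dtype=float)
    return (u - 0.5) % 1 - 0.5

# Values close to an integer give residuals close to zero;
# a value exactly halfway between two integers gives -0.5.
example = wrapped_residual([0.1, 0.9, 2.0, 3.4, 2.5])
print(example)
```

Squaring and summing these residuals over all pairs gives the loss used in the question.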

screenshot Loss function

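My intention is to use a solver, but to understand the problem I started with brute force, generating all candidate values of c and the loss for each (this is too slow):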

#generate all trial c values for the brute-force scan
trialC=np.arange (1,1000, 1)*(df[0].diff().max()/1000)

#lossOfTrialC =[Loss(xx) for xx in trialC]# <- This is horribly slow, so I use parallel calculation     
from joblib import delayed, Parallel
lossOfTrialC = Parallel(n_jobs=8)(delayed(Loss)(xx) for xx in trialC)

When I plot it:

import matplotlib.pyplot as plt

def PlotearXY(X,Y,Title=""):

    #plt.ion()
    fig = plt.figure()
    fig.subplots_adjust(bottom=0.2)
    ax = plt.gca()
    #ax.scatter(X,Y,marker='o',s=1)#.abs()
    ax.plot(X,Y,marker='o')#.abs()
    #ax.set_yscale('log')
    plt.title(Title) 
    #plt.draw()
    plt.show()
    plt.close()


PlotearXY(trialC,lossOfTrialC,"objective c="+str(c))

I get a clear minimum of the loss function at the correct trialC, but the loss is very noisy and full of local minima

Screenshot Losses
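The brute-force scan described above can probably be made much faster without joblib. One likely cost is that `absx_y` is a Python list of a million values that gets converted on every call. The sketch below is an illustration under stated assumptions, not the original code: it uses a smaller synthetic series, a hypothetical `true_c` of 7.3, a hypothetical search window of 5 to 10, only the unique pairs (i < j), and the mean instead of the sum so the loss stays between 0 and 0.25.

```python
import numpy as np

rng = np.random.default_rng(0)           # fixed seed, only for reproducibility
true_c = 7.3                             # hypothetical constant for this demo
ints = rng.integers(-50, 50, size=200)   # smaller series than in the question
data = true_c * ints + rng.uniform(-0.8, 0.5, size=200)

# Unique pairs (i < j) only, computed once as a NumPy array.
i, j = np.triu_indices(len(data), k=1)
abs_diff = np.abs(data[i] - data[j])

def mean_loss(trial_c):
    """Mean squared distance of abs_diff / trial_c to the nearest integer."""
    r = (abs_diff / trial_c - 0.5) % 1 - 0.5
    return np.mean(r ** 2)

# Hypothetical search window and step. The step has to be small compared
# with the width of the dip around the true value, or the scan can miss it.
trial_values = np.linspace(5.0, 10.0, 2501)
losses = np.array([mean_loss(t) for t in trial_values])
best_c = trial_values[np.argmin(losses)]
print(best_c, losses.min())
```

The dip around the true value gets narrower as the integer range grows, so for the full series in the question a finer grid, or a coarse scan followed by a local refinement, may be needed.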

I tried this approach in Excel and it works well. Since Excel uses SLSQP, I tried the scipy SLSQP solver:

from scipy.optimize import minimize

#Constraints
maxC=max(myNumbers)
def constraint1(trial_c):
    #SciPy 'ineq' constraints must be non-negative, so this keeps trial_c<=maxC
    return maxC-trial_c

#Initial value for trialC
trial_c=[df[0].diff().max()]

#Bounds for trial_c
myBounds=[(0.0000001,df[0].diff().max())]

#inequalities for trial_c (not sure if necessary)
con1 = {'type': 'ineq', 'fun': constraint1} 
cons = ([con1])

solution = minimize(Loss,trial_c,method='SLSQP',\
                    bounds=myBounds,constraints=cons)

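But it usually fails, getting stuck in a local minimum.

The question is: "Which solver should I use?"

The documentation of scipy.optimize.minimize lists a whole range of solvers, but I don't know which one suits this problem best.

Or is my whole approach wrong?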

Answer 1

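I haven't looked deeply into your implementation, but I have a few remarks and suggestions:

It seems to me that you are using the mod() function in your objective. mod() is discontinuous, and optimizers (especially gradient-based ones such as SLSQP) may have a hard time finding a suitable descent direction.
SLSQP is a local optimization algorithm: it can only guarantee a local minimum. Also, I seem to recall that Excel uses GRG2, not SLSQP, but that is beside the point.
You may want to consider the global optimization algorithms in SciPy, especially SHGO and DualAnnealing. You should also try NLopt, which implements a number of good global optimization algorithms (DIRECT, CRS2, etc.).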
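As a rough illustration of the suggestion to try a global optimizer, the sketch below runs `scipy.optimize.dual_annealing` on a loss of the same form as in the question. Everything specific here is an assumption made for the demo: the synthetic data, the hypothetical `true_c` of 7.3, the bounds of 5 to 10, the `maxiter` value, and the use of the mean rather than the sum of squared residuals.

```python
import numpy as np
from scipy.optimize import dual_annealing

rng = np.random.default_rng(1)           # fixed seed for the synthetic data only
true_c = 7.3                             # hypothetical constant for this demo
ints = rng.integers(-50, 50, size=200)
data = true_c * ints + rng.uniform(-0.8, 0.5, size=200)

i, j = np.triu_indices(len(data), k=1)   # unique pairs only
abs_diff = np.abs(data[i] - data[j])

def mean_loss(x):
    """Objective in the form SciPy expects: x is a 1-D array of length 1."""
    r = (abs_diff / x[0] - 0.5) % 1 - 0.5
    return float(np.mean(r ** 2))

# Hypothetical bounds; a tighter window makes the narrow dip easier to hit.
bounds = [(5.0, 10.0)]
result = dual_annealing(mean_loss, bounds, maxiter=300)
print(result.x[0], result.fun)
```

The optimizer is stochastic and the dip around the true value is narrow, so whether it lands near `true_c` depends on the bounds, the iteration budget, and the random draws. Comparing the result against a grid scan over the same bounds is a reasonable sanity check.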

optimization

solver

minimization

python-3.x
