Reduce Number of Measurements in Calibration

Question

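For calibration purposes, I take N measurements of water flow, and each measurement is very time-consuming. I would like to reduce the number of measurements. This sounds like feature selection, since I am reducing the number of columns I have. However, I still need to predict the measurements that I drop.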

Here is a sample of the data:

   SerialNumber       val   speed
0     193604048  1.350254   105.0
1     193604048  1.507517  3125.0
2     193604048  1.455142   525.0
6     193604048  1.211184    12.8
7     193604048  1.238835    20.0

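For each serial number I have a complete set of val measurements, one per speed value. Ideally, I would like a model whose output is a vector of all N val measurements, but the options all seem to be neural networks, which I am trying to avoid for now. Are there any other options?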

If I feed this data into a regression model, how do I distinguish between the data sets of each SerialNumber?

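To make sure my goal is clear: I want to learn from the historical measurements of the N values I have, and find out which speed values I can drop while still accurately predicting all N output values.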

Thanks!

Answer 1

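I tried to find the simplest equation that fits your posted sample data well, and from my equation search the Harris yield density equation "y = 1.0 / (a + b * pow(x, c))" appears to be a good candidate. Here is a graphical Python fitter using that equation and your data; the initial parameter estimates for the non-linear fitter are computed directly from the data maximum and minimum values. Note that SerialNumber itself is unrelated to the data and is not used in the regression.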

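I hope you find this equation generally useful in your work. It is possible that, after performing similar regressions on several different data sets, the parameters a, b and c turn out to be very similar in all cases, which would be the best outcome. If your measurement precision is high, I would personally expect that with this three-parameter equation each calibration could be done with a minimum of four data points: the maximum, the minimum, and two other well-spaced points along the expected calibration curve.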

Note that the fitted parameters here, a = -1.91719091e-03, b = 1.11357103e+00 and c = -1.51294798e+01, yield RMSE = 3.191 and R-squared = 0.9999.

import numpy, scipy, matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

xData = numpy.array([1.350254, 1.507517, 1.455142, 1.211184, 1.238835])
yData = numpy.array([105.0, 3125.0, 525.0, 12.8, 20.0])


def func(x, a, b, c): # Harris yield density equation
    return 1.0 / (a + b*numpy.power(x, c))


initialParameters = numpy.array([0.0, min(xData), -10.0 * max(xData)])

# curve fit the test data
fittedParameters, pcov = curve_fit(func, xData, yData, initialParameters)

modelPredictions = func(xData, *fittedParameters) 

absError = modelPredictions - yData

SE = numpy.square(absError) # squared errors
MSE = numpy.mean(SE) # mean squared errors
RMSE = numpy.sqrt(MSE) # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (numpy.var(absError) / numpy.var(yData))

print('Parameters:', fittedParameters)
print('RMSE:', RMSE)
print('R-squared:', Rsquared)

print()


##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
    f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
    axes = f.add_subplot(111)

    # first the raw data as a scatter plot
    axes.plot(xData, yData,  'D')

    # create data for the fitted equation plot
    xModel = numpy.linspace(min(xData), max(xData))
    yModel = func(xModel, *fittedParameters)

    # now the model as a line plot
    axes.plot(xModel, yModel)

    axes.set_title('Harris Yield Density Equation') # title
    axes.set_xlabel('Val') # X axis data label
    axes.set_ylabel('Speed') # Y axis data label

    plt.show()
    plt.close('all') # clean up after using pyplot

graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)
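One way to check whether a, b and c come out similar across calibrations, and to keep each SerialNumber's data set separate, is to fit the equation once per serial number. The sketch below is only an illustration: the `rows` list and the `fit_per_serial` helper are hypothetical names, and the posted sample contains a single serial number, so real data with several serial numbers would be needed for a meaningful comparison.

```python
import numpy
from collections import defaultdict
from scipy.optimize import curve_fit

# (SerialNumber, val, speed) rows copied from the question's sample;
# real calibration data would contain many serial numbers
rows = [
    (193604048, 1.350254, 105.0),
    (193604048, 1.507517, 3125.0),
    (193604048, 1.455142, 525.0),
    (193604048, 1.211184, 12.8),
    (193604048, 1.238835, 20.0),
]


def harris(x, a, b, c):  # Harris yield density equation
    return 1.0 / (a + b * numpy.power(x, c))


def fit_per_serial(rows):  # hypothetical helper, not from the answer
    # group the (val, speed) pairs by serial number
    groups = defaultdict(list)
    for serial, val, speed in rows:
        groups[serial].append((val, speed))

    fitted = {}
    for serial, points in groups.items():
        x, y = numpy.array(points).T  # x = val, y = speed
        # same initial estimates as the fitter above
        p0 = [0.0, x.min(), -10.0 * x.max()]
        params, _ = curve_fit(harris, x, y, p0)
        fitted[serial] = params
    return fitted


fittedBySerial = fit_per_serial(rows)
for serial, (a, b, c) in fittedBySerial.items():
    print(serial, 'a =', a, 'b =', b, 'c =', c)
```

If the printed parameters stay close from one serial number to the next, that would support using fewer points per calibration.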

UPDATE using reversed X and Y

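Per the comments, here is a graphical fitter for a three-parameter mixed power and exponential equation "a * pow(x, b) * exp(c * x)", where X and Y are swapped relative to the previous code. Here the fitted parameters a = 1.05910664e+00, b = 5.26304345e-02 and c = -2.25604946e-05 yield RMSE = 0.0003602 and R-squared = 0.9999.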

import numpy, scipy, matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

xData = numpy.array([105.0, 3125.0, 525.0, 12.8, 20.0])
yData = numpy.array([1.350254, 1.507517, 1.455142, 1.211184, 1.238835])


def func(x, a, b, c): # mixed power and exponential equation
    return a * numpy.power(x, b) * numpy.exp(c * x)


initialParameters = [1.0, 0.01, -0.01]

# curve fit the test data
fittedParameters, pcov = curve_fit(func, xData, yData, initialParameters)

modelPredictions = func(xData, *fittedParameters) 

absError = modelPredictions - yData

SE = numpy.square(absError) # squared errors
MSE = numpy.mean(SE) # mean squared errors
RMSE = numpy.sqrt(MSE) # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (numpy.var(absError) / numpy.var(yData))

print('Parameters:', fittedParameters)
print('RMSE:', RMSE)
print('R-squared:', Rsquared)

print()


##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
    f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
    axes = f.add_subplot(111)

    # first the raw data as a scatter plot
    axes.plot(xData, yData,  'D')

    # create data for the fitted equation plot
    xModel = numpy.linspace(min(xData), max(xData))
    yModel = func(xModel, *fittedParameters)

    # now the model as a line plot
    axes.plot(xModel, yModel)

    axes.set_title('Mixed Power and Exponential Equation') # title
    axes.set_xlabel('Speed') # X axis data label
    axes.set_ylabel('Val') # Y axis data label

    plt.show()
    plt.close('all') # clean up after using pyplot

graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)
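To see which speed settings could be dropped, one rough check is to leave out each interior point in turn (keeping the minimum and maximum speed), refit on the remaining four points, and compare the prediction with the measured value. The sketch below uses a simplification that is not part of the fit above: it takes logarithms so the equation becomes linear in log(a), b and c and solves it with `numpy.linalg.lstsq`, which minimizes error in log(val) rather than in val. The names `fit_log_linear`, `predict` and `dropErrors` are hypothetical.

```python
import numpy

speed = numpy.array([105.0, 3125.0, 525.0, 12.8, 20.0])
val = numpy.array([1.350254, 1.507517, 1.455142, 1.211184, 1.238835])


def fit_log_linear(x, y):
    # log(y) = log(a) + b*log(x) + c*x is linear in log(a), b and c
    design = numpy.column_stack([numpy.ones_like(x), numpy.log(x), x])
    coeffs, *_ = numpy.linalg.lstsq(design, numpy.log(y), rcond=None)
    return numpy.exp(coeffs[0]), coeffs[1], coeffs[2]


def predict(x, a, b, c):  # mixed power and exponential equation
    return a * numpy.power(x, b) * numpy.exp(c * x)


# candidate points to drop: everything except the minimum and maximum speed
interior = [i for i in range(len(speed))
            if speed.min() < speed[i] < speed.max()]

dropErrors = {}
for i in interior:
    keep = numpy.arange(len(speed)) != i  # mask that removes point i
    a, b, c = fit_log_linear(speed[keep], val[keep])
    predicted = float(predict(speed[i], a, b, c))
    dropErrors[float(speed[i])] = abs(predicted - float(val[i]))

for s, err in sorted(dropErrors.items()):
    print('dropped speed', s, 'absolute error in val:', err)
```

A small error for a given speed suggests that measurement could be predicted instead of taken. With only one serial number in the sample this is indicative at best, and the check would need repeating across the historical calibrations.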

python

regression

feature-selection

scikit-learn

supervised-learning