Can I make a logarithmic regression on sklearn?

Question

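I don't know whether "logarithmic regression" is the right term. I need to fit a curve to my data that looks like a polynomial curve but flattens out at the end.

Here is an image: the blue curve is what I have (a second-order polynomial regression), and the magenta curve is what I need.

I searched a lot and found only linear regression and polynomial regression, but nothing about logarithmic regression in sklearn. I need to plot the curve and then use that regression to make predictions.

Edit

Here is the data for the plot image I posted: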

Answer 1

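You are looking at exponentially distributed data.

You can log-transform the y variable and then use linear regression. This works because the logarithm compresses large values of y more than small ones.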

import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import expon

x = np.linspace(1, 10, 10)
y = np.array([30, 20, 12, 8, 7, 4, 3, 2, 2, 1])
y_fit = expon.pdf(x, scale=2)*100

fig = plt.figure()
ax = fig.add_subplot(111)
ax.scatter(x, y)
ax.plot(x, y_fit)
ax.set_ylabel('y (blue)')
ax.grid(True)

ax2 = ax.twinx()
ax2.scatter(x, np.log(y), color='red')
ax2.set_ylabel('log(y) (red)')

plt.show()
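As a minimal sketch of the second step (the linear regression on log(y)), the same sample points can be fitted with a straight line and the result mapped back to the original scale. The use of `np.polyfit` and the helper name `predict` are illustrative choices here, not part of the original answer.

```python
import numpy as np

# Same sample points as in the snippet above
x = np.linspace(1, 10, 10)
y = np.array([30, 20, 12, 8, 7, 4, 3, 2, 2, 1])

# Fit log(y) = intercept + slope * x with an ordinary least-squares line
slope, intercept = np.polyfit(x, np.log(y), 1)

def predict(x_new):
    """Hypothetical helper: map the straight-line fit back to the y scale."""
    return np.exp(intercept + slope * np.asarray(x_new, dtype=float))

# Predictions for x values beyond the observed range
print(predict([11, 12]))
```

Because the fit is done on log(y), errors are minimized on the log scale, so small y values carry relatively more weight than they would in a direct fit.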

Answer 2

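If I understand correctly, you want to fit the data with a function like y = a * exp(-b * (x - c)) + d.

I'm not sure whether sklearn can do that, but you can use scipy.optimize.curve_fit() to fit the data with any function you define.

For your case, I experimented with your data, and here is the result: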

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

my_data = np.genfromtxt('yourdata.csv', delimiter=',')
my_data = my_data[my_data[:,0].argsort()]
xdata = my_data[:,0].transpose()
ydata = my_data[:,1].transpose()

# define a function for fitting
def func(x, a, b, c, d):
    return a * np.exp(-b * (x - c)) + d

init_vals = [50, 0, 90, 63]
# fit the data and get the fitted parameters
popt, pcov = curve_fit(func, xdata, ydata, p0=init_vals, bounds=([0, 0, 90, 0], [1000, 0.1, 200, 200]))
# predict new data based on your fit
y_pred = func(200, *popt)
print(y_pred)

plt.plot(xdata, ydata, 'bo', label='data')
plt.plot(xdata, func(xdata, *popt), '-', label='fit')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()
plt.show()

I found that the initial value of b is critical for the fit. I estimated a narrow range for it and then fitted the data.
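One possible way to obtain such a starting value without guessing is sketched below: subtract a rough plateau estimate, fit a straight line to the logarithm of the remainder, and pass the result to curve_fit as p0. This is only an illustration under several assumptions: the data is synthetic (not the asker's), the model is a simplified three-parameter form in which the shift c is absorbed into a, and the names `decay`, `a0`, `b0` and `d0` are hypothetical.

```python
import numpy as np
from scipy.optimize import curve_fit

def decay(x, a, b, d):
    # Simplified three-parameter form: the shift c is absorbed into a
    return a * np.exp(-b * x) + d

# Synthetic stand-in data (assumption: not the asker's data)
rng = np.random.default_rng(0)
x = np.linspace(0, 100, 60)
y = decay(x, 40.0, 0.05, 60.0) + rng.normal(scale=0.5, size=x.size)

# Rough plateau guess: slightly below the smallest observed y,
# so that y - d0 stays strictly positive and the log is defined
d0 = y.min() - 1.0

# log(y - d0) is roughly linear in x, so a line fit yields b0 and a0
slope, intercept = np.polyfit(x, np.log(y - d0), 1)
b0, a0 = -slope, np.exp(intercept)

# Refine the rough estimates with a nonlinear least-squares fit
popt, pcov = curve_fit(decay, x, y, p0=[a0, b0, d0])
print("initial guess:", a0, b0, d0)
print("fitted:", popt)
```

The log-linear estimate is biased because d0 is only a guess, but it is usually close enough for the optimizer to converge from.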

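If you have no prior knowledge of the relationship between x and y, you can use the regression methods that sklearn provides, such as linear regression, kernel ridge regression (KRR), nearest neighbors regression, and Gaussian process regression, to fit nonlinear data. See the sklearn documentation for details.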
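As a rough illustration of the model-free route, here is a sketch using nearest neighbors regression, one of the methods listed above. The data is synthetic (an assumption, not the asker's data), and `n_neighbors=5` is an arbitrary choice that would normally be tuned.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

# Synthetic stand-in data: a decaying curve that flattens near 60
rng = np.random.default_rng(0)
x = np.linspace(0, 100, 60)
y = 40.0 * np.exp(-0.05 * x) + 60.0 + rng.normal(scale=0.5, size=x.size)

# sklearn expects a 2-D feature array of shape (n_samples, n_features)
knn = KNeighborsRegressor(n_neighbors=5)
knn.fit(x.reshape(-1, 1), y)

# Predict inside the observed range (x=50) and beyond it (x=120)
x_new = np.array([[50.0], [120.0]])
y_new = knn.predict(x_new)
print(y_new)
```

Outside the training range, nearest neighbors regression returns the average of the outermost training points, so it stays flat there instead of continuing any trend.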

Answer 3

To use sklearn, you can first rewrite the case Y = A exp(-BX) as ln(Y) = ln(A) - BX, and then use LinearRegression to train and fit the data.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

### Read Data
df = pd.read_csv('data.csv')

### Prepare X, Y & ln(Y)
X = df.sort_values(by=['x']).loc[:, 'x':'x']
Y = df.sort_values(by=['x']).loc[:, 'y':'y']
ln_Y = np.log(Y)

### Use the relation ln(Y) = ln(A) - BX to fit X to ln(Y)
from sklearn.linear_model import LinearRegression
exp_reg = LinearRegression()
exp_reg.fit(X, ln_Y)
#### You can introduce weights as well to apply more bias to the smaller X values, 
#### I am transforming X arbitrarily to apply higher arbitrary weights to smaller X values
exp_reg_weighted = LinearRegression()
exp_reg_weighted.fit(X, ln_Y, sample_weight=np.array(1/((X - 100).values**2)).reshape(-1))

### Get predicted values of Y
Y_pred = np.exp(exp_reg.predict(X))
Y_pred_weighted = np.exp(exp_reg_weighted.predict(X))

### Plot
plt.scatter(X, Y)
plt.plot(X, Y_pred, label='Default')
plt.plot(X, Y_pred_weighted, label='Weighted')
plt.xlabel('X')
plt.ylabel('Y')
plt.legend()

plt.show()
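The same log-then-fit idea can also be expressed with sklearn's TransformedTargetRegressor, which applies the log before fitting and the exponential after predicting, so predictions come back on the original Y scale. The sketch below uses synthetic data in place of data.csv (an assumption), and the recovered `A` and `B` are only as good as the model Y = A exp(-BX) is for the data.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for data.csv: Y = A * exp(-B * X) with multiplicative noise
rng = np.random.default_rng(0)
X = np.linspace(0, 100, 50).reshape(-1, 1)
Y = 100.0 * np.exp(-0.03 * X.ravel()) * rng.lognormal(sigma=0.05, size=50)

# The wrapper fits LinearRegression on log(Y) and exponentiates predictions
model = TransformedTargetRegressor(
    regressor=LinearRegression(),
    func=np.log,
    inverse_func=np.exp,
)
model.fit(X, Y)

# Recover the parameters of Y = A * exp(-B * X) from the fitted line
B = -model.regressor_.coef_[0]
A = np.exp(model.regressor_.intercept_)
print(A, B)

# Predict directly on the original scale for a new X value
print(model.predict(np.array([[120.0]])))
```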

python

regression

scikit-learn