Modeling: Support Vector Regression (SVR) vs. Linear Regression
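I'm fairly new to modeling techniques, and I'm trying to compare SVR and linear regression. I used the linear function f(x) = 5x+10 to generate the training and test datasets. So far I have written the following code snippet: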
import csv
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm
from sklearn.linear_model import LinearRegression

# The first 30 rows of test.csv are used for training
with open('test.csv', 'r') as f1:
    train_dataframe = pd.read_csv(f1)
X_train = train_dataframe.iloc[:30, 0]
y_train = train_dataframe.iloc[:30, 1]

# The remaining rows are used for testing
with open('test.csv', 'r') as f2:
    test_dataframe = pd.read_csv(f2)
X_test = test_dataframe.iloc[30:, 0]
y_test = test_dataframe.iloc[30:, 1]

svr = svm.SVR(kernel="rbf", gamma=0.1)
log = LinearRegression()

# .values yields a NumPy array, which can be reshaped to a single column
svr.fit(X_train.values.reshape(-1, 1), y_train)
log.fit(X_train.values.reshape(-1, 1), y_train)

predSVR = svr.predict(X_test.values.reshape(-1, 1))
predLog = log.predict(X_test.values.reshape(-1, 1))

plt.plot(X_test, y_test, label='true data')
plt.plot(X_test, predSVR, 'co', label='SVR')
plt.plot(X_test, predLog, 'mo', label='LogReg')
plt.legend()
plt.show()
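As the plot shows, linear regression works well, but the SVM's predictions are very inaccurate.
Please let me know if you have any suggestions for fixing this.
Thanks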
Take a look at the code below:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
# sklearn.cross_validation was removed in scikit-learn 0.20
from sklearn.model_selection import train_test_split

X = np.linspace(0, 100, 101)
# Noisy samples of 5x + 10: uniform noise in [0, 100) is added to each point
y = np.array([(100*np.random.rand(1)+num) for num in (5*X+10)])

X_train, X_test, y_train, y_test = train_test_split(X, y)

svr = SVR(kernel='linear')
lm = LinearRegression()
svr.fit(X_train.reshape(-1, 1), y_train.flatten())
lm.fit(X_train.reshape(-1, 1), y_train.flatten())

pred_SVR = svr.predict(X_test.reshape(-1, 1))
pred_lm = lm.predict(X_test.reshape(-1, 1))

plt.plot(X, y, label='True data')
# Alternate test points between the two models so the markers do not overlap
plt.plot(X_test[::2], pred_SVR[::2], 'co', label='SVR')
plt.plot(X_test[1::2], pred_lm[1::2], 'mo', label='Linear Reg')
plt.legend(loc='upper left')
plt.show()
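The reason you are getting nowhere is the rbf kernel.
More precisely, the SVR with the rbf kernel was fitted without feature scaling. You need to apply feature scaling before fitting the data to the model.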
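One way to see why scale matters here: the rbf kernel value depends only on the squared distance between two inputs, so the same gamma behaves very differently on raw and on standardized features. The sketch below is an illustration rather than part of the original answer; the helper `rbf` and the choice of two inputs 10 units apart are assumptions made for the example.

```python
import numpy as np

def rbf(a, b, gamma):
    """RBF kernel value exp(-gamma * (a - b)^2) for two scalar inputs."""
    return np.exp(-gamma * (a - b) ** 2)

X = np.linspace(0, 100, 101)
gamma = 0.1

# Two inputs that are 10 units apart on the raw scale
k_raw = rbf(X[40], X[50], gamma)

# The same two inputs after standardizing X to zero mean and unit variance
X_scaled = (X - X.mean()) / X.std()
k_scaled = rbf(X_scaled[40], X_scaled[50], gamma)

print(f"raw: {k_raw:.2e}, scaled: {k_scaled:.3f}")
```

On the raw scale the kernel treats these two points as almost unrelated, while after standardization they are close neighbours, which lets the model share information across the input range.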
Sample code for feature scaling:
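from sklearn.preprocessing import StandardScaler

# StandardScaler expects 2D input, so reshape each array to a single column
sc_X = StandardScaler()
X = sc_X.fit_transform(X.reshape(-1, 1))
sc_y = StandardScaler()
y = sc_y.fit_transform(y.reshape(-1, 1))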
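If we tune the SVR rbf model with settings like these:
svr_rbf = SVR(kernel='rbf', C=1e3, gamma=0.1)
we will see a different result, as shown in the figure below. The green stars are the predictions of the new SVR_rbf model. Hope this helps.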
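To put the two suggestions together, here is a minimal sketch that compares an unscaled rbf SVR, a scaled and tuned rbf SVR, and linear regression on synthetic data. It is not the original poster's setup: the Gaussian noise level, the fixed random seed, and the use of `TransformedTargetRegressor` with a pipeline to handle scaling of both X and y are assumptions chosen for the example.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic data from f(x) = 5x + 10 with Gaussian noise (assumed noise level)
rng = np.random.default_rng(0)
X = np.linspace(0, 100, 101).reshape(-1, 1)
y = 5 * X.ravel() + 10 + rng.normal(scale=5.0, size=X.shape[0])

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    # rbf SVR fitted on raw X and y, as in the question
    "raw rbf SVR": SVR(kernel="rbf", gamma=0.1),
    # X is standardized inside the pipeline; y is standardized by the
    # wrapper and predictions are mapped back to the original units
    "scaled rbf SVR": TransformedTargetRegressor(
        regressor=make_pipeline(
            StandardScaler(), SVR(kernel="rbf", C=1e3, gamma=0.1)
        ),
        transformer=StandardScaler(),
    ),
    "linear regression": LinearRegression(),
}

# R^2 on the held-out test set for each model
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = r2_score(y_test, model.predict(X_test))
    print(f"{name}: R^2 = {scores[name]:.3f}")
```

Wrapping the scalers in a pipeline means they are fitted on the training split only and applied automatically at prediction time, which avoids having to call `inverse_transform` by hand.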