How to improve the accuracy of prediction in scikit-learn
I want to predict one target value from 3 features. Here is my input file (data.csv):
feature.1 feature.2 feature.3 target
1 1 1 0.0625
0.5 0.5 0.5 0.125
0.25 0.25 0.25 0.25
0.125 0.125 0.125 0.5
0.0625 0.0625 0.0625 1
Here is my code:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
features = pd.read_csv('data.csv')
print(features.head())
features_name = ['feature.1', 'feature.2', 'feature.3']
target_name = ['target']
X = features[features_name]
y = features[target_name]
# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42)
linear_regression_model = LinearRegression()
linear_regression_model.fit(X_train, y_train)
# Here is where I want to predict the target value for these inputs for 3 features
# (a DataFrame with the training column names avoids scikit-learn's feature-name warning)
new_data = pd.DataFrame([[0.375, 0.375, 0.375]], columns=features_name)
ss = linear_regression_model.predict(new_data)
print(ss)
Based on the trend, if I use 0.375 as the input for all three features, I expect a value of about 0.1875. However, the code predicts this:
[[0.44203368]]
This is incorrect, and I don't know where the problem is. Does anyone know how I can fix it?
Thanks
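Your data is not linear. Since the three features are identical, I plotted only one dimension:
[Figure: target plotted against a single feature]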
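The same conclusion can be checked numerically. This is a minimal sketch, assuming the five rows shown in the question are the whole dataset and collapsing the three identical features into one column: it shows that feature × target is constant, so the target is inversely proportional to the feature rather than a linear function of it.

```python
import numpy as np

# The five rows of data.csv; the three features are identical, so one column is enough
x = np.array([1, 0.5, 0.25, 0.125, 0.0625])
target = np.array([0.0625, 0.125, 0.25, 0.5, 1])

# feature * target is the same in every row, so target = 0.0625 / feature
products = x * target
print(products)  # [0.0625 0.0625 0.0625 0.0625 0.0625]

# Halving the feature doubles the target: a constant ratio, not a constant difference
ratios = target[1:] / target[:-1]
print(ratios)  # [2. 2. 2. 2.]

# Value implied at 0.375 under this inverse relationship
implied = 0.0625 / 0.375
print(implied)  # about 0.1667
```

Under this relationship the value at 0.375 is about 0.167. The 0.1875 expected in the question is what straight-line interpolation between the neighbouring rows (0.25, 0.25) and (0.5, 0.125) gives.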
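Approximating a nonlinear function with a linear regression model gives poor results, as you have seen. You can instead model a better-fitting function and fit its parameters with scipy: https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html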
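A minimal sketch of that suggestion, assuming the model form target = a / x (an assumption read off the five rows, not something the data file states) and collapsing the three identical features into one column. For comparison it also fits the equivalent power law inside scikit-learn by running `LinearRegression` on log-transformed data; that log-log approach is a substitution of mine, not part of the original answer.

```python
import numpy as np
from scipy.optimize import curve_fit
from sklearn.linear_model import LinearRegression

# The five rows of data.csv, collapsed to one feature column (all three features are equal)
x = np.array([1, 0.5, 0.25, 0.125, 0.0625])
target = np.array([0.0625, 0.125, 0.25, 0.5, 1])

# Assumed model form (hypothetical choice): inverse proportionality, target = a / x
def inverse(x, a):
    return a / x

# curve_fit returns the best-fit parameters and their covariance matrix
popt, pcov = curve_fit(inverse, x, target, p0=[1.0])
a_fit = popt[0]
print(a_fit)                   # close to 0.0625
curve_pred = inverse(0.375, a_fit)
print(curve_pred)              # close to 0.0625 / 0.375, about 0.1667

# Alternative that stays in scikit-learn: a power law target = c * x**k is a
# straight line in log-log space, log(target) = log(c) + k * log(x)
log_model = LinearRegression()
log_model.fit(np.log(x).reshape(-1, 1), np.log(target))
k = log_model.coef_[0]
c = np.exp(log_model.intercept_)
print(k, c)                    # slope close to -1, constant close to 0.0625

# Predict by transforming the input and back-transforming the output
log_pred = np.exp(log_model.predict(np.log([[0.375]])))[0]
print(log_pred)                # about 0.1667
```

The one-column form relies on the three features being identical in this dataset. With features that differ, the model function passed to `curve_fit` (or the log-transformed design matrix) would need to take all three columns.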