What does the scikit-learn library do in this code?
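I'm very interested in machine learning. I tried to understand the code below, but I can't follow it. Can someone explain it to me in simple terms?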
from sklearn.linear_model import LinearRegression
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split # This module divides our data into two parts, train and test
import sklearn.metrics as met
from sklearn.datasets import load_boston
boston = load_boston()
x = boston.data
y = boston.target
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=42)
model = LinearRegression()
model.fit(xtrain, ytrain)
ypredict = model.predict(xtest)
plt.scatter(ytest, ypredict)
plt.show()
print(met.mean_squared_error(ytest, ypredict))
The steps are as follows:
- Load the data and assign the variables:
boston = load_boston()
x = boston.data
y = boston.target
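- Split the data into a training set and a validation (test) set, commonly at an 80/20 ratio. random_state is set to 42 as a nod to The Hitchhiker's Guide to the Galaxy (I don't want to spoil anything for you...); any fixed integer makes the shuffle, and therefore the split, reproducible.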
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=42)
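- Fit a linear regression on the training data, i.e. create the model. Because x holds every feature column of the dataset, this is a multiple linear regression rather than a single-variable one.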
model = LinearRegression()
model.fit(xtrain, ytrain)
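- Use the validation (test) data to check the model created in the previous step, by predicting targets for inputs the model has not seen.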
ypredict = model.predict(xtest)
- Draw a scatter plot of the actual test values against the predicted values:
plt.scatter(ytest, ypredict)
plt.show()
- Print the model's error, expressed as the mean squared error (lower is better):
print(met.mean_squared_error(ytest, ypredict))
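To make the split and fit steps less of a black box, here is a minimal sketch of the same ideas in plain Python. It is not how scikit-learn implements them: `split_train_test` and `fit_line` are hypothetical helper names, the data is a tiny made-up set rather than `load_boston` (which was removed in scikit-learn 1.2), and only one feature is used for readability, whereas the model in the question is fitted on all feature columns at once.

```python
import random


def split_train_test(rows, test_size=0.2, seed=42):
    """Shuffle rows reproducibly, then hold out a `test_size` fraction.

    Hypothetical helper: it only mimics the idea behind train_test_split.
    """
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # same seed -> same shuffle every run
    n_test = round(len(rows) * test_size)
    return rows[n_test:], rows[:n_test]  # (train, test)


def fit_line(points):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up data lying exactly on the line y = 3x + 2.
data = [(float(x), 3.0 * x + 2.0) for x in range(10)]

train, test = split_train_test(data)   # 8 rows to learn from, 2 held out
slope, intercept = fit_line(train)     # the "fit" step
predictions = [slope * x + intercept for x, _ in test]  # the "predict" step

print(len(train), len(test))
print(slope, intercept)
```

Because the made-up data has no noise, the fitted slope and intercept recover 3 and 2 (up to floating-point rounding) and the predictions match the held-out targets. With real data they would not match exactly, which is why the last step measures the error.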
Follow the comments to understand the code:
# Import the modules we need
# LinearRegression fits a linear combination of the features to predict the target value
from sklearn.linear_model import LinearRegression
# matplotlib is used to plot/visualize the data
import matplotlib.pyplot as plt
# This module divides our data into two parts, train and test
from sklearn.model_selection import train_test_split
# metrics is used to analyze the model performance (such as mean squared error)
import sklearn.metrics as met
# sklearn.datasets has various datasets for quick use
from sklearn.datasets import load_boston
# Load the Boston housing dataset from the standard sklearn.datasets module
boston = load_boston()
# Separate the features (x) and the target variable (y) of the Boston dataset
x = boston.data
y = boston.target
# Split the dataset into a training part (to fit the model) and a test part (to evaluate it)
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.2, random_state=42)
# Create a linear regression object
model = LinearRegression()
# Train the model
model.fit(xtrain, ytrain)
# Once the model is trained, predict the target values for the test data, which was not used in training (i.e. data the model has not seen)
ypredict = model.predict(xtest)
# Plot the actual values against the predicted values in a scatter plot to visualize how close the predictions are to the actual values
plt.scatter(ytest, ypredict)
plt.show()
# Calculate the mean squared error: the average of the squared differences between the predictions and the actual values
print(met.mean_squared_error(ytest, ypredict))
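As a rough illustration of what that last number means, the mean squared error can be worked out by hand on a few made-up values. `hand_mse` is a hypothetical name chosen here so it is not confused with the library function, and the numbers are invented for the example.

```python
import math


def hand_mse(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


actual = [3.0, 5.0, 2.0, 7.0]      # invented "true" targets
predicted = [2.0, 5.0, 4.0, 8.0]   # invented model outputs

mse = hand_mse(actual, predicted)  # (1 + 0 + 4 + 1) / 4 = 1.5
rmse = math.sqrt(mse)              # back in the same units as the target

print(mse)
print(rmse)
```

Squaring makes large misses count much more than small ones, and it also means the result is in squared units of the target. Taking the square root gives a value that is easier to compare with the target values themselves.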