After normalizing data, how do I predict y using regression analysis?
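I have normalized my data and applied regression analysis to predict yield (y).
But my predicted output is also normalized (between 0 and 1).
I want my predicted answer in the original units of my data, not between 0 and 1.
Data: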
Total_yield(y) Rain(x)
64799.30 720.1
77232.40 382.9
88487.70 1198.2
77338.20 341.4
145602.05 406.4
67680.50 325.8
84536.20 791.8
99854.00 748.6
65939.90 1552.6
61622.80 1357.7
66439.60 344.3
Next, I normalized the data using this code:
from sklearn.preprocessing import Normalizer
import pandas
import numpy
dataframe = pandas.read_csv('/home/desktop/yield.csv')
array = dataframe.values
X = array[:,0:2]
scaler = Normalizer().fit(X)
normalizedX = scaler.transform(X)
print(normalizedX)
Total_yield Rain
0 0.999904 0.013858
1 0.999782 0.020872
2 0.999960 0.008924
3 0.999967 0.008092
4 0.999966 0.008199
5 0.999972 0.007481
6 0.999915 0.013026
7 0.999942 0.010758
8 0.999946 0.010414
9 0.999984 0.005627
10 0.999967 0.008167
Next, I used these normalized values to compute R-squared with the following code:
array=normalizedX
data = pandas.DataFrame(array,columns=['Total_yield','Rain'])
import statsmodels.formula.api as smf
lm = smf.ols(formula='Total_yield ~ Rain', data=data).fit()
lm.summary()
Output:
<class 'statsmodels.iolib.summary.Summary'>
"""
OLS Regression Results
==============================================================================
Dep. Variable: Total_yield R-squared: 0.752
Model: OLS Adj. R-squared: 0.752
Method: Least Squares F-statistic: 1066.
Date: Thu, 09 Feb 2017 Prob (F-statistic): 2.16e-108
Time: 14:21:21 Log-Likelihood: 941.53
No. Observations: 353 AIC: -1879.
Df Residuals: 351 BIC: -1871.
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept 1.0116 0.001 948.719 0.000 1.009 1.014
Rain -0.3013 0.009 -32.647 0.000 -0.319 -0.283
==============================================================================
Omnibus: 408.798 Durbin-Watson: 1.741
Prob(Omnibus): 0.000 Jarque-Bera (JB): 40636.533
Skew: -4.955 Prob(JB): 0.00
Kurtosis: 54.620 Cond. No. 10.3
==============================================================================
Now, R-squared = 0.75.
Regression model: y = b0 + b1 * x
Yield = b0 + b1 * Rain
Yield = intercept + (coefficient for Rain) * Rain
When I plug in the raw Rain value from my data, I get this answer:
Yield = 1.0116 + (-0.3013 * 720.1 (mm)) = -215.95
A yield of -215.95 is wrong.
And when I use the normalized value for Rain, the predicted yield comes out normalized, between 0 and 1.
I want to predict: if rainfall is 720.1 mm, what will the yield be?
Can anyone help me get the predicted yield? I want to compare predicted yield vs. given yield.
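First, you should not use Normalizer in this case. It does not normalize across features (columns); it works along rows, rescaling each sample on its own. You probably do not want that.
Use MinMaxScaler or RobustScaler to scale each feature. See the preprocessing docs for more details.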
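To illustrate the difference between row-wise and column-wise scaling, here is a minimal pure-Python sketch. It only mimics the arithmetic: `l2_normalize_row` is a hypothetical helper that does what Normalizer does with its default `norm='l2'` (divide each row by its own Euclidean length), and `min_max_scale_column` is a hypothetical stand-in for what MinMaxScaler does per column. It uses three rows taken from the data in the question.

```python
import math

# A few (Total_yield, Rain) rows from the question's data.
rows = [
    (64799.30, 720.1),
    (77232.40, 382.9),
    (88487.70, 1198.2),
]

def l2_normalize_row(row):
    """Divide each value in one row by that row's Euclidean length."""
    length = math.sqrt(sum(v * v for v in row))
    return tuple(v / length for v in row)

def min_max_scale_column(values):
    """Map one column linearly onto [0, 1] using that column's min and max."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Row-wise: yield and rain of the same sample are mixed together.
row_wise = [l2_normalize_row(r) for r in rows]

# Column-wise: each feature is scaled independently of the other.
yield_scaled = min_max_scale_column([r[0] for r in rows])
rain_scaled = min_max_scale_column([r[1] for r in rows])

for r in row_wise:
    print("row-wise:", tuple(round(v, 6) for v in r))
print("column-wise yield:", [round(v, 3) for v in yield_scaled])
print("column-wise rain: ", [round(v, 3) for v in rain_scaled])
```

Because yield is much larger than rain in every row, the row-wise version pushes every yield value close to 1, which is the pattern visible in the normalized output in the question. The column-wise version keeps the spread within each feature.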
Second, these classes have an inverse_transform() function, which can convert the predicted y value back to the original units.
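import numpy as np
from sklearn.preprocessing import RobustScaler

# Rainfall (x) and total yield (y) as column vectors; the scalers expect 2-D input
x = np.asarray([720.1,382.9,1198.2,341.4,406.4,325.8,
791.8,748.6,1552.6,1357.7,344.3]).reshape(-1,1)
y = np.asarray([64799.30,77232.40,88487.70,77338.20,145602.05,67680.50,
84536.20,99854.00,65939.90,61622.80,66439.60]).reshape(-1,1)
# Use a separate scaler per variable so each can be inverted independently
scalerx = RobustScaler()
x_scaled = scalerx.fit_transform(x)
scalery = RobustScaler()
y_scaled = scalery.fit_transform(y)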
Fit your statsmodels OLS model on this scaled data.
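When predicting, first transform your test data:
# transform() expects a 2-D array of shape (n_samples, n_features)
x_scaled_test = scalerx.transform([[720.1]])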
Apply your regression model to this value to get the result. The result for y will be in scaled units.
Yield_scaled = b0 + b1 * x_scaled_test
So inverse-transform it to get the value in the original units.
Yield_original = scalery.inverse_transform(Yield_scaled)
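The steps above (scale, fit, predict, inverse-transform) can be sketched end to end as follows. This is a self-contained illustration under stated assumptions, not the exact sklearn/statsmodels workflow: hand-written min-max scaling stands in for the sklearn scaler, a closed-form simple least-squares fit stands in for statsmodels OLS, and all helper names (`scale`, `unscale`, `fit_line`) are hypothetical.

```python
# Rainfall (x) and yield (y) from the question.
rain = [720.1, 382.9, 1198.2, 341.4, 406.4, 325.8,
        791.8, 748.6, 1552.6, 1357.7, 344.3]
yield_ = [64799.30, 77232.40, 88487.70, 77338.20, 145602.05, 67680.50,
          84536.20, 99854.00, 65939.90, 61622.80, 66439.60]

def scale(v, lo, hi):
    """Min-max scale a value onto [0, 1] (stand-in for scaler.transform)."""
    return (v - lo) / (hi - lo)

def unscale(v, lo, hi):
    """Undo min-max scaling (stand-in for scaler.inverse_transform)."""
    return v * (hi - lo) + lo

def fit_line(xs, ys):
    """Closed-form simple least squares; returns (intercept b0, slope b1)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1

# 1. "Fit" one scaler per variable and scale the training data.
x_lo, x_hi = min(rain), max(rain)
y_lo, y_hi = min(yield_), max(yield_)
xs = [scale(v, x_lo, x_hi) for v in rain]
ys = [scale(v, y_lo, y_hi) for v in yield_]

# 2. Fit the regression on the scaled data.
b0, b1 = fit_line(xs, ys)

# 3. Scale the new input with the x scaler, predict, then invert with the y scaler.
rain_new = 720.1
pred_scaled = b0 + b1 * scale(rain_new, x_lo, x_hi)
pred_original = unscale(pred_scaled, y_lo, y_hi)

# For comparison: the same model fitted directly on the raw data.
r0, r1 = fit_line(rain, yield_)
pred_raw = r0 + r1 * rain_new

print("prediction via scaled fit:", round(pred_original, 2))
print("prediction via raw fit:   ", round(pred_raw, 2))
```

Both predictions should agree up to floating-point error, because this kind of scaling is a linear change of units and an ordinary least-squares fit with an intercept carries over unchanged. That suggests scaling is optional for a plain linear regression like this one; what matters is that the input and the coefficients are in the same units.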
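But in my opinion, this linear model will not be very accurate, judging by how your data looked when I plotted it.
This data does not fit a linear model well. Use a different technique, or get more data.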