RandomForest Regressor: Predict and check performance
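I am trying to predict prices for the next 5 days. I followed this tutorial. The tutorial is about predicting a categorical variable, so it uses a RandomForest classifier. I use the same approach as the tutorial, but with a RandomForest Regressor, because I have to predict the last price for the next 5 days. I am confused about how to make the prediction.
Here is my code: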
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import GradientBoostingRegressor
# import from the public sklearn.metrics namespace (sklearn.metrics.ranking is a private module)
from sklearn.metrics import roc_curve, auc, roc_auc_score
priceTrainData = pd.read_csv('trainPriceData.csv')
#read test data set
priceTestData = pd.read_csv('testPriceData.csv')
priceTrainData['Type'] = 'Train'
priceTestData['Type'] = 'Test'
target_col = "last"
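# NOTE: 'last' is also the target column, so as written the model sees the target among its inputs
features = ['low', 'high', 'open', 'last', 'annualized_volatility', 'weekly_return',
            'daily_average_volume_10',  # try to use log in 10, 30,
            'daily_average_volume_30', 'market_cap']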
priceTrainData['is_train'] = np.random.uniform(0, 1, len(priceTrainData)) <= .75
Train, Validate = priceTrainData[priceTrainData['is_train']==True], priceTrainData[priceTrainData['is_train']==False]
x_train = Train[list(features)].values
y_train = Train[target_col].values
x_validate = Validate[list(features)].values
y_validate = Validate[target_col].values
x_test = priceTestData[list(features)].values
np.random.seed(100)  # the random module was never imported; seed NumPy's generator instead
rf = RandomForestRegressor(n_estimators = 1000)
rf.fit(x_train, y_train)
status = rf.predict(x_validate)
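My first question is how to specify that I want 5 predicted values, and my second question is how to check the performance of the RandomForest Regressor. Please help me.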
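Your x_validate is a NumPy array (it is built with .values from a pandas DataFrame), so you can slice out the first five rows and pass them to predict:
rf.predict(x_validate[0:5])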
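Taking the first five rows of the validation set is not the same as forecasting five days ahead. One common framing (an assumption here, not something the question or the tutorial confirms) is to shift the target so that each row's features are paired with the price five rows later. A minimal sketch on synthetic data; the random-walk prices, the horizon variable, and the target column name are hypothetical:

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic daily prices (hypothetical data standing in for trainPriceData.csv)
rng = np.random.default_rng(100)
n_days = 300
last = 100 + np.cumsum(rng.normal(0, 1, n_days))
df = pd.DataFrame({
    "last": last,
    "open": last + rng.normal(0, 0.5, n_days),
    "high": last + np.abs(rng.normal(0, 1, n_days)),
    "low": last - np.abs(rng.normal(0, 1, n_days)),
})

horizon = 5  # assumed forecast horizon, in rows (days)
features = ["low", "high", "open", "last"]

# Target: the 'last' price `horizon` rows later; the final rows have no target yet
df["target"] = df["last"].shift(-horizon)
labeled = df.dropna(subset=["target"])

rf = RandomForestRegressor(n_estimators=100, random_state=100)
rf.fit(labeled[features].values, labeled["target"].values)

# One prediction per row for the most recent `horizon` rows,
# i.e. estimates for the five days after the data ends
future_predictions = rf.predict(df[features].tail(horizon).values)
print(future_predictions)
```

Here 'last' is a legitimate feature because the target is a later price, not the same row's price. Keep in mind that a random forest cannot predict values outside the range of targets it saw during training.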
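For your second question, rf.score computes the R-squared value. Scoring on the training data, as here, gives an optimistic figure; rf.score(x_validate, y_validate) is a fairer check.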
rf.score(x_train,y_train)
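A score computed on the training rows tends to look better than the model really is, so a held-out check is usually more informative. The sketch below assumes synthetic data in place of the CSV files and reports R squared, MAE, and RMSE on a validation split; the data and the metric choice are illustrative, not taken from the question:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic regression data (hypothetical; replace with your own features and target)
rng = np.random.default_rng(100)
X = rng.uniform(0, 10, size=(500, 4))
y = 3.0 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(0, 0.5, 500)

# 75/25 split, mirroring the proportion used in the question
x_train, x_validate, y_train, y_validate = train_test_split(
    X, y, test_size=0.25, random_state=100
)

rf = RandomForestRegressor(n_estimators=100, random_state=100)
rf.fit(x_train, y_train)
predicted = rf.predict(x_validate)

# r2_score on held-out data matches rf.score(x_validate, y_validate)
r2 = r2_score(y_validate, predicted)
mae = mean_absolute_error(y_validate, predicted)
rmse = np.sqrt(mean_squared_error(y_validate, predicted))

print(f"train R^2:       {rf.score(x_train, y_train):.3f}")
print(f"validation R^2:  {r2:.3f}")
print(f"validation MAE:  {mae:.3f}")
print(f"validation RMSE: {rmse:.3f}")
```

MAE and RMSE are in the same units as the price, which often makes them easier to interpret than R squared alone.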