What is wrong with my experiment (trying to predict car sales)?
I have a dataset like this (just a sample):
DATE_REF,MONTH,YEAR,DAY_OF_YEAR,DAY_OF_MONTH,WEEK_DAY,WEEK_DAY_1,WEEK_DAY_2,WEEK_DAY_3,WEEK_DAY_4,WEEK_DAY_5,WEEK_DAY_6,WEEK_DAY_7,WEEK_NUMBER_IN_MONTH,WEEKEND,WORK_DAY,AMOUNT_SOLD
20100101,1,2010,1,1,6,0,0,0,0,0,1,0,1,0,0,0
20100102,1,2010,2,2,7,0,0,0,0,0,0,1,1,1,0,2
20100103,1,2010,3,3,1,1,0,0,0,0,0,0,2,1,0,0
20100104,1,2010,4,4,2,0,1,0,0,0,0,0,2,0,1,12830
20100105,1,2010,5,5,3,0,0,1,0,0,0,0,2,0,1,19200
20100106,1,2010,6,6,4,0,0,0,1,0,0,0,2,0,1,22930
20100107,1,2010,7,7,5,0,0,0,0,1,0,0,2,0,1,23495
20100108,1,2010,8,8,6,0,0,0,0,0,1,0,2,0,1,23215
20100109,1,2010,9,9,7,0,0,0,0,0,0,1,2,1,0,172
20100110,1,2010,10,10,1,1,0,0,0,0,0,0,3,1,0,0
20100111,1,2010,11,11,2,0,1,0,0,0,0,0,3,0,1,18815
20100112,1,2010,12,12,3,0,0,1,0,0,0,0,3,0,1,25415
20100113,1,2010,13,13,4,0,0,0,1,0,0,0,3,0,1,25262
20100114,1,2010,14,14,5,0,0,0,0,1,0,0,3,0,1,27967
20100115,1,2010,15,15,6,0,0,0,0,0,1,0,3,0,1,26352
20100116,1,2010,16,16,7,0,0,0,0,0,0,1,3,1,0,202
20100117,1,2010,17,17,1,1,0,0,0,0,0,0,4,1,0,10
20100118,1,2010,18,18,2,0,1,0,0,0,0,0,4,0,1,20295
20100119,1,2010,19,19,3,0,0,1,0,0,0,0,4,0,1,25982
20100120,1,2010,20,20,4,0,0,0,1,0,0,0,4,0,1,24745
20100121,1,2010,21,21,5,0,0,0,0,1,0,0,4,0,1,28087
20100122,1,2010,22,22,6,0,0,0,0,0,1,0,4,0,1,28417
20100123,1,2010,23,23,7,0,0,0,0,0,0,1,4,1,0,115
20100124,1,2010,24,24,1,1,0,0,0,0,0,0,5,1,0,5
20100125,1,2010,25,25,2,0,1,0,0,0,0,0,5,0,1,20185
20100126,1,2010,26,26,3,0,0,1,0,0,0,0,5,0,1,25932
20100127,1,2010,27,27,4,0,0,0,1,0,0,0,5,0,1,31710
20100128,1,2010,28,28,5,0,0,0,0,1,0,0,5,0,1,21020
20100129,1,2010,29,29,6,0,0,0,0,0,1,0,5,0,1,51460
20100130,1,2010,30,30,7,0,0,0,0,0,0,1,5,1,0,670
20100131,1,2010,31,31,1,1,0,0,0,0,0,0,6,1,0,17
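I tried to predict AMOUNT_SOLD for new dates (DATE_REF) on Azure ML using the following experiment:
Then I deployed the web service and tested the prediction, but the AMOUNT_SOLD column I get back is zero.
What might I be missing?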
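As much as I would like to replicate your Azure ML experiment, I do not have enough data. Here is what I did instead:
I copied your sample data and multiplied it fourfold (Add Rows x 2).
Then Split Data (70%/30%) with random seed 7 (for reproducible results).
Boosted Decision Tree Regression uses the default parameters.
On Tune Model Hyperparameters, I selected AMOUNT_SOLD as the label column.
Then Score Model and Evaluate Model.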
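For readers who want to see the data preparation outside the designer, here is a minimal Python sketch of the duplicate-then-split step. It is not what Azure ML does internally: the function name `duplicate_and_split` and the placeholder rows are made up for illustration, and a seed of 7 in Python's `random` module will not produce the same partition as Split Data with seed 7.

```python
import random

def duplicate_and_split(rows, copies=4, train_fraction=0.7, seed=7):
    """Repeat the rows `copies` times, shuffle with a fixed seed, then split."""
    data = list(rows) * copies        # stand-in for chaining Add Rows twice (x4)
    rng = random.Random(seed)         # fixed seed: the same shuffle on every run
    rng.shuffle(data)
    cut = int(len(data) * train_fraction)
    return data[:cut], data[cut:]

# Placeholder for the 31 sample rows; real rows would be the CSV records.
sample_rows = list(range(31))
train, test = duplicate_and_split(sample_rows)
print(len(train), len(test))
```

One thing worth keeping in mind: because the rows are duplicated before the split, copies of the same row can end up in both the training and the test part, which tends to make the evaluation look better than it would on unseen dates.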
The accuracy (coefficient of determination) was very good.
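As a reminder of what that number measures, the coefficient of determination is 1 minus the ratio of the residual sum of squares to the total sum of squares. A small sketch follows; the `predicted` values are hypothetical and only there to make the example run, they are not the output of the experiment.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# AMOUNT_SOLD for 4-8 January 2010, taken from the sample in the question.
actual = [12830, 19200, 22930, 23495, 23215]
# Hypothetical model outputs, for illustration only.
predicted = [13000, 19000, 23000, 23500, 23000]
print(round(r_squared(actual, predicted), 4))
```

A value close to 1 means the predictions explain most of the variance in AMOUNT_SOLD; a model that always predicts the mean scores 0.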
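After that, to deploy it as a web service, you must first set up a predictive experiment from the training experiment: Setup Web Service > Predictive Experiment.
Your experiment will rearrange itself like magic.
The Web service input module is placed at the top of the experiment by default. I moved it and connected it to the right-hand input of the Score Model module, so that when you enter parameters for the web service, it uses your trained model to make the prediction.
After the Score Model module, I added a Select Columns in Dataset module and selected only the column named Scored Labels. This column contains the model's predictions. I then used an Edit Metadata module to rename the Scored Labels column before passing it to the Web service output module.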
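To illustrate why this step matters: Score Model returns the input columns with a Scored Labels column appended. One possible reading of the zeros in the question is that the AMOUNT_SOLD column in the response was just the placeholder input echoed back, while the prediction sat in Scored Labels; that is an assumption on my part, not something the question confirms. The sketch below mimics the select-and-rename step on a plain dictionary. The name `PREDICTED_AMOUNT_SOLD` and the prediction value are hypothetical.

```python
def select_and_rename(scored_row, keep="Scored Labels",
                      new_name="PREDICTED_AMOUNT_SOLD"):
    """Keep only the prediction column and give it a friendlier name."""
    return {new_name: scored_row[keep]}

# A scored row as it might look after Score Model (values are made up).
scored_row = {
    "DATE_REF": 20170818,
    "AMOUNT_SOLD": 0,           # placeholder label sent with the request
    "Scored Labels": 24310.5,   # hypothetical prediction
}
print(select_and_rename(scored_row))
```

With only the renamed column in the output, there is no echoed AMOUNT_SOLD value left to mistake for the prediction.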
Your experiment is now ready to be deployed as a web service.
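Once deployed, the service can also be called from code instead of the test dialog. The sketch below only builds a request body and does not send anything. It assumes the column-names-plus-values JSON layout that classic Studio request-response services used, with an input named `input1`; please check the API help page of your own service for the actual schema, URL, and key. The WORK_DAY value of 1 and the AMOUNT_SOLD placeholder of 0 are assumptions as well.

```python
import json

COLUMNS = [
    "DATE_REF", "MONTH", "YEAR", "DAY_OF_YEAR", "DAY_OF_MONTH", "WEEK_DAY",
    "WEEK_DAY_1", "WEEK_DAY_2", "WEEK_DAY_3", "WEEK_DAY_4", "WEEK_DAY_5",
    "WEEK_DAY_6", "WEEK_DAY_7", "WEEK_NUMBER_IN_MONTH", "WEEKEND", "WORK_DAY",
    "AMOUNT_SOLD",
]

def build_request_body(columns, rows, input_name="input1"):
    """Serialize rows into the assumed request-response JSON layout."""
    payload = {
        "Inputs": {
            input_name: {
                "ColumnNames": list(columns),
                # Values are sent as strings in this assumed layout.
                "Values": [[str(v) for v in row] for row in rows],
            }
        },
        "GlobalParameters": {},
    }
    return json.dumps(payload).encode("utf-8")

# 18 August 2017 (a Friday); WORK_DAY=1 and AMOUNT_SOLD=0 are assumptions.
row = [20170818, 8, 2017, 230, 18, 6, 0, 0, 0, 0, 0, 1, 0, 3, 0, 1, 0]
body = build_request_body(COLUMNS, [row])
print(body.decode("utf-8"))
```

The resulting bytes would then be posted to the service URL with the API key in the request headers, exactly as the service's own sample code describes.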
To predict a new value, I tested the web service using the current date's details as input. (Though the DATE_REF input should have been 20170818 :D )
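Typing the date details by hand is easy to get wrong, so here is a sketch that derives them from a date. The conventions are inferred from the sample rows in the question (WEEK_DAY runs from Sunday=1 to Saturday=7, and the week number in the month goes up every Sunday). WORK_DAY cannot be derived from the date alone (1 January 2010 is a Friday but has WORK_DAY=0), so the `holidays` argument is a hypothetical stand-in for whatever calendar the original data used.

```python
from datetime import date

def calendar_features(d, holidays=frozenset()):
    """Build the date columns of the dataset for a single date."""
    week_day = (d.weekday() + 1) % 7 + 1                   # Sunday=1 ... Saturday=7
    first_offset = (d.replace(day=1).weekday() + 1) % 7    # weekday of the 1st, Sunday=0
    week_in_month = (d.day + first_offset - 1) // 7 + 1    # new week starts on Sunday
    weekend = 1 if week_day in (1, 7) else 0
    work_day = 0 if weekend or d in holidays else 1        # holiday list is an assumption
    row = {
        "DATE_REF": int(d.strftime("%Y%m%d")),
        "MONTH": d.month,
        "YEAR": d.year,
        "DAY_OF_YEAR": d.timetuple().tm_yday,
        "DAY_OF_MONTH": d.day,
        "WEEK_DAY": week_day,
    }
    for i in range(1, 8):                                  # one-hot WEEK_DAY_1..WEEK_DAY_7
        row[f"WEEK_DAY_{i}"] = 1 if i == week_day else 0
    row["WEEK_NUMBER_IN_MONTH"] = week_in_month
    row["WEEKEND"] = weekend
    row["WORK_DAY"] = work_day
    return row

features = calendar_features(date(2017, 8, 18))
print(features)
```

Calling it with `date(2010, 1, 1)` and that date in `holidays` reproduces the first sample row (minus AMOUNT_SOLD), which is a quick way to check the conventions against your full dataset.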
The output then looks like this:
Your web service can now predict new values.