fastai: Evaluate a tabular prediction model with a pre-split dataset
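Given a dataset that is already split into training and test sets, I would like to know how to run predictions in fastai so that I can obtain the MAE and RMSE values.
The following example is taken from fastai and slightly modified to use train_test_split from sklearn.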
import numpy as np
from sklearn.model_selection import train_test_split
from fastai.tabular.all import *
import pandas as pd
path = untar_data(URLs.ADULT_SAMPLE)
df = pd.read_csv(path/'adult.csv')
train, test = train_test_split(df, test_size=0.20, random_state=42)
cat_names = ['workclass', 'education', 'marital-status', 'occupation', 'relationship', 'race']
cont_names = ['age', 'fnlwgt', 'education-num']
procs = [Categorify, FillMissing, Normalize]
dls = TabularDataLoaders.from_df(train, path, procs=procs, cat_names=cat_names, cont_names=cont_names,
                                 y_names="salary")
learn = tabular_learner(dls)
learn.fit_one_cycle(5)
epoch train_loss valid_loss time
0 0.378432 0.356029 00:05
1 0.369692 0.358837 00:05
2 0.355757 0.348524 00:05
3 0.342714 0.348011 00:05
4 0.334072 0.346690 00:05
learn.unfreeze()
learn.fit_one_cycle(10, lr_max=slice(10e-4, 10e-3))
epoch train_loss valid_loss time
0 0.343953 0.350457 00:05
1 0.349379 0.353308 00:04
2 0.360508 0.352564 00:04
3 0.338458 0.351742 00:05
4 0.334585 0.352128 00:05
5 0.342312 0.351003 00:04
6 0.329152 0.350455 00:05
7 0.334460 0.351833 00:05
8 0.328608 0.351415 00:05
9 0.333205 0.352079 00:04
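How can I now apply the trained model to my test set to compute my metrics? The following does not work for me:
learn.predict(test)
Here I get the following error: AttributeError: 'DataFrame' object has no attribute 'to_frame'
Thanks in advance for your help!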
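I ended up writing a simple for-loop that makes one prediction per row.
This is certainly not efficient, but it solved my problem. If you have any suggestions for getting rid of the slow for-loop, feel free to comment below.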
predicted = []
real = []
for elem in range(len(test)):
    # learn.predict is called with one row (a pandas Series) at a time
    row, clas, probs = learn.predict(test.iloc[elem])
    predicted.append(row["salary"].iloc[-1])
    real.append(test["salary"].iloc[elem])