Error in implementing autokeras timeseries model

I am trying to apply the autokeras TimeseriesForecaster to a serial dataset. The features and the labels of the dataset are given below.

df1_x =

df1_y = 
0    2.5
1    2.1
2    2.2
3    2.2
4    1.5
Name: target_carbon_monoxide, dtype: float64

AutoML preparation:

import autokeras as ak

# parameters
predict_from = 1
predict_until = 1
lookback = 3
clf = ak.TimeseriesForecaster(
    lookback=lookback,
    predict_from=predict_from,
    predict_until=predict_until,
    max_trials=1,
    objective="val_loss",
)
# Train the TimeSeriesForecaster with train data
clf.fit(
    x=df1_x,
    y=df1_y,
    epochs=10,
)

The dataframes contain no NaN values, and the feature dataframe has shape (7111, 8), i.e. it is a 2D dataframe.

However, training fails with the following error:

Search: Running Trial #1

Hyperparameter    |Value             |Best Value So Far 
timeseries_bloc...|True              |?                 
timeseries_bloc...|lstm              |?                 
timeseries_bloc...|3                 |?                 
regression_head...|0                 |?                 
optimizer         |adam              |?                 
learning_rate     |0.001             |?                 

Epoch 1/10
    173/Unknown - 4s 5ms/step - loss: 2.2421 - mean_squared_error: 2.2421
---------------------------------------------------------------------------
InvalidArgumentError                      Traceback (most recent call last)
/tmp/ipykernel_11292/1163792963.py in <module>
     10 )
     11 # Train the TimeSeriesForecaster with train data
---> 12 clf.fit(
     13     x=df1_x,
     14     y=df1_y,

InvalidArgumentError:  Incompatible shapes: [32,1] vs. [30,1]
     [[node mean_squared_error/SquaredDifference (defined at home/samar/.local/lib/python3.8/site-packages/autokeras/utils/utils.py:88) ]] [Op:__inference_train_function_13895]

Function call stack:
train_function
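A possible reading of the shapes in the error: 32 and 30 differ by 2, which equals lookback - 1 for lookback = 3. Sliding a lookback window over a series shortens it by exactly that amount, so a batch of windowed features and a batch of raw labels can end up misaligned by 2 rows. This is an interpretation consistent with the numbers, not a confirmed diagnosis of AutoKeras internals. The sketch below is a generic NumPy illustration of lookback windowing; `make_windows` is a hypothetical helper, not an AutoKeras function.

```python
import numpy as np


def make_windows(x: np.ndarray, lookback: int) -> np.ndarray:
    """Stack every run of `lookback` consecutive rows into one sample.

    Hypothetical helper for illustration only. With n rows it yields
    n - lookback + 1 windows, each of shape (lookback, n_features).
    """
    n = x.shape[0]
    return np.stack([x[i : i + lookback] for i in range(n - lookback + 1)])


# Same shape as df1_x in the question: 7111 rows, 8 features
x = np.zeros((7111, 8))
windows = make_windows(x, lookback=3)
print(windows.shape)  # (7109, 3, 8): 2 fewer samples than rows/labels
```

In this framing the labels have to be trimmed by the same lookback - 1 rows (or aligned explicitly) before features and labels can be batched together.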

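You need to pass validation data to fit(). If you split the data you have (df1) into a training set and a validation set and pass both to fit(), training works. You can split the data with train_test_split, or do it manually. Your code should look something like this: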

from sklearn.model_selection import train_test_split

# Hold out the last 25% of rows for validation; shuffle=False preserves the time order
df1_x, df1_x_eval, df1_y, df1_y_eval = train_test_split(df1_x, df1_y, test_size=0.25, shuffle=False)
clf.fit(
    x=df1_x,
    y=df1_y,
    validation_data=(df1_x_eval, df1_y_eval),
    epochs=10,
)
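Because a lookback model depends on the order of the rows, the validation set should be the tail of the series rather than a random sample. `train_test_split(..., shuffle=False)` does this; the manual alternative can be sketched as below. `chronological_split` is a hypothetical helper, and the toy data only stands in for `df1_x` / `df1_y`.

```python
import pandas as pd


def chronological_split(x: pd.DataFrame, y: pd.Series, val_fraction: float = 0.25):
    """Split without shuffling: the last `val_fraction` of rows becomes validation.

    Hypothetical helper; returns (x_train, x_val, y_train, y_val), the same
    order that sklearn's train_test_split uses.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of rows")
    split = int(len(x) * (1 - val_fraction))
    return x.iloc[:split], x.iloc[split:], y.iloc[:split], y.iloc[split:]


# Toy data standing in for df1_x / df1_y (hypothetical values, 8 rows)
x = pd.DataFrame({"f1": range(8), "f2": range(8, 16)})
y = pd.Series([2.5, 2.1, 2.2, 2.2, 1.5, 1.7, 1.9, 2.0], name="target")

x_train, x_val, y_train, y_val = chronological_split(x, y)
print(len(x_train), len(x_val))  # 6 2
```

The four pieces would then go to `clf.fit(x=..., y=..., validation_data=(x_val, y_val), epochs=10)` exactly as in the answer's code.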