How would one use an RNN when predicting temperature?

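Suppose I have a DataFrame with several features, such as humidity and pressure. One of the columns is the temperature.

Each row holds the data for one day. I want to predict the next day's temperature using only past data.

How would I reshape the DataFrame so that it can be used in an RNN with Keras?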

Suppose you have the following data structure, and we want to predict the temperature given the data from 1 day in the past:

import tensorflow as tf
import pandas as pd
import numpy as np

df = pd.DataFrame(data={
    # 20 days of synthetic values, one row per day
    'temperature': np.random.random(20),
    'pressure': np.random.random(20),
    'humidity': np.random.random(20),
    'wind': np.random.random(20)
})

print(df.to_markdown())
|    | temperature |   pressure |   humidity |       wind |
|---:|------------:|-----------:|-----------:|-----------:|
|  0 |   0.0589101 |   0.278302 |   0.875369 |   0.622687 |
|  1 |    0.594924 |   0.797274 |   0.510012 |   0.374484 |
|  2 |    0.511291 |   0.334929 |   0.401483 |    0.77062 |
|  3 |    0.711329 |    0.72051 |   0.595685 |   0.872691 |
|  4 |    0.495425 |   0.520179 |   0.516858 |   0.628928 |
|  5 |    0.676054 |    0.67902 |  0.0213801 |  0.0267594 |
|  6 |    0.058189 |    0.69932 |   0.885174 | 0.00602091 |
|  7 |    0.708245 |   0.871698 |   0.345451 |   0.448352 |
|  8 |    0.958427 |   0.471423 |   0.412678 |   0.618024 |
|  9 |    0.941202 |   0.825181 |   0.211916 |  0.0808273 |
| 10 |     0.49252 |   0.541955 | 0.00522009 |   0.396557 |
| 11 |    0.323757 |   0.113585 |   0.797503 |   0.323961 |
| 12 |    0.819055 |   0.637116 |   0.285361 |   0.569794 |
| 13 |     0.95123 | 0.00604303 |   0.208746 |   0.150214 |
| 14 |     0.89466 |   0.948916 |   0.556422 |   0.555165 |
| 15 |    0.705789 |   0.269704 |   0.289568 |   0.391438 |
| 16 |    0.154502 |   0.703137 |   0.184157 |   0.765623 |
| 17 |     0.25974 |   0.934706 |   0.172775 |   0.412022 |
| 18 |    0.403475 |   0.144796 |  0.0224043 |   0.891236 |
| 19 |    0.922302 |   0.805214 |  0.0232178 |   0.951568 |

The first thing to do is split the data into features and labels:

features = df.iloc[::2, :]  # every other row starting at row 0 (days 0, 2, 4, ...)
labels = df.iloc[1::2, :]   # the row right after each feature row (days 1, 3, 5, ...), i.e. the next day's values

Features:

|    | temperature |   pressure |   humidity |       wind |
|---:|------------:|-----------:|-----------:|-----------:|
|  0 |   0.0589101 |   0.278302 |   0.875369 |   0.622687 |
|  2 |    0.511291 |   0.334929 |   0.401483 |    0.77062 |
|  4 |    0.495425 |   0.520179 |   0.516858 |   0.628928 |
|  6 |    0.058189 |    0.69932 |   0.885174 | 0.00602091 |
|  8 |    0.958427 |   0.471423 |   0.412678 |   0.618024 |
| 10 |     0.49252 |   0.541955 | 0.00522009 |   0.396557 |
| 12 |    0.819055 |   0.637116 |   0.285361 |   0.569794 |
| 14 |     0.89466 |   0.948916 |   0.556422 |   0.555165 |
| 16 |    0.154502 |   0.703137 |   0.184157 |   0.765623 |
| 18 |    0.403475 |   0.144796 |  0.0224043 |   0.891236 |

Labels:

|    | temperature |   pressure |   humidity |       wind |
|---:|------------:|-----------:|-----------:|-----------:|
|  1 |    0.594924 |   0.797274 |   0.510012 |   0.374484 |
|  3 |    0.711329 |    0.72051 |   0.595685 |   0.872691 |
|  5 |    0.676054 |    0.67902 |  0.0213801 |  0.0267594 |
|  7 |    0.708245 |   0.871698 |   0.345451 |   0.448352 |
|  9 |    0.941202 |   0.825181 |   0.211916 |  0.0808273 |
| 11 |    0.323757 |   0.113585 |   0.797503 |   0.323961 |
| 13 |     0.95123 | 0.00604303 |   0.208746 |   0.150214 |
| 15 |    0.705789 |   0.269704 |   0.289568 |   0.391438 |
| 17 |     0.25974 |   0.934706 |   0.172775 |   0.412022 |
| 19 |    0.922302 |   0.805214 |  0.0232178 |   0.951568 |
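
The every-other-row split yields 10 non-overlapping (day, next day) pairs from 20 rows. As an alternative sketch that is not part of the answer's own approach, every consecutive pair of days can be used by offsetting the frame by one row, which gives 19 pairs. The names `all_features` and `all_labels` are illustrative, and the random frame only stands in for real measurements.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the 20-day frame (assumption: same four columns)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((20, 4)),
                  columns=['temperature', 'pressure', 'humidity', 'wind'])

# Day t is the input, day t + 1 is the target
all_features = df.iloc[:-1]  # days 0..18
all_labels = df.iloc[1:]     # days 1..19

print(all_features.shape, all_labels.shape)
```

The remaining steps (keeping only the temperature column of the labels, converting to arrays) apply to these frames in the same way.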

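Since you are only interested in predicting the temperature, we can drop the other columns from the labels and convert everything to arrays: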

features = features.to_numpy() # shape (10, 4)
labels = labels['temperature'].to_numpy() # shape (10,)
features = np.expand_dims(features, axis=1) # shape (10, 1, 4)

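Note that a time dimension has been added to features. This means that each sample in the dataset consists of a single time step (one day), and each time step has 4 features (temperature, pressure, humidity, wind).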
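
With a single time step per sample, the recurrent layer never sees a sequence. If a longer look-back is wanted, the time dimension can hold several consecutive days instead. The following is a minimal sketch under assumptions of my own: a hypothetical window of 3 days, temperature stored in column 0, and NumPy 1.20 or newer for `sliding_window_view`.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)
data = rng.random((20, 4))  # stand-in: 20 days x 4 features, temperature in column 0
window = 3                  # hypothetical look-back length in days

# sliding_window_view appends the window axis last: shape (18, 4, 3)
windows = sliding_window_view(data, window_shape=window, axis=0)
# Reorder to (samples, timesteps, features): shape (18, 3, 4)
windows = windows.transpose(0, 2, 1)

# The last window has no following day to predict, so it is dropped
X = np.ascontiguousarray(windows[:-1])  # shape (17, 3, 4)
y = data[window:, 0]                    # temperature on the day after each window, shape (17,)

print(X.shape, y.shape)
```

Here `X[i]` holds days `i` to `i + 2` and `y[i]` is the temperature on day `i + 3`, so an input layer built from `X.shape[1]` and `X.shape[2]` would see 3 time steps of 4 features each.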

Build and run an RNN model:

inputs = tf.keras.layers.Input(shape=(features.shape[1], features.shape[2]))
rnn_out = tf.keras.layers.SimpleRNN(32)(inputs)
outputs = tf.keras.layers.Dense(1)(rnn_out) # one output = temperature

model = tf.keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer='adam', loss="mse")
model.summary()
history = model.fit(features, labels, batch_size=2, epochs=3)
Model: "model_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_2 (InputLayer)        [(None, 1, 4)]            0         
                                                                 
 simple_rnn (SimpleRNN)      (None, 32)                1184      
                                                                 
 dense_1 (Dense)             (None, 1)                 33        
                                                                 
=================================================================
Total params: 1,217
Trainable params: 1,217
Non-trainable params: 0
_________________________________________________________________
Epoch 1/3
5/5 [==============================] - 1s 9ms/step - loss: 0.7859
Epoch 2/3
5/5 [==============================] - 0s 7ms/step - loss: 0.5862
Epoch 3/3
5/5 [==============================] - 0s 6ms/step - loss: 0.4354
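
The parameter counts in the summary can be checked by hand. A fully connected recurrent layer has an input kernel, a recurrent kernel, and a bias; a dense layer has a weight matrix and a bias. The helper names below are illustrative, not part of Keras.

```python
def simple_rnn_params(input_dim: int, units: int) -> int:
    """Input kernel + recurrent kernel + bias of a fully connected RNN layer."""
    return input_dim * units + units * units + units


def dense_params(input_dim: int, units: int) -> int:
    """Weight matrix + bias of a dense layer."""
    return input_dim * units + units


rnn = simple_rnn_params(input_dim=4, units=32)  # 128 + 1024 + 32
dense = dense_params(input_dim=32, units=1)     # 32 + 1
print(rnn, dense, rnn + dense)                  # 1184 33 1217
```

Note that the count does not depend on the number of time steps, because the same weights are reused at every step.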

Make predictions like this:

samples = 1
model.predict(tf.random.normal((samples, 1, 4)))
# array([[-1.610171]], dtype=float32)
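
The call above feeds random noise only to show the expected input shape. For an actual forecast, the input would be the most recent day of real data, shaped the same way. A small sketch, assuming the columns are in the same order as during training (the synthetic frame is a stand-in):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((20, 4)),
                  columns=['temperature', 'pressure', 'humidity', 'wind'])

# Most recent day, reshaped to (samples=1, timesteps=1, features=4)
x_next = df.iloc[[-1]].to_numpy(dtype='float32')[:, np.newaxis, :]
print(x_next.shape)

# With the trained model from above, the forecast would then be:
# next_temperature = model.predict(x_next)[0, 0]
```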

You may also want to consider normalizing your data before training:

# You usually also normalize your data before training
mean = df.mean(axis=0)
std = df.std(axis=0)
df = (df - mean) / std  # parentheses matter: subtract the mean first, then divide by the std
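
One common refinement, sketched here under my own assumptions, is to compute the statistics from the training rows only, so that no information from later days leaks into training, and to keep them around for converting predictions back to real units. The split point `n_train` and the helper `denormalize_temperature` are hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((20, 4)),
                  columns=['temperature', 'pressure', 'humidity', 'wind'])

n_train = 16  # hypothetical chronological split: first 16 days for training
train = df.iloc[:n_train]

# Statistics come from the training rows only
mean = train.mean(axis=0)
std = train.std(axis=0)

# The same statistics are applied to every row, including the held-out days
df_norm = (df - mean) / std


def denormalize_temperature(t_norm):
    """Map a normalized temperature back to the original units."""
    return t_norm * std['temperature'] + mean['temperature']
```

A model trained on normalized labels outputs normalized temperatures, so its predictions would go through `denormalize_temperature` before being reported.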

That's all there is to it.