How would one use an RNN when predicting temperature?
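Suppose I have a DataFrame with multiple features, such as humidity, pressure and so on. One of the columns is temperature.
Each row holds the data for one day. I want to predict the next day's temperature using only past data.
How would I adapt the DataFrame so that it can be used in an RNN with Keras?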
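Assume you have the following data structure, and we want to predict the temperature given the previous day's data (the values are random, so yours will differ):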
```
import tensorflow as tf
import pandas as pd
import numpy as np

# 20 days of random data, one row per day
df = pd.DataFrame(data={
    'temperature': np.random.random(20),
    'pressure': np.random.random(20),
    'humidity': np.random.random(20),
    'wind': np.random.random(20)
})
print(df.to_markdown())  # to_markdown() requires the optional 'tabulate' package
```
|    | temperature | pressure | humidity | wind |
|---:|---:|---:|---:|---:|
| 0 | 0.0589101 | 0.278302 | 0.875369 | 0.622687 |
| 1 | 0.594924 | 0.797274 | 0.510012 | 0.374484 |
| 2 | 0.511291 | 0.334929 | 0.401483 | 0.77062 |
| 3 | 0.711329 | 0.72051 | 0.595685 | 0.872691 |
| 4 | 0.495425 | 0.520179 | 0.516858 | 0.628928 |
| 5 | 0.676054 | 0.67902 | 0.0213801 | 0.0267594 |
| 6 | 0.058189 | 0.69932 | 0.885174 | 0.00602091 |
| 7 | 0.708245 | 0.871698 | 0.345451 | 0.448352 |
| 8 | 0.958427 | 0.471423 | 0.412678 | 0.618024 |
| 9 | 0.941202 | 0.825181 | 0.211916 | 0.0808273 |
| 10 | 0.49252 | 0.541955 | 0.00522009 | 0.396557 |
| 11 | 0.323757 | 0.113585 | 0.797503 | 0.323961 |
| 12 | 0.819055 | 0.637116 | 0.285361 | 0.569794 |
| 13 | 0.95123 | 0.00604303 | 0.208746 | 0.150214 |
| 14 | 0.89466 | 0.948916 | 0.556422 | 0.555165 |
| 15 | 0.705789 | 0.269704 | 0.289568 | 0.391438 |
| 16 | 0.154502 | 0.703137 | 0.184157 | 0.765623 |
| 17 | 0.25974 | 0.934706 | 0.172775 | 0.412022 |
| 18 | 0.403475 | 0.144796 | 0.0224043 | 0.891236 |
| 19 | 0.922302 | 0.805214 | 0.0232178 | 0.951568 |
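The first thing to do is split the data into features and labels. Here the rows are paired as (day 0 → day 1), (day 2 → day 3), and so on:

```
features = df.iloc[::2, :]   # rows 0, 2, 4, ...: the "today" measurements
labels = df.iloc[1::2, :]    # rows 1, 3, 5, ...: the following day, whose temperature is the target
```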
Features:

|    | temperature | pressure | humidity | wind |
|---:|---:|---:|---:|---:|
| 0 | 0.0589101 | 0.278302 | 0.875369 | 0.622687 |
| 2 | 0.511291 | 0.334929 | 0.401483 | 0.77062 |
| 4 | 0.495425 | 0.520179 | 0.516858 | 0.628928 |
| 6 | 0.058189 | 0.69932 | 0.885174 | 0.00602091 |
| 8 | 0.958427 | 0.471423 | 0.412678 | 0.618024 |
| 10 | 0.49252 | 0.541955 | 0.00522009 | 0.396557 |
| 12 | 0.819055 | 0.637116 | 0.285361 | 0.569794 |
| 14 | 0.89466 | 0.948916 | 0.556422 | 0.555165 |
| 16 | 0.154502 | 0.703137 | 0.184157 | 0.765623 |
| 18 | 0.403475 | 0.144796 | 0.0224043 | 0.891236 |
Labels:

|    | temperature | pressure | humidity | wind |
|---:|---:|---:|---:|---:|
| 1 | 0.594924 | 0.797274 | 0.510012 | 0.374484 |
| 3 | 0.711329 | 0.72051 | 0.595685 | 0.872691 |
| 5 | 0.676054 | 0.67902 | 0.0213801 | 0.0267594 |
| 7 | 0.708245 | 0.871698 | 0.345451 | 0.448352 |
| 9 | 0.941202 | 0.825181 | 0.211916 | 0.0808273 |
| 11 | 0.323757 | 0.113585 | 0.797503 | 0.323961 |
| 13 | 0.95123 | 0.00604303 | 0.208746 | 0.150214 |
| 15 | 0.705789 | 0.269704 | 0.289568 | 0.391438 |
| 17 | 0.25974 | 0.934706 | 0.172775 | 0.412022 |
| 19 | 0.922302 | 0.805214 | 0.0232178 | 0.951568 |
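The every-other-row split above pairs day 0 with day 1, day 2 with day 3, and so on, so half of the possible (today, tomorrow) pairs are never used. A minimal sketch of an alternative, assuming the same `df` layout, shifts the temperature column by one row so that every day except the last becomes a training sample:

```python
import numpy as np
import pandas as pd

# Same layout as above: one row per day, 20 days of random data
df = pd.DataFrame({
    'temperature': np.random.random(20),
    'pressure': np.random.random(20),
    'humidity': np.random.random(20),
    'wind': np.random.random(20),
})

# Features are today's measurements; the label is tomorrow's temperature.
# shift(-1) moves each temperature up one row, so row t holds the value of day t+1.
next_day_temp = df['temperature'].shift(-1)

# The last day has no "tomorrow" (its shifted value is NaN), so drop it
features = df.iloc[:-1].to_numpy()           # shape (19, 4)
labels = next_day_temp.iloc[:-1].to_numpy()  # shape (19,)

# Add the time dimension expected by Keras RNN layers: (samples, timesteps, features)
features = np.expand_dims(features, axis=1)  # shape (19, 1, 4)
print(features.shape, labels.shape)
```

This roughly doubles the number of training samples from the same 20 days without changing the model.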
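Since you are only interested in predicting the temperature, we can drop the other columns from the labels and convert both to NumPy arrays:

```
features = features.to_numpy()                 # shape (10, 4)
labels = labels['temperature'].to_numpy()      # shape (10,)
features = np.expand_dims(features, axis=1)    # shape (10, 1, 4)
```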
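Note that a time dimension has been added to `features`: each sample is now a sequence of length 1, i.e. a single time step (one day) with 4 features (temperature, pressure, humidity, wind). Keras RNN layers expect input of shape `(samples, timesteps, features)`.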
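With a single time step the RNN cannot exploit any history beyond the previous day. If you want the model to look at several past days, one option is to build overlapping windows. The helper `make_windows` below is hypothetical (not part of Keras or pandas), and the window length of 3 days is an arbitrary assumption:

```python
import numpy as np

def make_windows(data, targets, window):
    """Hypothetical helper: stack `window` consecutive days as one sample.

    data:    array of shape (days, n_features)
    targets: array of shape (days,), e.g. the temperature column
    Returns X with shape (days - window, window, n_features) and y with shape
    (days - window,), where y[i] is the target on the day right after window i.
    """
    X, y = [], []
    for start in range(len(data) - window):
        X.append(data[start:start + window])  # days start .. start+window-1
        y.append(targets[start + window])     # the following day
    return np.array(X), np.array(y)

# 20 days x 4 features (temperature, pressure, humidity, wind)
data = np.random.random((20, 4))
X, y = make_windows(data, data[:, 0], window=3)
print(X.shape, y.shape)  # (17, 3, 4) (17,)
```

The resulting `X` could be fed to the same model by declaring `Input(shape=(3, 4))`. Recent TensorFlow versions also provide `tf.keras.utils.timeseries_dataset_from_array` for this kind of windowing.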
Build and run an RNN model:

```
inputs = tf.keras.layers.Input(shape=(features.shape[1], features.shape[2]))  # (timesteps, features) = (1, 4)
rnn_out = tf.keras.layers.SimpleRNN(32)(inputs)
outputs = tf.keras.layers.Dense(1)(rnn_out)  # one output = next day's temperature
model = tf.keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer='adam', loss='mse')
model.summary()
history = model.fit(features, labels, batch_size=2, epochs=3)
```
```
Model: "model_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #
=================================================================
 input_2 (InputLayer)        [(None, 1, 4)]            0
 simple_rnn (SimpleRNN)      (None, 32)                1184
 dense_1 (Dense)             (None, 1)                 33
=================================================================
Total params: 1,217
Trainable params: 1,217
Non-trainable params: 0
_________________________________________________________________
Epoch 1/3
5/5 [==============================] - 1s 9ms/step - loss: 0.7859
Epoch 2/3
5/5 [==============================] - 0s 7ms/step - loss: 0.5862
Epoch 3/3
5/5 [==============================] - 0s 6ms/step - loss: 0.4354
```
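The parameter counts in the summary can be checked by hand. A SimpleRNN layer with `units` hidden units and `input_dim` input features has an input kernel (`input_dim × units`), a recurrent kernel (`units × units`) and a bias (`units`); the Dense layer has one weight per input plus one bias per output. A small sketch of that arithmetic (the helper names are illustrative only):

```python
def simple_rnn_params(units, input_dim):
    # input kernel + recurrent kernel + bias
    return input_dim * units + units * units + units

def dense_params(inputs, outputs):
    # weight matrix + bias
    return inputs * outputs + outputs

rnn = simple_rnn_params(units=32, input_dim=4)  # 128 + 1024 + 32 = 1184
dense = dense_params(inputs=32, outputs=1)      # 32 + 1 = 33
print(rnn, dense, rnn + dense)  # 1184 33 1217
```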
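Make a prediction like this (the random input only illustrates the expected shape `(samples, 1, 4)`, so the output will differ on every run):

```
samples = 1
model.predict(tf.random.normal((samples, 1, 4)))
# e.g. array([[-1.610171]], dtype=float32)
```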
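To actually forecast tomorrow's temperature you would feed the most recent day's measurements, reshaped to `(1, 1, 4)`, instead of a random tensor. A minimal sketch of preparing that input with pandas and NumPy, assuming the same `df` layout; the `model.predict` call is left as a comment because it needs the trained model from above, and if you normalized the training data the same statistics must be applied to this input as well:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'temperature': np.random.random(20),
    'pressure': np.random.random(20),
    'humidity': np.random.random(20),
    'wind': np.random.random(20),
})

# Double brackets keep a 2-D result: shape (1, 4) instead of (4,)
last_day = df.iloc[[-1]].to_numpy()

# Add the time dimension: (samples=1, timesteps=1, features=4)
x_next = np.expand_dims(last_day, axis=1)
print(x_next.shape)  # (1, 1, 4)

# With the trained model from above:
# tomorrow = model.predict(x_next)  # array of shape (1, 1)
```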
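You may also want to normalize the data before training (do this before splitting it into features and labels):

```
# Standardize each column to zero mean and unit variance.
# Note the parentheses: df - mean / std would divide only the mean by std.
mean = df.mean(axis=0)
std = df.std(axis=0)
df = (df - mean) / std
```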
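Two details are worth keeping in mind with normalization: the statistics should ideally be computed on the training portion only (otherwise information from the evaluation period leaks into training), and a prediction made in normalized units has to be converted back to the original temperature scale. A sketch under those assumptions, using an arbitrary chronological 80/20 split and a placeholder value in place of a real model output:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'temperature': np.random.random(20),
    'pressure': np.random.random(20),
    'humidity': np.random.random(20),
    'wind': np.random.random(20),
})

# Chronological split: no shuffling, the test period comes after the training period
split = int(len(df) * 0.8)  # 16 training days, 4 test days
train_df, test_df = df.iloc[:split], df.iloc[split:]

# Statistics from the training days only
mean = train_df.mean(axis=0)
std = train_df.std(axis=0)

# Subtract first, then divide
train_norm = (train_df - mean) / std
test_norm = (test_df - mean) / std

# A model trained on normalized temperature predicts normalized values;
# undo the scaling with the temperature column's statistics
pred_norm = np.array([0.5])  # placeholder for a model.predict(...) output
pred_temp = pred_norm * std['temperature'] + mean['temperature']
print(pred_temp)
```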
And that's it.