How to fix a Keras model that learns, but whose predictions fit the past better than the future
I am trying to reproduce the official TensorFlow tutorial for time series forecasting, but with different data and a different generator. In short, I use a sliding window of 7 days (data points) of meteorological data to predict the temperature on day 8. I kept the neural network architecture unchanged. I managed to get the network to learn, and it shows the correct trend, but its predictions for day 8 fit the ground-truth values of day 1 of the sliding-window input better than they fit the ground truth of day 8. I see this both in the figures and in the MSE results.
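To make the setup concrete, here is a minimal self-contained sketch (toy 1-D data and variable names of my own choosing, not from my actual notebook) of the sliding-window pairing I mean: 7 past points as input, the next point as the value to predict.

```python
import numpy as np

# Toy series 0, 1, 2, ... so the input/target alignment is easy to check by eye.
series = np.arange(20, dtype=float)

window = 7  # 7 past data points per input sample
X_windows, y_next = [], []
for i in range(len(series) - window):
    X_windows.append(series[i:i + window])  # days 1..7 of the window
    y_next.append(series[i + window])       # day 8, the value to predict

X_windows = np.array(X_windows)
y_next = np.array(y_next)

print(X_windows[0], y_next[0])  # first window is [0..6], its target is 7
```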
In the official TensorFlow tutorial, however, the predictions are much better. What am I doing wrong? Thanks.
Here is the code in a Colab notebook.
Here is the meteo data (preprocessed, cleaned and normalized).
Here is the code pasted explicitly:
import matplotlib.pyplot as plt
import numpy as np
from sklearn import preprocessing
import pandas as pd
import scipy
from tensorflow.keras.preprocessing.sequence import TimeseriesGenerator
from tensorflow.keras.models import Model,Sequential,load_model
from tensorflow.keras.optimizers import SGD,Adam
import tensorflow as tf
from tensorflow.keras.layers import *
logpath_ms = './best_model.h5'
# the data is already cleaned,preprocessed and normalized,but have a look at it if you want:
df0 = pd.read_csv('github_example_normalized.txt',index_col=0)
X = df0.values
"""use 7 days (data points) to predict the temperature for the folowing 3 days at each time step
creating a target array y that corresponds to predicting multiple days at once:
"""
y = np.stack((np.roll(X,-1,axis=0),np.roll(X,-2,axis=0),np.roll(X,-3,axis=0)),axis=1)
#only take the temperature feature from the data:
y = y[:,:,3]
n_input = 7 #one input window contains 7 data points
# split the data into 80% training 10% validation,10% testing
train_generator = TimeseriesGenerator(X[:(-n_input-1)],y[:(-n_input-1)],length=n_input,batch_size=8,end_index=int(0.8*len(X[:(-n_input-1)])))
val_generator = TimeseriesGenerator(X[:(-n_input-1)],y[:(-n_input-1)],length=n_input,batch_size=8,start_index=int(0.8*len(X[:(-n_input-1)])),end_index=int(0.9*len(X[:(-n_input-1)])))
test_generator = TimeseriesGenerator(X[:(-n_input-1)],y[:(-n_input-1)],length=n_input,batch_size=8,start_index=int(0.9*len(X[:(-n_input-1)])))
# with the below lines you can have a look how x and y look like
# for i in range(5):
# x_,y_ = test_generator[i]
# print(x_.shape)
# print(y_.shape)
# print('%s => %s' % (x_,y_))
modelsave_cb = tf.keras.callbacks.ModelCheckpoint(logpath_ms,monitor='val_loss',mode='min',verbose=1,save_best_only=True)
## model taken from here: https://www.tensorflow.org/tutorials/structured_data/time_series?fbclid=IwAR1CfmX6adoEpeVF9hqc1eNMf7AJIZM0pEzWpyMvbfFizxsa2uR97yDvgKQ#recurrent_neural_network
model = Sequential()
model.add(LSTM(32,return_sequences=True,input_shape=(n_input,12)))
model.add(MaxPool1D(2))
model.add(Dense(1))
model.compile(loss='mean_squared_error',optimizer='adam',metrics=['mse'])
model.fit(train_generator,validation_data=val_generator,epochs=5,callbacks=[modelsave_cb])
model = load_model(logpath_ms)
y_pred = model.predict(test_generator)
"""Below an evaluation of the results: the NN learns the trend,but it learns more to follow values of 7 data points in the past,not to predict the future. And unfortunately if we compare to the MSE of a naive baseline,where the prediction for t1 is just the ground truth at t0 than the baseline scores better"""
fig = plt.figure(figsize=(14,7))
plt.plot(df0.index[test_generator.start_index:test_generator.end_index + 1],test_generator.targets[test_generator.start_index:test_generator.end_index + 1][:,0],label='ground truth')
plt.plot(df0.index[test_generator.start_index:test_generator.end_index + 1],y_pred[:,:],label='pred day 1 after sliding window')
plt.legend()
# like above,but predictions shifted 7 days into the past. Gives much better fit to the ground truth
fig = plt.figure(figsize=(14,7))
plt.plot(df0.index[test_generator.start_index:test_generator.end_index + 1],test_generator.targets[test_generator.start_index:test_generator.end_index + 1][:,0],label='ground truth')
plt.plot(df0.index[test_generator.start_index-7:test_generator.end_index + 1-7],y_pred[:,:],label='pred day 1,like above,shifted 7 data points into past')
plt.legend()
# calculating MSE of the predictions
print(sum((y_pred[:,:].flatten() - test_generator.targets[test_generator.start_index:test_generator.end_index + 1,0])**2)/len(y_pred))
# calculating MSE of the predictions shifted by 7 points into the past
print(sum((y_pred[7:,:].flatten() - test_generator.targets[test_generator.start_index:test_generator.end_index + 1 - 7,0])**2)/len(y_pred[:-7]))
# MSE of a naive baseline,if we just make the model predict a value at t1 that is equal to the value for the previous period t0
y_t1 = test_generator.targets[test_generator.start_index:test_generator.end_index + 1,0]
y_t0 = test_generator.targets[test_generator.start_index-1:test_generator.end_index,0]
print(sum((y_t1-y_t0)**2)/len(y_pred))
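To make the diagnosis reproducible without my data, here is a small numpy-only sketch (toy data, helper name `mse_at_lag` is my own) of the lag check described above: compare the MSE of the predictions against the ground truth shifted by various offsets; if the minimum lands at a nonzero lag, the model is echoing the past instead of forecasting.

```python
import numpy as np

def mse_at_lag(pred, truth, lag):
    """MSE between predictions and ground truth shifted `lag` steps into the past."""
    if lag == 0:
        return np.mean((pred - truth) ** 2)
    return np.mean((pred[lag:] - truth[:-lag]) ** 2)

# Toy example of the failure mode I am seeing: a "prediction" that is
# just the ground truth delayed by 7 steps.
truth = np.sin(np.linspace(0, 20, 300))
pred = np.roll(truth, 7)   # delayed copy of the truth
pred[:7] = truth[0]        # clean up the wrap-around from np.roll

scores = [mse_at_lag(pred, truth, k) for k in range(10)]
best = int(np.argmin(scores))
print(best)  # a nonzero best lag (here 7) means the predictions lag the truth
```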