How do I fix LSTM predictions that come out very low?
I need to predict the workload of a data center with N virtual machines. The data is structured as follows:
id,date,hour,dayofweek,cpu,ram,ram_tot,users,id_vm
5fff03b99b56dba65a873e2a,2020-12-14,00:00,1,2,820,8000,10,1
5fff03ba9b56dba65a873e2c,2458,16000,2
The data includes: id, date, hour, day of week (1-7), number of CPUs of the VM, RAM used, total RAM, number of users connected to that VM, and the VM id (1 or 2). This is imported into a pandas DataFrame. In the DataFrame I build a column called peak, whose value is 1 if the VM is under heavy load (the % of RAM used is very high, > 80%) and 0 otherwise. I build a time-series dataset and normalize it. I then build an LSTM network, with a training and a test phase, to predict whether a workload peak will occur (the predicted variable is peak). In the validation phase I get very bad results: the predicted values are very low compared to the actual ones. I expected that if the network predicts a peak well, the corresponding value would be close to 1.
Here is my code:
# imports assumed by this snippet (not shown in the original)
import numpy as np
from pandas import DataFrame
from sklearn.preprocessing import MinMaxScaler
from sklearn.model_selection import train_test_split
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dropout, Dense
import matplotlib.pyplot as plt

# read data from a MongoDB cursor into a pandas DataFrame
df = DataFrame(list_cur)
# calc for %mem used
df['pmem'] = (df['ram']/df['ram_tot'])*100
conditions = [(df['pmem'] <= 80),(df['pmem'] > 80)] #80
values = [0,1]
df['peak'] = np.select(conditions,values)
df['datetime'] = df['date'] + ' ' + df['hour']
# extract hours and minutes into 2 new columns
df[['hh','mm']] = df.hour.str.split(":",expand=True)
# dataset with 6 features and 1 label
# every row of the dataset = 1 observation
dataset = df[['hh','mm','dayofweek','users','pmem','id_vm','peak']]
# normalization of the dataset
sc = MinMaxScaler(feature_range = (0,1))
dfn = sc.fit_transform(dataset)
# build the time series: each sample looks back n_steps rows
x = []
y = []
n_steps = 192
for i in range(len(dfn)):
    # find the end of this pattern
    end_ix = i + n_steps
    # stop when the window would run past the end of the sequence
    if end_ix > len(dfn)-1:
        break
    # gather the 6 feature columns as input and the 'peak' column as output
    seq_x,seq_y = dfn[i:end_ix,0:6],dfn[end_ix,6]
    x.append(seq_x)
    y.append(seq_y)
# splitting dataset in train and test
X_train,X_test,y_train,y_test = train_test_split(x,y,test_size=0.33,random_state=42)
# convert in arrays
X_train = np.asarray(X_train,dtype=np.float32)
X_test = np.asarray(X_test,dtype=np.float32)
y_train = np.asarray(y_train,dtype=np.float32)
y_test = np.asarray(y_test,dtype=np.float32)
# LSTM neural network model
model = Sequential()
#Adding the first LSTM layer and some Dropout regularisation
model.add(LSTM(units = 6,return_sequences = True,input_shape = (X_train.shape[1],X_train.shape[2])))
model.add(Dropout(0.2))
# Adding a second LSTM layer and some Dropout regularisation
model.add(LSTM(units = 32,return_sequences = True))
model.add(Dropout(0.2))
# Adding a third LSTM layer and some Dropout regularisation
model.add(LSTM(units = 64,return_sequences = True))
model.add(Dropout(0.2))
# Adding a fourth LSTM layer and some Dropout regularisation
model.add(LSTM(units = 32))
model.add(Dropout(0.2))
# Adding the output layer
model.add(Dense(units = 1))
model.summary()
# Compiling the LSTM
model.compile(loss = 'categorical_crossentropy',optimizer='rmsprop',metrics=['accuracy'])
# Fitting the LSTM to the Training set
history = model.fit(X_train,y_train,epochs = 5,batch_size = 32,validation_data=(X_test,y_test))
results = model.evaluate(X_test,y_test,verbose=1,return_dict=True)
print("test loss, test acc:",results)
print("Generate predictions for all samples")
yhat = model.predict(X_test,verbose=1)
plt.figure(figsize=(20,10))
y1 = np.array(y_test)
y2 = np.array(yhat[:,0])
plt.plot(y1,label = "Test",marker="o",linewidth=0)
plt.plot(y2,label = "Predicted",marker="x")
plt.xlabel('sample')
# Set the y axis label of the current axis.
plt.ylabel('peak')
# Set a title of the current axes.
plt.title('Test vs. predicted peak values')
# show a legend on the plot
plt.legend()
# Display a figure.
plt.show()
Here are my results.
Is there an error?
Solution
I'm not sure, but you could try applying the inverse MinMaxScaler transform to the output.
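A minimal sketch of that suggestion, assuming the fitted scaler sc, the 7-column dataset, and the predictions yhat from the question are still in scope. inverse_transform expects the same column layout the scaler was fit on, so the 1-column predictions are padded back into a full-width array first:

import numpy as np

# pad the predictions back to the 7-column layout that sc was fit on
padded = np.zeros((len(yhat), dataset.shape[1]))
padded[:, 6] = yhat[:, 0]  # column 6 held the 'peak' label
yhat_unscaled = sc.inverse_transform(padded)[:, 6]

Note that since peak only takes the values 0 and 1, a MinMaxScaler with feature_range (0,1) maps that column to itself, so the inverse transform should leave these predictions essentially unchanged; if they are still low afterwards, the cause lies elsewhere.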