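Why does DenseFeatures change my model's results, and how can I fix it?
I am working on an ANN prediction model. Previously, I built the model without DenseFeatures. Now, with the same dataset and the same layer settings, I am trying to build the model with DenseFeatures to replace my earlier preprocessing, but the results are very different. I expected no difference, since DenseFeatures should only help me control and monitor the input variables.
Previous method: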
# (after preprocessing, e.g. one-hot encoding to [0, 1])
# (df -> array)
from tensorflow import keras

BATCH_SIZE = 100
EPOCHS = 1000
model = keras.Sequential()
[layer setting]
...
[later setting]
model.fit(
    T_x_train, T_y_train,
    batch_size=BATCH_SIZE,
    epochs=EPOCHS,
    verbose=1,
    validation_data=(T_x_valid, T_y_valid),
    callbacks=[es],  # es is defined elsewhere (not shown)
)
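For reference, the manual one-hot step mentioned in the comments above might look like the following minimal sketch. It is plain Python used as a stand-in, not my actual preprocessing code: the helper names `one_hot_hour` and `preprocess_row` and the sample rows are hypothetical, and the 24-value hour vocabulary is assumed from the DenseFeatures version of the code.

```python
# Hypothetical stand-in for the manual preprocessing step (illustration only).
HOUR_VOCAB = list(range(24))  # assumed vocabulary: hours 0..23


def one_hot_hour(hour):
    """Return a 24-element list with 1.0 at the position of `hour`, 0.0 elsewhere."""
    if hour not in HOUR_VOCAB:
        raise ValueError(f"hour out of vocabulary: {hour}")
    return [1.0 if h == hour else 0.0 for h in HOUR_VOCAB]


def preprocess_row(row):
    """Concatenate the numeric inputs x1, x2 with the one-hot encoded hour."""
    return [float(row["x1"]), float(row["x2"])] + one_hot_hour(row["hour"])


# Made-up sample rows, for illustration only.
rows = [
    {"x1": 0.5, "x2": 1.5, "hour": 0},
    {"x1": 2.0, "x2": 3.0, "hour": 23},
]
features = [preprocess_row(r) for r in rows]
```

Each row becomes a vector of 2 numeric values followed by 24 indicator values, which is the kind of array passed to `model.fit` as `T_x_train`.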
DenseFeatures method (inside the model)
Reference: https://www.tensorflow.org/tutorials/structured_data/feature_columns#create_target_variable
# (setting up the feature columns for the DenseFeatures layer)
# (For example:)
import tensorflow as tf
from tensorflow import feature_column, keras

feature_columns = []
# Numeric columns
numeric_cols = ['x1', 'x2']
for header in numeric_cols:
    feature_columns.append(feature_column.numeric_column(header))
# Categorical columns
# Hour
hour_type = feature_column.categorical_column_with_vocabulary_list(
    'hour', [hr for hr in range(24)])
hour_type_one_hot = feature_column.indicator_column(hour_type)
feature_columns.append(hour_type_one_hot)
feature_layer = tf.keras.layers.DenseFeatures(feature_columns, trainable=False)

# (Setting up the training dataset in tf.data format)
def df_to_dataset(dataframe, target, shuffle=False, batch_size=100):
    dataframe = dataframe.copy()
    labels = dataframe.pop(target)  # remove the target column from the copy and keep it as labels
    ds = tf.data.Dataset.from_tensor_slices((dict(dataframe), labels))
    if shuffle:
        ds = ds.shuffle(buffer_size=len(dataframe))
    ds = ds.batch(batch_size)
    return ds

TRAIN = df_to_dataset(train_full_model_df, target='reading')
VALID = df_to_dataset(valid_full_model_df, target='reading')

# (model part)
model = keras.Sequential()
model.add(feature_layer)
[layer setting]
...
[later setting]
model.fit(TRAIN, validation_data=(VALID), callbacks=[es])
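One difference that may be worth checking, separate from the feature layer itself, is how this `fit` call is fed. As far as I understand, `model.fit` runs a single epoch when `epochs` is omitted, and its `shuffle` argument is ignored for `tf.data.Dataset` inputs, so the sample order comes entirely from `df_to_dataset` (here `shuffle=False`), whereas array inputs are shuffled each epoch by default. The sketch below is a plain-Python stand-in for the batching logic of `df_to_dataset`, not TensorFlow code; `make_batches` and its `seed` parameter are hypothetical names used only to show how the two orderings differ.

```python
import random


def make_batches(samples, batch_size=100, shuffle=False, seed=None):
    """Split `samples` into consecutive batches, optionally shuffling first."""
    order = list(samples)
    if shuffle:
        random.Random(seed).shuffle(order)
    return [order[i:i + batch_size] for i in range(0, len(order), batch_size)]


samples = list(range(250))  # made-up sample indices

# shuffle=False, as in TRAIN = df_to_dataset(...): the order never changes.
fixed = make_batches(samples, batch_size=100, shuffle=False)

# shuffle=True: the same samples, but batch contents depend on the random order.
mixed = make_batches(samples, batch_size=100, shuffle=True, seed=0)

print([len(b) for b in fixed])  # sizes of the batches
print(fixed[0][:5])             # first few samples of the first batch
```

If the ordering or the number of epochs turns out to matter, the comparison between the two methods would need matching `epochs`, batch size, and shuffling before the feature layer can be blamed.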
DenseFeatures method (outside the model)
# (setting up the featureColumn for DenseFeature Layer)
# (For example:)
feature_columns = []
# Numeric Columns
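numeric_cols = ['x1', 'x2']
# ... (remaining feature-column setup as in the inside-the-model version above)
feature_layer = tf.keras.layers.DenseFeatures(feature_columns, trainable=False)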
# (model part)
model = keras.Sequential()
[layer setting]
...
[later setting]
model.fit(
    feature_layer(dict(train_full_model_df)),
    train_full_model_df['reading'].values,
    validation_data=(
        feature_layer(dict(valid_full_model_df)),
        valid_full_model_df['reading'].values,
    ),
    callbacks=[es],
)
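I cross-checked the arrays/input values using feature_layer(dict(train_full_model_df)), and this confirmed that the number of samples and the values are the same for both the training and the validation data.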
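A cross-check like the one described above could be sketched as follows. This is a plain-Python illustration rather than the exact check I ran: `columns_of`, `same_columns_any_order`, the `tol` parameter, and the two small matrices are hypothetical. It compares columns irrespective of their order, on the assumption that the DenseFeatures output may arrange columns differently from a hand-built array (as far as I know, the layer orders its feature columns by name).

```python
def columns_of(matrix):
    """Transpose a list-of-rows matrix into a list of column tuples."""
    return [tuple(col) for col in zip(*matrix)]


def same_columns_any_order(a, b, tol=1e-6):
    """True if a and b have the same shape and the same columns up to reordering."""
    if len(a) != len(b):
        return False
    cols_a = sorted(columns_of(a))
    cols_b = sorted(columns_of(b))
    if len(cols_a) != len(cols_b):
        return False
    return all(
        abs(x - y) <= tol
        for ca, cb in zip(cols_a, cols_b)
        for x, y in zip(ca, cb)
    )


# Made-up values: `dense` holds the same columns as `manual` in another order.
manual = [[0.5, 1.0, 0.0], [2.0, 0.0, 1.0]]
dense = [[1.0, 0.0, 0.5], [0.0, 1.0, 2.0]]
different = [[0.5, 1.0, 0.0], [2.0, 1.0, 0.0]]

print(same_columns_any_order(manual, dense))      # True
print(same_columns_any_order(manual, different))  # False
```

With real data, the two inputs would be the preprocessed array and the feature layer output converted to nested lists (for example via `.numpy().tolist()` in eager mode).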
The Previous Method and the DenseFeatures Method (Outside the model) produce similar results, but the DenseFeatures Method (Inside the model) produces very different, and even worse, results.
Please help, thank you. I expected that adding DenseFeatures inside the model would make no difference to the computation.