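Will a neural network layer whose input never changes learn its weights?
I am trying to build a network whose inputs persist and decay. The raw input is a vector in which each element is 0, 1, or -1. I am curious whether there is any value in having several inputs active at the same time, so instead of letting an input drop to 0 on the very next iteration I want to decay it from 1 or -1 back toward 0. I suppose this is a crude form of memory. An example of what I mean: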
Normal input:
1 -> 0 -> 0 -> -1 -> 0 ...
With decay .2:
1 -> .8 -> .6 -> -1 -> -.8 ...
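This is easy to do by hand by adding an extra input that holds a vector of decay values, but I would like to know whether the network can learn these values itself, so that the more important inputs decay more slowly.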
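For reference, the hand-written version of this decay could look roughly like the sketch below. The function name `apply_decay` and the rule that a new non-zero input always replaces the decayed value are assumptions inferred from the example sequence, not something stated explicitly.

```python
def apply_decay(sequence, decay):
    """Subtractive decay: hold the last non-zero input and move it toward 0.

    `sequence` is a list of raw inputs (0, 1 or -1); `decay` is the amount
    subtracted from the magnitude on every step that has no new input.
    Assumption: a non-zero raw input always replaces the decayed value.
    """
    state = 0.0
    decayed = []
    for raw in sequence:
        if raw != 0:
            state = float(raw)               # a new input overrides the memory
        elif state > 0:
            state = max(0.0, state - decay)  # decay a positive value, clamp at 0
        elif state < 0:
            state = min(0.0, state + decay)  # decay a negative value, clamp at 0
        decayed.append(state)
    return decayed


result = apply_decay([1, 0, 0, -1, 0], decay=0.2)
print([round(v, 2) for v in result])  # [1.0, 0.8, 0.6, -1.0, -0.8]
```

With a vector input, the same function would be applied to each element with its own decay value.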
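Since each neuron outputs a single value, I could use N neurons (one per decay value I need) and feed each of them a constant input of 1, so that each neuron simply outputs its weight. Passing that through a sigmoid activation would give a value usable as the decay.
If the input is always 1, will this layer learn its weights? If not, is there another way to do it?
Note: The data is sequential, which is why I think the activations may influence one another. I also know that recurrent networks have memory, but I am not sure I have enough data for one to learn the relationships. In addition, this custom decay function can actually bring the value back to 0 because it subtracts the decay, whereas multiplying by a small weight only approaches 0 asymptotically, which, if I understand correctly, is what an RNN does.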
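The difference between subtracting a decay and multiplying by a factor can be illustrated with a small sketch. The function names, the clamp at 0, and the concrete values 0.2 and 0.8 are illustrative assumptions; the sketch only covers a positive starting value.

```python
def subtractive_decay(value, decay, steps):
    """Subtract `decay` on every step, clamping at 0 (the clamp is an assumption)."""
    trace = [value]
    for _ in range(steps):
        value = max(0.0, value - decay)
        trace.append(value)
    return trace


def multiplicative_decay(value, factor, steps):
    """Multiply by `factor` (0 < factor < 1) on every step."""
    trace = [value]
    for _ in range(steps):
        value = value * factor
        trace.append(value)
    return trace


sub = subtractive_decay(1.0, decay=0.2, steps=10)
mul = multiplicative_decay(1.0, factor=0.8, steps=10)
print(sub[-1])      # 0.0: the subtractive version reaches zero after a few steps
print(mul[-1] > 0)  # True: 0.8 ** 10 is small but still positive
```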
Solution
You can easily build this kind of architecture with the TensorFlow functional API.
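Before looking at a full model, it may help to see why a unit with a constant input can still receive a gradient. The sketch below is plain Python rather than TensorFlow; `decay_unit`, `grad_wrt_weight`, and the `upstream_grad` value are hypothetical names and numbers chosen only for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def decay_unit(w, x=1.0):
    """One bias-free unit: output = sigmoid(w * x); with x = 1 this is sigmoid(w)."""
    return sigmoid(w * x)


def grad_wrt_weight(w, upstream_grad, x=1.0):
    """Chain rule: dL/dw = dL/dout * sigmoid'(w * x) * x."""
    out = decay_unit(w, x)
    return upstream_grad * out * (1.0 - out) * x


w = 0.8
print(decay_unit(w))                                 # value the unit passes on
print(grad_wrt_weight(w, upstream_grad=0.5))         # non-zero because x = 1
print(grad_wrt_weight(w, upstream_grad=0.5, x=0.0))  # 0.0: a zero input would block learning
```

The input only scales the gradient. As long as it is non-zero and the loss sends back a non-zero upstream gradient, the weight gets updated.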
Creating the dataset and the model. Code:
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
# Generating features
np.random.seed(100)
x1 = tf.constant(np.ones(shape =(100,1)),dtype = tf.float32)
x2 = tf.constant(np.ones(shape =(100,1)),dtype = tf.float32)
x3 = tf.constant(np.ones(shape =(100,1)),dtype = tf.float32)
y = tf.constant(np.random.randint(2,size =(100,)),dtype = tf.float32)
def create_model():
    input1 = tf.keras.Input(shape=(1,))
    input2 = tf.keras.Input(shape=(1,))
    input3 = tf.keras.Input(shape=(1,))
    hidden1 = tf.keras.layers.Dense(units = 1,activation='sigmoid',use_bias = False)(input1)
    hidden2 = tf.keras.layers.Dense(units = 1,use_bias = False)(input2)
    hidden3 = tf.keras.layers.Dense(units = 1,use_bias = False)(input3)
    merge = tf.keras.layers.concatenate([hidden1,hidden2,hidden3])
    hidden4 = tf.keras.layers.Dense(units = 4,activation='sigmoid')(merge)
    output1 = tf.keras.layers.Dense(units = 2,activation='softmax')(hidden4)
    model = tf.keras.models.Model(inputs = [input1,input2,input3],outputs = output1,name= "functional1")
    return model
model = create_model()
# setting decay values
model.layers[3].set_weights([tf.constant([[0.8]])])
model.layers[4].set_weights([tf.constant([[0.8]])])
model.layers[5].set_weights([tf.constant([[0.8]])])
tf.keras.utils.plot_model(model,'my_first_model.png',show_shapes=True)
Training process:
# Instantiate an optimizer.
optimizer = tf.keras.optimizers.SGD(learning_rate=10)
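# Instantiate a loss function.
# Note: the model's last layer already applies softmax, so from_logits=False
# would be the consistent setting. from_logits=True is kept as in the original
# code that accompanies the output shown below.
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)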
epochs = 50
for epoch in range(epochs):
    print("\nStart of epoch %d" % (epoch,))
    # Open a GradientTape to record the operations run
    # during the forward pass, which enables auto-differentiation.
    with tf.GradientTape() as tape:
        # Run the forward pass of the model.
        # The operations that the layers apply
        # to their inputs are going to be recorded
        # on the GradientTape.
        logits = model([x1,x2,x3],training=True) # Model output for this batch (softmax probabilities)
        # Compute the loss value for this batch.
        loss_value = loss_fn(y,logits)
    # Use the gradient tape to automatically retrieve
    # the gradients of the loss with respect to the trainable variables.
    grads = tape.gradient(loss_value,model.trainable_weights)
    print('Gradients of- Decay 1: {} Decay 2: {} Decay 3: {}'.format(grads[0].numpy()[0][0],grads[1].numpy()[0][0],grads[2].numpy()[0][0]))
    # Run one step of gradient descent by updating
    # the value of the variables to minimize the loss.
    optimizer.apply_gradients(zip(grads,model.trainable_weights))
    # Log every epoch.
    print("Training loss (for one batch) at epoch %d: %.4f" % (epoch,float(loss_value)))
    print('------------------------------')
Output:
Start of epoch 0
Gradients of- Decay 1: -0.001539231976494193 Decay 2: 0.0013862588675692677 Decay 3: -0.0024916294496506453
Training loss (for one batch) at epoch 0: 0.7312
------------------------------
Start of epoch 1
Gradients of- Decay 1: 0.0015823811991140246 Decay 2: -0.00021153852867428213 Decay 3: 0.0008941243286244571
Training loss (for one batch) at epoch 1: 0.7042
------------------------------
Start of epoch 2
Gradients of- Decay 1: -0.0013041968923062086 Decay 2: 0.0005898184608668089 Decay 3: -0.0015725962584838271
Training loss (for one batch) at epoch 2: 0.7039
------------------------------
Start of epoch 3
Gradients of- Decay 1: 0.00156548956874758 Decay 2: -0.00017016787023749202 Decay 3: 0.000881993502844125
Training loss (for one batch) at epoch 3: 0.7045
------------------------------
Start of epoch 4
Gradients of- Decay 1: -0.0012605276424437761 Decay 2: 0.00047704551252536476 Decay 3: -0.0015090997330844402
Training loss (for one batch) at epoch 4: 0.7028
------------------------------
Start of epoch 5
Gradients of- Decay 1: 0.0014193064998835325 Decay 2: -0.0001368212979286909 Decay 3: 0.0008420557714998722
Training loss (for one batch) at epoch 5: 0.7027
------------------------------
Start of epoch 6
Gradients of- Decay 1: -0.0011729025281965733 Decay 2: 0.0003637363843154162 Decay 3: -0.0013745202450081706
Training loss (for one batch) at epoch 6: 0.7011
------------------------------
Start of epoch 7
Gradients of- Decay 1: 0.0012617181055247784 Decay 2: -0.00010974107135552913 Decay 3: 0.0007924885721877217
Training loss (for one batch) at epoch 7: 0.7007
------------------------------
Start of epoch 8
Gradients of- Decay 1: -0.0010727590415626764 Decay 2: 0.000274341378826648 Decay 3: -0.0012277730274945498
Training loss (for one batch) at epoch 8: 0.6995
------------------------------
Start of epoch 9
Gradients of- Decay 1: 0.0011162457522004843 Decay 2: -8.809947757981718e-05 Decay 3: 0.0007380791357718408
Training loss (for one batch) at epoch 9: 0.6991
------------------------------
Start of epoch 10
Gradients of- Decay 1: -0.0009710552403703332 Decay 2: 0.00020754436263814569 Decay 3: -0.001086110481992364
Training loss (for one batch) at epoch 10: 0.6982
------------------------------
Final values of the decay rates:
print(model.layers[3].get_weights())
print(model.layers[4].get_weights())
print(model.layers[5].get_weights())
Output:
[array([[0.7963085]],dtype=float32)]
[array([[0.7707753]],dtype=float32)]
[array([[0.8614942]],dtype=float32)]
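Things to remember:
Learning depends not only on your inputs but also on your outputs. In the gradients computed above, both the target output and the predicted output appear in the gradient equation. So as long as the target outputs vary, learning still takes place even when the inputs are constant.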
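The role of the targets can be made concrete with a small plain-Python sketch. It relies on the standard result that, for softmax followed by cross-entropy, the gradient with respect to the logits is the predicted probabilities minus the one-hot label. The function names and the toy labels are illustrative assumptions, not taken from the model above.

```python
import math


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mean_logit_gradient(logits, labels):
    """Average d(cross-entropy)/d(logits) over a batch whose inputs are all identical.

    Per sample, the gradient is (predicted probabilities - one-hot label).
    """
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for label in labels:
        for k, p in enumerate(probs):
            grad[k] += p - (1.0 if k == label else 0.0)
    return [g / len(labels) for g in grad]


logits = [0.0, 0.0]  # identical prediction for every sample (constant input)
print(mean_logit_gradient(logits, labels=[0, 0, 0, 1]))  # non-zero: labels are 75% class 0
print(mean_logit_gradient(logits, labels=[0, 1, 0, 1]))  # zero: prediction matches label frequencies
```

In this sketch the gradient only vanishes once the predicted probabilities equal the label frequencies in the batch. Until then it is non-zero and is passed back to the earlier weights.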