How to understand GradientTape with mini-batches
In the following example, taken from the Keras documentation, I want to understand how grads is computed. Does grads correspond to the average gradient computed over the batch (x_batch_train, y_batch_train)? In other words, does the algorithm compute the gradient with respect to each variable for every sample in the mini-batch, and then average them to obtain grads?
for epoch in range(epochs):
    print("\nStart of epoch %d" % (epoch,))

    # Iterate over the batches of the dataset.
    for step, (x_batch_train, y_batch_train) in enumerate(train_dataset):

        # Open a GradientTape to record the operations run
        # during the forward pass, which enables auto-differentiation.
        with tf.GradientTape() as tape:

            # Run the forward pass of the layer.
            # The operations that the layer applies
            # to its inputs are going to be recorded
            # on the GradientTape.
            logits = model(x_batch_train, training=True)  # Logits for this minibatch

            # Compute the loss value for this minibatch.
            loss_value = loss_fn(y_batch_train, logits)

        # Use the gradient tape to automatically retrieve
        # the gradients of the trainable variables with respect to the loss.
        grads = tape.gradient(loss_value, model.trainable_weights)

        # Run one step of gradient descent by updating
        # the value of the variables to minimize the loss.
        optimizer.apply_gradients(zip(grads, model.trainable_weights))
Solution
Yes, your assumption is correct. Keras loss functions average over the batch by default; read this. The documentation pointed out by DachuanZhao also shows that the sum over the elements in the batch is divided by the batch size.
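To check this numerically, here is a minimal sketch (not from the original post; the model, data, and shapes are made up) that compares the gradient of the batch loss against the average of the per-sample gradients:

import tensorflow as tf

tf.random.set_seed(0)

# Toy model, loss, and batch, invented just for this check.
model = tf.keras.Sequential([tf.keras.layers.Dense(3)])
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

x_batch = tf.random.normal((8, 4))
y_batch = tf.random.uniform((8,), maxval=3, dtype=tf.int32)

# 1) Gradient of the batch loss. With the default reduction, loss_fn
#    returns the mean of the per-sample losses.
with tf.GradientTape() as tape:
    logits = model(x_batch, training=True)
    loss_value = loss_fn(y_batch, logits)
batch_grads = tape.gradient(loss_value, model.trainable_weights)

# Sanity check: the batch loss equals the mean of the per-sample losses.
per_sample_losses = tf.keras.losses.sparse_categorical_crossentropy(
    y_batch, logits, from_logits=True)
print(float(loss_value), float(tf.reduce_mean(per_sample_losses)))

# 2) Average of the gradients computed one sample at a time.
per_sample_grads = []
for i in range(x_batch.shape[0]):
    with tf.GradientTape() as tape:
        logits_i = model(x_batch[i:i + 1], training=True)
        loss_i = loss_fn(y_batch[i:i + 1], logits_i)
    per_sample_grads.append(tape.gradient(loss_i, model.trainable_weights))

mean_grads = [tf.reduce_mean(tf.stack(grads_per_var), axis=0)
              for grads_per_var in zip(*per_sample_grads)]

# 3) The two should agree for every trainable variable,
#    up to floating-point error.
for bg, mg in zip(batch_grads, mean_grads):
    print(float(tf.reduce_max(tf.abs(bg - mg))))

The printed differences should be at floating-point noise level, which is consistent with grads being the gradient of the mean loss over the mini-batch, i.e. the average of the per-sample gradients.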