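The Neural Network Optimization Process

Neural network optimization
Prerequisites
1 Neural network complexity
- 1.1 Time complexity
2 Learning rate strategies
- 2.1 Exponential decay
- 2.2 Piecewise constant decay
3 Activation functions
4 Loss functions
5 Underfitting and overfitting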

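Neural network optimization

The goal is to learn how a neural network is optimized: using regularization to reduce overfitting and using an optimizer to update the network parameters.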

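Prerequisites

tf.where()

tf.where(
    condition, x=None, y=None, name=None
)

Function:
Selects values from x or y according to condition. Where condition is True, the result takes the element of x at that position; where it is False, it takes the element of y.

Parameters:
condition: a bool tensor.
x: a tensor with the same shape as y.
y: a tensor with the same shape as x.

Returns:
A tensor with the same shape as x.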

import tensorflow as tf
a = tf.constant([1,2,3,1,1])
b = tf.constant([0,1,3,4,5])

c = tf.where(tf.greater(a, b), a, b)  # where a > b take the element of a, otherwise take the element of b

print("c:",c)

c: tf.Tensor([1 2 3 4 5], shape=(5,), dtype=int32)
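
The selection rule can also be sketched in plain Python, without TensorFlow. The helper name `where` below is hypothetical and only mirrors the element-wise behavior described above for one-dimensional inputs of equal length.

```python
def where(condition, x, y):
    """Pick x[i] where condition[i] is True, otherwise y[i] (1-D lists only)."""
    return [xi if c else yi for c, xi, yi in zip(condition, x, y)]


a = [1, 2, 3, 1, 1]
b = [0, 1, 3, 4, 5]
condition = [ai > bi for ai, bi in zip(a, b)]  # plays the role of tf.greater(a, b)
c = where(condition, a, b)
print(c)  # [1, 2, 3, 4, 5]
```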

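np.random.RandomState.rand()

Function:
Returns random numbers drawn uniformly from [0, 1).

Parameters:
np.random.RandomState.rand(d0, d1, ...)  # the dimensions of the output; with no arguments, a single scalar is returned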

import numpy as np

rdm = np.random.RandomState(seed=1)  # with a fixed seed, the same random numbers are generated on every run
a = rdm.rand()  # a single random scalar
b = rdm.rand(2,3)
print('a:',a)
print('b:',b)

a: 0.417022004702574
b: [[7.20324493e-01 1.14374817e-04 3.02332573e-01]
 [1.46755891e-01 9.23385948e-02 1.86260211e-01]]

np.vstack()

Function:

Stacks arrays vertically (row-wise).

np.vstack((array1, array2))

a = np.array([1,3,2])
b = np.array([4,5,6])
c = np.vstack((b, a))
print('c:',c)

c: [[4 5 6]
 [1 3 2]]

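np.mgrid[], np.ravel(), np.c_[]

np.mgrid[start:stop:step]

Function: generates a grid of coordinates. Each range is half-open, [start, stop).

np.mgrid[start:stop:step, start:stop:step, ...]

np.ravel()

Function: flattens an array x into a one-dimensional array.

np.c_[]

Function: pairs up the flattened grid values column by column, so that each row is one coordinate point.

np.c_[array1, array2, ...]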

x,y = np.mgrid[1:3:1, 2:4:0.5]
grid = np.c_[x.ravel(), y.ravel()]
print('x:',x)
print('y:',y)
print('grid:\n', grid)

x: [[1. 1. 1. 1.]
 [2. 2. 2. 2.]]
y: [[2.  2.5 3.  3.5]
 [2.  2.5 3.  3.5]]
grid:
 [[1.  2. ]
 [1.  2.5]
 [1.  3. ]
 [1.  3.5]
 [2.  2. ]
 [2.  2.5]
 [2.  3. ]
 [2.  3.5]]

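1 Neural network complexity

1.1 Time complexity

The complexity of a neural network is usually expressed by its number of layers and its number of parameters.

Space complexity:

Number of layers = number of hidden layers + 1 output layer
Total parameters = total number of weights w + total number of biases b

Time complexity:

The number of multiply-add operations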
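
As a rough illustration, the counts above can be computed for a fully connected network. The sketch below assumes dense layers that each have a bias vector; the function name `dense_complexity` and the layer sizes [3, 4, 2] are made up for this example.

```python
def dense_complexity(layer_sizes):
    """Return (layer count, parameter count, multiply-add count) for a fully connected net.

    layer_sizes lists the neurons per layer, input layer first, e.g. [3, 4, 2].
    """
    pairs = list(zip(layer_sizes[:-1], layer_sizes[1:]))
    num_layers = len(pairs)  # hidden layers + output layer (the input layer is not counted)
    num_weights = sum(n_in * n_out for n_in, n_out in pairs)
    num_biases = sum(n_out for _, n_out in pairs)
    mult_adds = num_weights  # one multiply-add per weight in a forward pass
    return num_layers, num_weights + num_biases, mult_adds


print(dense_complexity([3, 4, 2]))  # (2, 26, 20)
```

For 3 inputs, 4 hidden neurons, and 2 outputs this gives 3*4 + 4 = 16 parameters in the first layer and 4*2 + 2 = 10 in the second, 26 in total, and 3*4 + 4*2 = 20 multiply-adds.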


2 Learning rate strategies


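2.1 Exponential decay

$$lr = lr_0 \cdot decay\_rate^{\, global\_step / decay\_steps}$$

Here $lr_0$ is the initial learning rate, $decay\_rate$ is the decay rate, $global\_step$ is the number of training steps from 0 up to the current one, and $decay\_steps$ controls how fast the rate decays.

An exponentially decaying learning rate starts with a relatively large value to reach a reasonably good solution quickly, then shrinks gradually as training continues so that the model is more stable in the later stages. Exponential decay is one of the most commonly used learning rate schedules and appears in a large number of models.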

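TensorFlow API: tf.keras.optimizers.schedules.ExponentialDecay

tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate, decay_steps, decay_rate, staircase=False, name=None
)

Function: exponential decay learning rate schedule.

Equivalent API: tf.optimizers.schedules.ExponentialDecay

Parameters:

 > initial_learning_rate: the initial learning rate.

 > decay_steps: the number of steps over which the rate is multiplied by decay_rate once. It is always used; when staircase is True, step / decay_steps becomes an integer division.

 > decay_rate: the decay rate.

 > staircase: a bool. If True, the learning rate decreases in discrete steps, like a staircase.

Returns: calling the schedule object with a step, lr_schedule(step), returns the learning rate computed for that step.

Reference: tf.keras.optimizers.schedules.ExponentialDecay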

import matplotlib.pyplot as plt
N = 400
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(

    0.5,
    decay_steps=10,
    decay_rate=0.9,
    staircase=False)
y = []
for global_step in range(N):
    lr = lr_schedule(global_step)
    y.append(lr)
    
x =range(N)
plt.figure(figsize=(8,5))
plt.plot(x, y,'r-')
plt.ylim([0, max(plt.ylim())])
plt.xlabel('Step')
plt.ylabel('Learning Rate')
plt.title('ExponentialDecay')
plt.show()
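
To make the schedule concrete, here is a minimal plain-Python sketch of the exponential decay rule with the same settings as the plot above (0.5, decay_steps=10, decay_rate=0.9). The function `exponential_decay` is a hypothetical helper written for illustration, not the TensorFlow implementation.

```python
def exponential_decay(step, initial_lr=0.5, decay_steps=10, decay_rate=0.9, staircase=False):
    """Learning rate after `step` training steps under exponential decay."""
    exponent = step / decay_steps
    if staircase:
        # Integer division: the rate only changes once every decay_steps steps.
        exponent = step // decay_steps
    return initial_lr * decay_rate ** exponent


for step in (0, 5, 10, 20):
    print(step, exponential_decay(step), exponential_decay(step, staircase=True))
```

With staircase=False the rate shrinks a little at every step; with staircase=True it stays at 0.5 for steps 0 to 9 and drops to about 0.45 at step 10.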


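2.2 Piecewise constant decay

Piecewise constant decay lets the practitioner set different learning rates for different stages of training and tune them finely: the learning rate can drop to an arbitrary value after an arbitrary number of steps. It requires a good understanding of both the model and the dataset.

tf.keras.optimizers.schedules.PiecewiseConstantDecay

tf.keras.optimizers.schedules.PiecewiseConstantDecay(
    boundaries, values, name=None
)

Function: piecewise constant learning rate schedule.

Equivalent API: tf.optimizers.schedules.PiecewiseConstantDecay

Parameters:

boundaries: [step_1, step_2, ..., step_n], the steps at which the learning rate changes.
values: [val_0, val_1, val_2, ..., val_n], the initial learning rate followed by the value used after each boundary. It has one more element than boundaries.

Returns: calling the schedule object with a step, lr_schedule(step), returns the learning rate computed for that step.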

N = 400
lr_schedule = tf.keras.optimizers.schedules.PiecewiseConstantDecay(
    boundaries=[100, 200, 300],
    values=[0.1, 0.05, 0.025, 0.001])

y = []
for global_step in range(N):
    lr = lr_schedule(global_step)
    y.append(lr)
x = range(N)
plt.figure(figsize=(8,6))
plt.plot(x, y, 'r-')
plt.ylim([0,max(plt.ylim())])
plt.xlabel('Step')
plt.ylabel('Learning Rate')
plt.title('PiecewiseConstantDecay')
plt.show()
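
The lookup behind a piecewise constant schedule can be sketched with the standard library. The helper `piecewise_constant` is hypothetical, and it assumes that a step exactly equal to a boundary still uses the earlier value.

```python
from bisect import bisect_left


def piecewise_constant(step, boundaries, values):
    """Return the learning rate for `step`; len(values) must be len(boundaries) + 1."""
    # Assumption: step <= boundaries[i] still belongs to segment i.
    return values[bisect_left(boundaries, step)]


boundaries = [100, 200, 300]
values = [0.1, 0.05, 0.025, 0.001]
for step in (0, 100, 101, 250, 399):
    print(step, piecewise_constant(step, boundaries, values))
```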


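3 Activation functions

Activation functions introduce non-linearity, because a purely linear model has limited expressive power. With non-linear activation functions, a deep neural network can represent far more complex mappings.

A good activation function should have the following properties:

- Non-linearity: with a non-linear activation function, a multi-layer network can approximate a very wide class of functions.
- Differentiability: most optimizers update parameters with gradient descent.
- Monotonicity: when the activation function is monotonic, the loss function of a single-layer network is guaranteed to be convex.
- Approximate identity: f(x) ≈ x. When the parameters are initialized to small random values, the network is more stable.

Output range of the activation function:

- When the output is bounded, gradient-based optimization is more stable.
- When the output is unbounded, a smaller learning rate is recommended.

Common activation functions include sigmoid, tanh, ReLU, Leaky ReLU, PReLU, RReLU, ELU (Exponential Linear Units), softplus, softsign, and softmax. A few typical ones are introduced below.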

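3.1 sigmoid

TensorFlow API: tf.nn.sigmoid()

$$f(x) = \frac{1}{1 + e^{-x}}$$

Advantages:
(1) The output lies in (0, 1), is monotonic and continuous, and is bounded, so optimization is stable. It can be used in the output layer.

(2) The derivative is easy to compute.

Disadvantages:

(1) It easily causes vanishing gradients.

(2) The output is not zero-centered, which slows convergence.

(3) The exponential is relatively expensive to compute, which lengthens training.

sigmoid can be used inside the network during training, and as an output it suits binary classification, where a single value in (0, 1) is read as the probability of one class. It does not extend to multi-class problems. softmax handles that case: it is usually applied in the last layer of the network and maps the outputs to values in (0, 1) that sum to 1 over all classes, not just two.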
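
The vanishing-gradient issue can be seen numerically. The sketch below uses the identity f'(x) = f(x)(1 - f(x)), whose maximum is 0.25 at x = 0; the helper names are chosen only for this illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def sigmoid_grad(x):
    """Derivative of sigmoid: f(x) * (1 - f(x))."""
    s = sigmoid(x)
    return s * (1.0 - s)


for x in (0.0, 2.0, 5.0, 10.0):
    print(x, round(sigmoid(x), 5), round(sigmoid_grad(x), 5))
```

The gradient is already below 0.01 at x = 5, and multiplying several such factors across layers shrinks the backpropagated signal quickly.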

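3.2 tanh

TensorFlow API: tf.math.tanh(x)

$$f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$

Advantages:
1. Converges faster than sigmoid.

2. Unlike sigmoid, its output is centered at 0.

Disadvantages:

1. It easily causes vanishing gradients.

2. The exponential is relatively expensive to compute, which lengthens training.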

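3.3 ReLU

TensorFlow API: tf.nn.relu(x)

$$f(x) = \max(0, x)$$

Advantages:

1. It avoids vanishing gradients in the positive region.

2. It only needs to check whether the input is greater than 0, so it is fast to compute.

3. It converges much faster than sigmoid and tanh, both of which involve costly exponential operations.

4. It gives the network a sparse representation.

Disadvantages:

1. The output is not zero-centered, which slows convergence.

2. The dead ReLU problem: some neurons may never be activated, so their parameters are never updated.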

3.4 Leaky ReLU

TensorFlow API: tf.nn.leaky_relu(x)

$$f(x) = \max(\alpha x, x), \quad 0 < \alpha < 1$$

In theory, Leaky ReLU has all the advantages of ReLU and does not suffer from the dead ReLU problem. In practice, however, it has not been shown to be consistently better than ReLU.
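
The difference between the two functions shows up in the negative region. This is a minimal sketch with hypothetical helper names; the slope alpha=0.2 is picked only for illustration.

```python
def relu(x):
    return max(0.0, x)


def relu_grad(x):
    return 1.0 if x > 0 else 0.0


def leaky_relu(x, alpha=0.2):
    # alpha is the slope used for negative inputs
    return x if x > 0 else alpha * x


def leaky_relu_grad(x, alpha=0.2):
    return 1.0 if x > 0 else alpha


for x in (-3.0, -0.5, 2.0):
    print(x, relu(x), relu_grad(x), leaky_relu(x), leaky_relu_grad(x))
```

For negative inputs the ReLU gradient is exactly 0, so a neuron stuck there stops learning, while Leaky ReLU still passes a small gradient.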

3.5 softmax

$$\mathrm{softmax}(y_i) = \frac{e^{y_i}}{\sum_{j=1}^{n} e^{y_j}}$$

TensorFlow API: tf.nn.softmax

softmax transforms the output of a fully connected layer so that it forms a probability distribution: every value lies in [0, 1] and the values sum to 1.
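
A small plain-Python sketch of softmax follows. Subtracting the maximum before exponentiating is a common numerical-stability trick that does not change the result; the input values are chosen only for illustration.

```python
import math


def softmax(logits):
    """Turn a list of raw scores into probabilities that sum to 1."""
    m = max(logits)  # shift by the max so math.exp never overflows
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([4.0, 2.0, 1.0])
print([round(p, 3) for p in probs])  # [0.844, 0.114, 0.042]
```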

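3.6 Recommendations

Suggestions for beginners:

1. Prefer ReLU as the first choice of activation function.

2. Use a small learning rate.

3. Standardize the input features, so that they have mean 0 and standard deviation 1.

4. Initialization: center the initial parameters, so that the randomly generated parameters follow a normal distribution with mean 0 and standard deviation $\sqrt{2 / n_{in}}$, where $n_{in}$ is the number of input features of the current layer.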
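
Input standardization from point 3 can be sketched with the standard library. The function name `standardize` and the sample feature values are made up for this example, and the population standard deviation is an assumption about which estimator to use.

```python
import statistics


def standardize(values):
    """Rescale a feature column to mean 0 and standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    return [(v - mean) / std for v in values]


feature = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z = standardize(feature)
print(z)
print(statistics.fmean(z), statistics.pstdev(z))
```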

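4 Loss functions

Loss function (loss): the gap between the prediction (y) and the known answer (y_).

Optimization target of the network: minimize the loss.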

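4.1 Mean squared error loss

Mean squared error (MSE) is the most commonly used loss function for regression. Regression problems predict concrete numerical values, such as house prices or efficiency. The target is not one of a predefined set of classes but an arbitrary real number. MSE is defined as:

$$\mathrm{MSE}(y\_, y) = \frac{1}{n} \sum_{i=1}^{n} \left( y^{(i)} - y\_^{(i)} \right)^2$$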

loss_mse = tf.reduce_mean(tf.square(y - y_))

TensorFlow API:tf.keras.losses.MSE

tf.keras.losses.MSE(
    y_true, y_pred
)

y_true = tf.constant([0.5, 0.8])
y_pred = tf.constant([1.0,1.0])
print(tf.keras.losses.MSE(y_true, y_pred))

tf.Tensor(0.145, shape=(), dtype=float32)

print(tf.reduce_mean(tf.square(y_true - y_pred)))

tf.Tensor(0.145, shape=(), dtype=float32)

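The following script generates 32 samples with two features x1 and x2 in [0, 1), sets the target to y_ = x1 + x2 plus noise in [-0.05, 0.05), and fits a single weight matrix w1 with the MSE loss.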

#p19_mse.py
import tensorflow as tf
import numpy as np

SEED=23455

rdm = np.random.RandomState(seed=SEED)  # generates random numbers in [0, 1)
x = rdm.rand(32, 2)
y_ = [[x1 + x2 + (rdm.rand() / 10.0 - 0.05)] for (x1, x2) in x]  # noise: [0, 1) / 10 = [0, 0.1); [0, 0.1) - 0.05 = [-0.05, 0.05)

x = tf.cast(x, dtype=tf.float32)

w1 = tf.Variable(tf.random.normal([2,1], stddev=1, seed=1))

epochs = 15000  # total number of training steps (renamed so the loop variable does not shadow it)
lr = 0.002

for epoch in range(epochs):
    with tf.GradientTape() as tape:
        y = tf.matmul(x, w1)
        loss_mse = tf.keras.losses.MSE(y_, y)
    grads = tape.gradient(loss_mse, w1)
    w1.assign_sub(lr*grads)
    
    if epoch % 500 == 0:
        print("After %d training steps,w1 is " % (epoch))
        print(w1.numpy(), "\n")
        
print("Final w1 is: ", w1.numpy())

After 0 training steps,w1 is 
[[-1.5136462 ]
 [ 0.35048607]] 

After 500 training steps,w1 is 
[[1.0024375]
 [0.9964705]] 

After 1000 training steps,w1 is 
[[1.0043048]
 [0.9948319]] 

After 1500 training steps,w1 is 
[[1.0043048]
 [0.9948319]] 

After 2000 training steps,w1 is 
[[1.0043048]
 [0.9948319]] 

After 2500 training steps,w1 is 
[[1.0043048]
 [0.9948319]] 

After 3000 training steps,w1 is 
[[1.0043048]
 [0.9948319]] 

After 3500 training steps,w1 is 
[[1.0043048]
 [0.9948319]] 

After 4000 training steps,w1 is 
[[1.0043048]
 [0.9948319]] 

After 4500 training steps,w1 is 
[[1.0043048]
 [0.9948319]] 

After 5000 training steps,w1 is 
[[1.0043048]
 [0.9948319]] 

After 5500 training steps,w1 is 
[[1.0043048]
 [0.9948319]] 

After 6000 training steps,w1 is 
[[1.0043048]
 [0.9948319]] 

After 6500 training steps,w1 is 
[[1.0043048]
 [0.9948319]] 

After 7000 training steps,w1 is 
[[1.0043048]
 [0.9948319]] 

After 7500 training steps,w1 is 
[[1.0043048]
 [0.9948319]] 

After 8000 training steps,w1 is 
[[1.0043048]
 [0.9948319]] 

After 8500 training steps,w1 is 
[[1.0043048]
 [0.9948319]] 

After 9000 training steps,w1 is 
[[1.0043048]
 [0.9948319]] 

After 9500 training steps,w1 is 
[[1.0043048]
 [0.9948319]] 

After 10000 training steps,w1 is 
[[1.0043048]
 [0.9948319]] 

After 10500 training steps,w1 is 
[[1.0043048]
 [0.9948319]] 

After 11000 training steps,w1 is 
[[1.0043048]
 [0.9948319]] 

After 11500 training steps,w1 is 
[[1.0043048]
 [0.9948319]] 

After 12000 training steps,w1 is 
[[1.0043048]
 [0.9948319]] 

After 12500 training steps,w1 is 
[[1.0043048]
 [0.9948319]] 

After 13000 training steps,w1 is 
[[1.0043048]
 [0.9948319]] 

After 13500 training steps,w1 is 
[[1.0043048]
 [0.9948319]] 

After 14000 training steps,w1 is 
[[1.0043048]
 [0.9948319]] 

After 14500 training steps,w1 is 
[[1.0043048]
 [0.9948319]] 

Final w1 is:  [[1.0043048]
 [0.9948319]]

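4.2 Cross-entropy loss

Cross entropy measures the distance between two probability distributions: the smaller the cross entropy, the closer the two distributions are. It is a widely used loss function for classification.

$$H(y\_, y) = -\sum y\_ \cdot \ln y$$

where y_ is the true value and y is the network's prediction.

For multi-class problems, the raw output of the network is generally not a probability distribution, so a softmax layer is added to turn it into one. TensorFlow provides the following APIs for computing cross-entropy loss:

TensorFlow API: tf.keras.losses.categorical_crossentropy

TensorFlow API: tf.nn.softmax_cross_entropy_with_logits

TensorFlow API: tf.nn.sparse_softmax_cross_entropy_with_logits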

tf.keras.losses.categorical_crossentropy

tf.keras.losses.categorical_crossentropy(
    y_true, y_pred, from_logits=False, label_smoothing=0
)

Function: computes the cross entropy.

Equivalent API: tf.losses.categorical_crossentropy

Parameters:

- y_true: the true values.
- y_pred: the predicted values.
- from_logits: whether y_pred is a logits tensor.
- label_smoothing: a float in [0, 1].

y_true = [1, 0, 0]
y_pred1 = [0.5, 0.4, 0.1]
y_pred2 = [0.8, 0.1, 0.1]
print(tf.keras.losses.categorical_crossentropy(y_true, y_pred1))
print(tf.keras.losses.categorical_crossentropy(y_true, y_pred2))

tf.Tensor(0.6931472, shape=(), dtype=float32)
tf.Tensor(0.22314353, shape=(), dtype=float32)

# equivalent implementation
print(-tf.reduce_sum(y_true* tf.math.log(y_pred1)))
print(-tf.reduce_sum(y_true * tf.math.log(y_pred2)))

tf.Tensor(0.6931472, shape=(), dtype=float32)
tf.Tensor(0.22314353, shape=(), dtype=float32)

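tf.nn.softmax_cross_entropy_with_logits

tf.nn.softmax_cross_entropy_with_logits(
    labels, logits, axis=-1, name=None
)

Function: applies softmax to logits, then computes the cross entropy against labels.

In machine learning, for multi-class problems, the vector of values that has not yet been normalized by softmax is called the logits. After the softmax layer, the logits become a vector that forms a probability distribution.

Parameters:

labels: along the class dimension, each vector should be a valid probability distribution. For example, when labels has shape [batch_size, num_classes], each labels[i] should be a probability distribution.
logits: the activation for each class, usually the output of a linear layer. These values are normalized by softmax inside the function.
axis: the class dimension. The default is -1, the last dimension.

Returns: the softmax cross-entropy loss.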

labels = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
logits = [[4.0, 2.0, 1.0], [0.0, 5.0, 1.0]]
print(tf.nn.softmax_cross_entropy_with_logits(labels, logits))

tf.Tensor([0.16984604 0.02474492], shape=(2,), dtype=float32)

y_ = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]])
y = np.array([[12, 3, 2], [3, 10, 1], [1, 2, 5], [4, 6.5, 1.2], [3, 6, 1]])
y_pro = tf.nn.softmax(y)
loss_ce1 = tf.losses.categorical_crossentropy(y_,y_pro)
loss_ce2 = tf.nn.softmax_cross_entropy_with_logits(y_, y)

print('Result of the two-step computation:\n', loss_ce1)
print('Result of the combined computation:\n', loss_ce2)

Result of the two-step computation:
 tf.Tensor(
[1.68795487e-04 1.03475622e-03 6.58839038e-02 2.58349207e+00
 5.49852354e-02], shape=(5,), dtype=float64)
Result of the combined computation:
 tf.Tensor(
[1.68795487e-04 1.03475622e-03 6.58839038e-02 2.58349207e+00
 5.49852354e-02], shape=(5,), dtype=float64)

# equivalent implementation
print(-tf.reduce_sum(labels * tf.math.log(tf.nn.softmax(logits)), axis=1))

tf.Tensor([0.16984606 0.02474495], shape=(2,), dtype=float32)

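tf.nn.sparse_softmax_cross_entropy_with_logits

tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels, logits, name=None
)

Function: labels are one-hot encoded, logits go through softmax, and the cross entropy between the two is computed. Typically labels has shape [batch_size] and logits has shape [batch_size, num_classes]. "sparse" means that labels are given as class indices rather than as one-hot vectors; the one-hot encoding is applied implicitly.

Parameters:

labels: the class indices of the labels.

logits: the activation for each class, usually the output of a linear layer. These values are normalized by softmax inside the function.

Returns: the softmax cross-entropy loss.

Example: below, labels is first one-hot encoded as [[1, 0, 0], [0, 1, 0]], softmax turns logits into [[0.844, 0.114, 0.042], [0.007, 0.976, 0.018]], and the cross entropy of the two is then computed.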

labels = [0,1]
logits = [[4.0, 2.0, 1.0], [0.0, 5.0, 1.0]]
print(tf.nn.sparse_softmax_cross_entropy_with_logits(labels, logits))

tf.Tensor([0.16984604 0.02474492], shape=(2,), dtype=float32)

# equivalent implementation
print(-tf.reduce_sum(tf.one_hot(labels, tf.shape(logits)[1]) * tf.math.log(tf.nn.softmax(logits)), axis=1))

tf.Tensor([0.16984606 0.02474495], shape=(2,), dtype=float32)
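
The same numbers can be reproduced without TensorFlow. The sketch below is an illustrative re-implementation, not the library code: because the one-hot vector has a single 1, the cross entropy reduces to the negative log-softmax of the labeled class, computed here through the log-sum-exp form.

```python
import math


def sparse_softmax_cross_entropy(labels, logits):
    """Per-sample cross entropy for integer class labels and raw logits."""
    losses = []
    for label, row in zip(labels, logits):
        m = max(row)
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in row))
        losses.append(log_sum_exp - row[label])  # equals -log(softmax(row)[label])
    return losses


losses = sparse_softmax_cross_entropy([0, 1], [[4.0, 2.0, 1.0], [0.0, 5.0, 1.0]])
print([round(v, 4) for v in losses])  # [0.1698, 0.0247]
```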

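4.3 Custom loss functions

Loss functions can be designed to fit the task and its goal. The yogurt sales prediction example from the course shows that the definition of the loss function can strongly affect what the model predicts. A well-designed loss function guides training in the right direction.

Object detection offers many examples. Its main jobs are localization and recognition, so its loss functions aim to make localization more precise and recognition more accurate. The loss for an object detection task consists of a classification loss and a bounding box regression loss. Regression losses from recent years include Smooth L1 Loss (2015), IoU Loss (2016 ACM), GIoU Loss (2019 CVPR), and DIoU Loss & CIoU Loss (2020 AAAI); classification losses include cross entropy, softmax loss, logloss, and focal loss. These are not covered in detail here; interested readers can explore them on their own. The takeaway is that a loss function should be designed for the specific context and task.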

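The custom loss used in the script below sums a piecewise penalty over all samples, with COST as the penalty per unit of over-prediction and PROFIT as the penalty per unit of under-prediction:

$$loss = \sum_n f(y\_, y), \qquad f(y\_, y) = \begin{cases} COST \cdot (y - y\_) & y > y\_ \\ PROFIT \cdot (y\_ - y) & y \le y\_ \end{cases}$$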

#p20_custom.py
import tensorflow as tf
import numpy as np

SEED = 23455
COST = 1
PROFIT = 99

rdm = np.random.RandomState(SEED)
x = rdm.rand(32, 2)
y_ = [[x1 + x2 + (rdm.rand() / 10.0 - 0.05)] for (x1, x2) in x]  # noise: [0, 1) / 10 = [0, 0.1); [0, 0.1) - 0.05 = [-0.05, 0.05)
x = tf.cast(x, dtype=tf.float32)

w1 = tf.Variable(tf.random.normal([2, 1], stddev=1, seed=1))

epochs = 10000  # total number of training steps (renamed so the loop variable does not shadow it)
lr = 0.002

for epoch in range(epochs):
    with tf.GradientTape() as tape:
        y = tf.matmul(x, w1)
        loss = tf.reduce_sum(tf.where(tf.greater(y, y_), (y - y_) * COST, (y_ - y) * PROFIT))

    grads = tape.gradient(loss, w1)
    w1.assign_sub(lr * grads)

    if epoch % 500 == 0:
        print("After %d training steps,w1 is " % (epoch))
        print(w1.numpy(), "\n")
print("Final w1 is: ", w1.numpy())

After 0 training steps,w1 is 
[[3.4457052]
 [3.2526264]] 

After 500 training steps,w1 is 
[[1.1493957]
 [1.0595962]] 

After 1000 training steps,w1 is 
[[1.1397758]
 [1.09088  ]] 

After 1500 training steps,w1 is 
[[1.1301556]
 [1.1221637]] 

After 2000 training steps,w1 is 
[[1.1791688]
 [1.1647406]] 

After 2500 training steps,w1 is 
[[1.1487305]
 [1.019554 ]] 

After 3000 training steps,w1 is 
[[1.1391103]
 [1.0508376]] 

After 3500 training steps,w1 is 
[[1.1294906]
 [1.0821217]] 

After 4000 training steps,w1 is 
[[1.1198705]
 [1.1134055]] 

After 4500 training steps,w1 is 
[[1.1688839]
 [1.1559825]] 

After 5000 training steps,w1 is 
[[1.1384457]
 [1.0107961]] 

After 5500 training steps,w1 is 
[[1.1288261]
 [1.0420803]] 

After 6000 training steps,w1 is 
[[1.1192057]
 [1.0733637]] 

After 6500 training steps,w1 is 
[[1.1095855]
 [1.1046473]] 

After 7000 training steps,w1 is 
[[1.1585989]
 [1.1472243]] 

After 7500 training steps,w1 is 
[[1.1489792]
 [1.1785084]] 

After 8000 training steps,w1 is 
[[1.1185408]
 [1.0333217]] 

After 8500 training steps,w1 is 
[[1.1089209]
 [1.0646057]] 

After 9000 training steps,w1 is 
[[1.1579342]
 [1.1071826]] 

After 9500 training steps,w1 is 
[[1.1483142]
 [1.1384665]] 

Final w1 is:  [[1.1289538]
 [1.0160426]]
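
The asymmetry of this loss can be checked on single values. The sketch below re-implements the piecewise rule from p20_custom.py in plain Python; the function name `custom_loss` and the sample numbers are made up for illustration.

```python
COST = 1     # penalty per unit when the prediction is too high
PROFIT = 99  # penalty per unit when the prediction is too low


def custom_loss(y_pred, y_true):
    """Sum of the piecewise penalty over all samples."""
    total = 0.0
    for y, y_ in zip(y_pred, y_true):
        total += (y - y_) * COST if y > y_ else (y_ - y) * PROFIT
    return total


print(custom_loss([1.1], [1.0]))  # over-prediction by 0.1
print(custom_loss([0.9], [1.0]))  # under-prediction by 0.1
```

Under-predicting by the same amount costs 99 times more, which is consistent with the trained weights above ending up larger than 1: the model leans toward predicting too much.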

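5 Underfitting and overfitting

Remedies for underfitting:

- Add more input features
- Add more network parameters
- Reduce the regularization coefficient

Remedies for overfitting:

- Clean the data
- Enlarge the training set
- Apply regularization
- Increase the regularization coefficient

Regularization to mitigate overfitting

Regularization adds a measure of model complexity to the loss function by penalizing the weights W, which dampens the effect of noise in the training data. Biases b are usually not regularized.

$$loss = loss(y, y\_) + REGULARIZER \cdot loss(w)$$

Here $loss(y, y\_)$ is the original loss on the data (for example cross entropy or mean squared error), REGULARIZER is a hyperparameter that sets the share of the penalty in the total loss, and $loss(w)$ is the penalty on the weights.

Choosing a regularizer:

- L1 regularization tends to drive many parameters to exactly 0, so it reduces complexity by making the parameters sparse, that is, by reducing the number of effective parameters.
- L2 regularization pushes parameters close to 0 but not exactly 0, so it reduces complexity by shrinking the parameter values.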
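
The two penalties can be sketched in plain Python. The weights, the data loss of 0.2, and the factor 0.03 below are hypothetical values for illustration. Note that tf.nn.l2_loss, used in the script that follows, computes sum(w ** 2) / 2, which is half of the `l2_penalty` shown here.

```python
def l1_penalty(weights):
    """Sum of absolute values of the weights."""
    return sum(abs(w) for w in weights)


def l2_penalty(weights):
    """Sum of squared weights."""
    return sum(w * w for w in weights)


weights = [0.5, -1.0, 0.0, 2.0]
REGULARIZER = 0.03  # hypothetical weighting factor
data_loss = 0.2     # hypothetical loss on the training data

total_loss = data_loss + REGULARIZER * l2_penalty(weights)
print(l1_penalty(weights))  # 3.5
print(l2_penalty(weights))  # 5.25
print(total_loss)
```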

# Import the required modules
import tensorflow as tf
from matplotlib import pyplot as plt
import numpy as np
import pandas as pd

# Read the data and labels to build x_train and y_train
# (raw string so the backslashes in the Windows path are not treated as escape sequences)
df = pd.read_csv(r'E:\BaiduNetdiskDownload\中国大学MOOCTF笔记2.1共享给所有学习者\class2\dot.csv')
x_data = np.array(df[['x1', 'x2']])
y_data = np.array(df['y_c'])

x_train = x_data
y_train = y_data.reshape(-1, 1)

Y_c = [['red' if y else 'blue'] for y in y_train]

# Cast the data to float32; otherwise the matrix multiplication below fails with a dtype error
x_train = tf.cast(x_train, dtype=tf.float32)
y_train = tf.cast(y_train, tf.float32)

# from_tensor_slices slices the tensors along the first dimension, pairing each input with its label
train_db = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(32)

# Network parameters: 2 input features, one hidden layer of 11 neurons, 1 output neuron
# tf.Variable() marks the parameters as trainable
w1 = tf.Variable(tf.random.normal([2, 11]), dtype=tf.float32)
b1 = tf.Variable(tf.constant(0.01, shape=[11]))

w2 = tf.Variable(tf.random.normal([11, 1]), dtype=tf.float32)
b2 = tf.Variable(tf.constant(0.01, shape=[1]))

lr = 0.005    # learning rate
epochs = 800  # number of training epochs

## Training
for epoch in range(epochs):
    for step, (x_batch, y_batch) in enumerate(train_db):
        with tf.GradientTape() as tape:  # record operations for gradient computation

            h1 = tf.matmul(x_batch, w1) + b1  # multiply-add of the hidden layer
            h1 = tf.nn.relu(h1)
            y = tf.matmul(h1, w2) + b2

            # mean squared error loss: mse = mean((y_batch - y)^2)
            losses_mse = tf.reduce_mean(tf.square(y_batch - y))

            # add L2 regularization on the weights only (biases are not regularized)
            loss_regularization = []
            loss_regularization.append(tf.nn.l2_loss(w1))
            loss_regularization.append(tf.nn.l2_loss(w2))

            loss_regularization = tf.reduce_sum(loss_regularization)
            loss = losses_mse + 0.03 * loss_regularization

        # compute the gradient of the loss with respect to each parameter
        variables = [w1, b1, w2, b2]
        grads = tape.gradient(loss, variables)

        # gradient descent update
        # w1 = w1 - lr * w1_grad
        w1.assign_sub(lr * grads[0])
        b1.assign_sub(lr * grads[1])
        w2.assign_sub(lr * grads[2])
        b2.assign_sub(lr * grads[3])

    # print the loss every 40 epochs
    if epoch % 40 == 0:
        print('epoch:', epoch, 'loss:', float(loss))


# Prediction
print("*******predict*******")
# xx and yy each range from -3 to 3 with step 0.1, forming a grid of points
xx, yy = np.mgrid[-3:3:.1, -3:3:.1]

# flatten xx and yy and pair them up into an array of 2-D coordinate points
grid = np.c_[xx.ravel(), yy.ravel()]
grid = tf.cast(grid, tf.float32)
# feed the grid points into the network; probs collects the outputs
probs = []
for x_predict in grid:
    # predict with the trained parameters
    h1 = tf.matmul([x_predict], w1) + b1
    h1 = tf.nn.relu(h1)
    y = tf.matmul(h1, w2) + b2  # y is the prediction
    probs.append(y)

# column 0 is x1, column 1 is x2
x1 = x_data[:, 0]
x2 = x_data[:, 1]
# reshape probs to the shape of xx
probs = np.array(probs).reshape(xx.shape)
plt.scatter(x1, x2, color=np.squeeze(Y_c))
# pass xx, yy and probs to contour and draw the contour line where probs equals 0.5;
# after plt.show() this line is the boundary between the red and blue points
plt.contour(xx, yy, probs, levels=[.5])
plt.show()

epoch: 0 loss: 5.472182750701904
epoch: 40 loss: 0.4817494750022888
epoch: 80 loss: 0.36840903759002686
epoch: 120 loss: 0.31256160140037537
epoch: 160 loss: 0.27391085028648376
epoch: 200 loss: 0.24360911548137665
epoch: 240 loss: 0.21865178644657135
epoch: 280 loss: 0.1976892352104187
epoch: 320 loss: 0.17977286875247955
epoch: 360 loss: 0.16461075842380524
epoch: 400 loss: 0.1515084207057953
epoch: 440 loss: 0.14020243287086487
epoch: 480 loss: 0.13045549392700195
epoch: 520 loss: 0.12212586402893066
epoch: 560 loss: 0.11479806900024414
epoch: 600 loss: 0.10838881880044937
epoch: 640 loss: 0.10294126719236374
epoch: 680 loss: 0.09831778705120087
epoch: 720 loss: 0.094314344227314
epoch: 760 loss: 0.09065239131450653
*******predict*******

