Tensorflow - numpy 梯度检查不起作用 进口渐变模型定义等估计梯度调用一切

如何解决Tensorflow - numpy 梯度检查不起作用 进口渐变模型定义等估计梯度调用一切

我正在尝试通过有限差分法估计函数的梯度: finite difference method for estimating gradient

TLDR:

grad f(x) = [f(x+h)-f(x-h)]/(2h) 对于足够小的 h。

这也用于梯度检查阶段,以检查您可能知道的 AI 中的反向传播。

这是我的网络:

def defineModel():
  global model
  model = Sequential()
  model.add(keras.Input(shape=input_shape))
  model.add(layers.Conv2D(32,kernel_size=(3,3),activation="relu"))
  model.add(layers.MaxPooling2D(pool_size=(2,2)))
  model.add(layers.Conv2D(64,2)))
  model.add( layers.Flatten())
  model.add(layers.Dropout(0.5))
  model.add(layers.Dense(num_classes,activation="softmax"))
  model.build()
  model.summary()

这部分很好,没有错误。我刚刚在这里提到它,所以你对我的模型有所了解。我在 MNIST 上工作,所以一切都非常简单。使用 1 个 epoch 和几行 TF 代码,我达到了 +98% 的准确率,这对于临时模型来说相当不错。

因为我在做对抗训练,所以我想要我的损失相对于输入数据的梯度:

顺便说一下,我使用了平铺的想法:

如果您用 (tile*tile) 尺寸的方形图块覆盖输入图像而没有重叠,您可以假设图块内图像的梯度几乎是恒定的,因此这是一个合理的近似值。但作为调试问题,在我的代码 tile=1 中,我们正在计算像素级梯度。

这就是问题所在,但我不知道在哪里!我控制了 loss(x+h) 的损失,loss(x-h)loss(x) 的损失非常接近,所以我知道这很好。我的 TF 自动反向传播也很好用,我已经测试过了。问题一定是手动梯度的计算方式。

tile=1
h=1e-4 #also tried 1e-5,1e-6 but did not work

#A dummy function to wait
def w():
  print()
  ww=input('wait')
  print()

#This function works fine.
def rel_error(x,y):
  """ returns relative error """
  return np.max(np.abs(x - y) / (np.maximum(1e-8,np.abs(x) + np.abs(y))))

#the problem is here. given an index suah is idx,I'm going to manipulate
#x_test[idx] and compute loss gradient with respect to this input
def estimateGrad(idx):
  y=model(np.expand_dims(x_test[idx],axis=0))
  y=np.expand_dims(y_train[idx],axis=0)
  x=np.squeeze(x_test[idx])

  

  cce = tf.keras.losses.CategoricalCrossentropy()
  grad=np.zeros((28,28))
  num=int(28/tile)

  #MNIST pictures are 28*28 pixels. Now grad is 28*28 nd.array
  #and x is input image converted to 28*28 nd.array
  for i in range(num):
    for j in range(num):

      plus=copy.deepcopy(x)
      minus=copy.deepcopy(x)
      #Now plus is x+h and minus is x-h
      plus[i*tile:(i+1)*tile,j*tile:(j+1)*tile]=  plus[i*tile:(i+1)*tile,j*tile:(j+1)*tile]+h
      minus[i*tile:(i+1)*tile,j*tile:(j+1)*tile]=  minus[i*tile:(i+1)*tile,j*tile:(j+1)*tile]-h

       
      plus=np.expand_dims(plus,axis=(0,-1))
      minus=np.expand_dims(minus,-1))

      #Now we pass plus and minus to model prediction in the next two lines
      plus=model(plus)
      minus=model(minus)
      
      #Since I want to find gradient of loss with respect to x,in the next 
      #two lines I will set plus=loss(x+h) and minus=loss(x-h)
      #In other words,here our finction f which we want to cumpute its grads
      #is the loss function

      plus=cce(y,plus).numpy()
      minus=cce(y,minus).numpy()

      #Applying the formola : grad=(loss(x+h)-loss(x-h))/2/h
      grad[i*tile:(i+1)*tile,j*tile:(j+1)*tile]=(plus-minus)/2/h

 #Ok now lets check our grad with TF autograd module
  x= tf.convert_to_tensor(np.expand_dims(x_test[idx],axis=0)) 
  with tf.GradientTape() as tape:
    tape.watch(x) 
    y=model(x)
    y_expanded=np.expand_dims(y_train[idx],axis=0)  
    loss=cce(y_expanded,y)

     
  
  delta=tape.gradient(loss,x)
  delta=delta.numpy()
  delta=np.squeeze(delta)

  #delta is gradients returned by TF via auto-differentiation.
  #We calculate the error and unfortunately its large
  diff=rel_error(grad,delta)

  print('diff ',diff)
  w()
  #problem : diff is very large. it should be less than 1e-4

您可以参考 here 获取我的完整代码。

解决方法

我用我自己的解决方案 gradient 替换了梯度估计代码,代码现在可以工作了。计算错误可能很棘手。正如在底部的直方图(注意对数刻度)上看到的那样,对于大多数像素,相对误差小于 10^-4 但在梯度接近于零的情况下,相对误差会爆炸。 max(rel_err)mean(rel_err) 的问题在于,它们都很容易受到异常像素的干扰。衡量数量级是否最相关的更好衡量标准是所有非零像素的几何平均值和中值。

进口

import tensorflow as tf
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from tensorflow import keras
from tensorflow.keras import layers
import matplotlib.pyplot as plt
import time
import copy

渐变

# Gradient
def gradient(f,x,h,**kwargs):
    shape = x.shape    
    x = x.ravel()
    out = np.zeros_like(x)
    hs = np.zeros_like(x)
    for i,_ in enumerate(x):
        hs *= 0.
        hs[i] = h
        out[i] = (f((x + hs).reshape(shape),**kwargs) - f((x - hs).reshape(shape),**kwargs)) / (2*h)

    return out.reshape(shape)

模型定义等

不是我的代码。

x_train=y_train=x_test=y_test=None
num_classes = 10
input_shape = (28,28,1)
batch_size= 128
epochs=1
h=1e-4
epsilon=1e-3
tile=1

model=None

def testImage(clean_img,adv_image,truth,pred):

    # image = np.squeeze(x_test[index])
    # # plot the sample
    # fig = plt.figure
    # plt.imshow(image,cmap='gray')
    # plt.show()

    # x= np.expand_dims(x_test[index],axis=0)
    # y=model(x)
    # print('model prediction clean example : ',np.argmax(y))
    # print('the ground truth : ',y_train[index])

    print("clean image : \n")
    fig=plt.figure
    plt.imshow(clean_img,cmap='gray')
    plt.show()

    print("adv image : \n")
    fig=plt.figure
    plt.imshow(adv_image,cmap='gray')
    plt.show()

    print('model prediction was : ',pred)
    print('truth was :',truth)

    print('\n\n')


def prepareDataset():
    global x_train,y_train,x_test,y_test
    (x_train,y_train),(x_test,y_test) = keras.datasets.mnist.load_data()
    x_train = x_train.astype("float32") / 255
    x_test = x_test.astype("float32") / 255
    x_train = np.expand_dims(x_train,-1)
    x_test = np.expand_dims(x_test,-1)
    y_train = keras.utils.to_categorical(y_train,num_classes)
    y_test = keras.utils.to_categorical(y_test,num_classes) 


def defineModel():
    global model
    model = Sequential()
    model.add(keras.Input(shape=input_shape))
    model.add(layers.Conv2D(32,kernel_size=(3,3),activation="relu"))
    model.add(layers.MaxPooling2D(pool_size=(2,2)))
    model.add(layers.Conv2D(64,2)))
    model.add( layers.Flatten())
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(num_classes,activation="softmax"))
    model.build()
    model.summary()


def trainModel():
    global model
    model.compile(loss="categorical_crossentropy",optimizer="adam",metrics=["accuracy"])
    model.fit(x_train,batch_size=batch_size,epochs=epochs,validation_split=0.1)


def evalModel():
    score = model.evaluate(x_test,y_test,verbose=0)
    print("Test loss:",score[0])
    print("Test accuracy:",score[1])

def fgsm(epsilon,index):
    x = tf.convert_to_tensor(np.expand_dims(x_test[index],axis=0)) 
    with tf.GradientTape() as tape:
        tape.watch(x)

        cce = tf.keras.losses.CategoricalCrossentropy()
        y = model(x)
        y_expanded = np.expand_dims(y_train[index],axis=0)
        #y=np.squeeze(tf.transpose(y))
        loss = cce(y_expanded,y)
        delta = tape.gradient(loss,x)   
        perturbation=epsilon*tf.math.sign(delta)
        return perturbation


def w():
    print()
    ww=input('wait')
    print()


def rel_error(x,y):
    """ returns relative error """
    return np.max(np.abs(x - y) / (np.maximum(1e-8,np.abs(x) + np.abs(y))))

估计梯度

def estimateGrad(idx):

    cce = tf.keras.losses.CategoricalCrossentropy()

    x = x_test[idx]
    y = y_test[idx]

    def f(x,y):
        y_pred = model(np.expand_dims(x,axis=0))
        return cce(y,y_pred[0])

    grad = gradient(f,1e-3,y=y)

    tfx = tf.convert_to_tensor(np.expand_dims(x,axis=0)) 
    with tf.GradientTape() as tape:
        tape.watch(tfx) 
        y_pred = model(tfx)
        loss=cce(y,y_pred[0])

        delta = tape.gradient(loss,tfx)
        delta = delta.numpy()
        delta = np.squeeze(delta)

    return grad,delta

调用一切

所以从技术上讲这是有效的。不幸的是,h < 1/1000 遇到了数值问题,所以这是我们可以通过有限差异实现的最佳结果。

prepareDataset()
trainModel()
grad,delta = estimateGrad(2)
abs_err = abs(delta - grad).max()
rel_err = (abs(delta - grad / np.clip(delta,1e-8,np.infty)))

print('Absolute Error: ',abs_err)
print('Median Relative Error: ',np.median(rel_err))
print('Mean Relative Error: ',np.mean(rel_err))
print('Geometric Mean Relative Error: ',gmean(rel_err[rel_err > 0].ravel()))
print('Max Relative Error: ',np.max(rel_err))   

fig,ax = plt.subplots(1,4,figsize=(20,5))

ax[0].imshow(grad)
ax[0].set_title('Discrete gradient')

ax[1].imshow(delta)
ax[1].set_title('Analytic gradient')

ax[1].set_title('Relative errorr')
ax[2].imshow(rel_err + 1)

ax[3].set_title('Histogram log-relative error')
logbins = np.geomspace(1e-8,rel_err.max() + 1,8)
ax[3].hist(rel_err,bins=logbins)
ax[3].set_xscale('log')
plt.show()

输出:

Absolute Error:  0.0001446546
Median Relative Error:  8.748577e-06
Mean Relative Error:  1781.9397
Geometric Mean Relative Error:  0.009761388
Max Relative Error:  53676.195

Error analysis

版权声明:本文内容由互联网用户自发贡献,该文观点与技术仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请发送邮件至 dio@foxmail.com 举报,一经查实,本站将立刻删除。

相关推荐


依赖报错 idea导入项目后依赖报错,解决方案:https://blog.csdn.net/weixin_42420249/article/details/81191861 依赖版本报错:更换其他版本 无法下载依赖可参考:https://blog.csdn.net/weixin_42628809/a
错误1:代码生成器依赖和mybatis依赖冲突 启动项目时报错如下 2021-12-03 13:33:33.927 ERROR 7228 [ main] o.s.b.d.LoggingFailureAnalysisReporter : *************************** APPL
错误1:gradle项目控制台输出为乱码 # 解决方案:https://blog.csdn.net/weixin_43501566/article/details/112482302 # 在gradle-wrapper.properties 添加以下内容 org.gradle.jvmargs=-Df
错误还原:在查询的过程中,传入的workType为0时,该条件不起作用 &lt;select id=&quot;xxx&quot;&gt; SELECT di.id, di.name, di.work_type, di.updated... &lt;where&gt; &lt;if test=&qu
报错如下,gcc版本太低 ^ server.c:5346:31: 错误:‘struct redisServer’没有名为‘server_cpulist’的成员 redisSetCpuAffinity(server.server_cpulist); ^ server.c: 在函数‘hasActiveC
解决方案1 1、改项目中.idea/workspace.xml配置文件,增加dynamic.classpath参数 2、搜索PropertiesComponent,添加如下 &lt;property name=&quot;dynamic.classpath&quot; value=&quot;tru
删除根组件app.vue中的默认代码后报错:Module Error (from ./node_modules/eslint-loader/index.js): 解决方案:关闭ESlint代码检测,在项目根目录创建vue.config.js,在文件中添加 module.exports = { lin
查看spark默认的python版本 [root@master day27]# pyspark /home/software/spark-2.3.4-bin-hadoop2.7/conf/spark-env.sh: line 2: /usr/local/hadoop/bin/hadoop: No s
使用本地python环境可以成功执行 import pandas as pd import matplotlib.pyplot as plt # 设置字体 plt.rcParams[&#39;font.sans-serif&#39;] = [&#39;SimHei&#39;] # 能正确显示负号 p
错误1:Request method ‘DELETE‘ not supported 错误还原:controller层有一个接口,访问该接口时报错:Request method ‘DELETE‘ not supported 错误原因:没有接收到前端传入的参数,修改为如下 参考 错误2:cannot r
错误1:启动docker镜像时报错:Error response from daemon: driver failed programming external connectivity on endpoint quirky_allen 解决方法:重启docker -&gt; systemctl r
错误1:private field ‘xxx‘ is never assigned 按Altʾnter快捷键,选择第2项 参考:https://blog.csdn.net/shi_hong_fei_hei/article/details/88814070 错误2:启动时报错,不能找到主启动类 #
报错如下,通过源不能下载,最后警告pip需升级版本 Requirement already satisfied: pip in c:\users\ychen\appdata\local\programs\python\python310\lib\site-packages (22.0.4) Coll
错误1:maven打包报错 错误还原:使用maven打包项目时报错如下 [ERROR] Failed to execute goal org.apache.maven.plugins:maven-resources-plugin:3.2.0:resources (default-resources)
错误1:服务调用时报错 服务消费者模块assess通过openFeign调用服务提供者模块hires 如下为服务提供者模块hires的控制层接口 @RestController @RequestMapping(&quot;/hires&quot;) public class FeignControl
错误1:运行项目后报如下错误 解决方案 报错2:Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.8.1:compile (default-compile) on project sb 解决方案:在pom.
参考 错误原因 过滤器或拦截器在生效时,redisTemplate还没有注入 解决方案:在注入容器时就生效 @Component //项目运行时就注入Spring容器 public class RedisBean { @Resource private RedisTemplate&lt;String
使用vite构建项目报错 C:\Users\ychen\work&gt;npm init @vitejs/app @vitejs/create-app is deprecated, use npm init vite instead C:\Users\ychen\AppData\Local\npm-