Why does this neural network perform poorly on MNIST?

Hi, I'm building a neural network in PyTorch to classify MNIST, and for the life of me I can't figure out why its accuracy won't go above 7%. Any guidance would be great.

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from keras.datasets import mnist
from keras.utils.np_utils import to_categorical
import numpy as np
from sklearn.metrics import precision_score,recall_score,f1_score,accuracy_score,confusion_matrix

(X_train,Y_train),(X_test,Y_test) = mnist.load_data()

X_train = X_train.astype("float32")/255
X_test = X_test.astype("float32")/255


X_train = X_train.reshape(X_train.shape[0],(X_train.shape[1] * X_train.shape[2]));
X_test = X_test.reshape(X_test.shape[0],(X_test.shape[1] * X_test.shape[2]));

class Net(torch.nn.Module):
  def __init__(self):
    super(Net,self).__init__()
    self.lin_1 = nn.Linear(784,128)
    self.lin_2 = nn.Linear(128,64)
    self.lin_3 = nn.Linear(64,10)

  def forward(self,x) :
    x = self.lin_1(x)
    x = torch.relu(x)
    x = self.lin_2(x)
    x = torch.relu(x)
    x = self.lin_3(x)
    x = torch.softmax(x,dim=0)
    return x

net = Net();
loss = torch.nn.CrossEntropyLoss();
optimizer = torch.optim.SGD(net.parameters(),lr = 0.01);

X_train = torch.from_numpy(X_train);
X_test = torch.from_numpy(X_test);
y_train = torch.from_numpy(Y_train);
y_test = torch.from_numpy(Y_test)

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu");
X_train.to(device);
X_test.to(device);
y_train.to(device);
y_test.to(device);
net.to(device);
loss.to(device);

y_train = y_train.type(torch.long)
y_test = y_test.type(torch.long)

net.train()
for epoch in range(10):
  #pred = torch.max(net(X_train),1);
  pred = net(X_train.to(device));
  
  train_loss = loss(pred,y_train.to(device));
  optimizer.zero_grad()
  train_loss.backward()
  optimizer.step()

net.eval()

pred = torch.max(net(X_test.to(device)),1)[1];
print('The accuracy for pytorch is ',accuracy_score(y_test.cpu().numpy(),pred.cpu().numpy()));

I feel like I must need to transform the data somehow, which is why I divide the training and test data by 255; the network expects floats as input and longs as targets.

Here is the NumPy version I made without PyTorch:

from keras.datasets import mnist
from keras.utils.np_utils import to_categorical
(X_train,Y_train),(X_test,Y_test) = mnist.load_data()

X_train = X_train.astype("float32")/255
X_test = X_test.astype("float32")/255

X_train = X_train.reshape(X_train.shape[0],(X_train.shape[1] * X_train.shape[2]))
X_test = X_test.reshape(X_test.shape[0],(X_test.shape[1] * X_test.shape[2]))

Y_train = to_categorical(Y_train);
Y_test = to_categorical(Y_test)
import numpy as np

print(Y_test.shape)

class DNN():
  def __init__(self,sizes,epochs=10,lr = 0.01):
    self.sizes = sizes
    self.epochs = epochs
    self.lr = lr
    self.params = self.initialization();

  def ReLu(self,x,derivative=False):
    if derivative:
      return 1. * (x > 0)
    else:
      return x * (x > 0)

  def softmax(self,x,derivative=False):
        # Numerically stable with large exponentials
        exps = np.exp(x - x.max())
        if derivative:
            return exps / np.sum(exps,axis=0) * (1 - exps / np.sum(exps,axis=0))
        return exps / np.sum(exps,axis=0)

  def initialization(self):
        # number of nodes in each layer
        input_layer=self.sizes[0]
        hidden_1=self.sizes[1]
        hidden_2=self.sizes[2]
        output_layer=self.sizes[3]

        params = {
            "W1":np.random.randn(hidden_1,input_layer) * np.sqrt(1. / hidden_1),"W2":np.random.randn(hidden_2,hidden_1) * np.sqrt(1. / hidden_2),"W3":np.random.randn(output_layer,hidden_2) * np.sqrt(1. / output_layer)
        }

        return params
  def forward (self,X_train):
    
    self.params["X0"] = X_train;
    
    self.params["Z1"] = np.dot(self.params["W1"],self.params["X0"])
    self.params['X1'] = self.ReLu(self.params["Z1"])

    self.params['Z2'] = np.dot(self.params["W2"],self.params["X1"])
    self.params["X2"] = self.ReLu(self.params["Z2"])

    self.params["Z3"] = np.dot(self.params["W3"],self.params["X2"])
    self.params["X3"] = self.softmax(self.params["Z3"])

    return self.params["X3"]
  
  def backpropagation (self,Y_train,output):

    update = {};

    error = 2 * (output - Y_train) / output.shape[0] * self.softmax(self.params["Z3"],derivative=True)
    update["W3"] = np.outer(error,self.params["X2"])

    error = np.dot(self.params["W3"].T,error) * self.ReLu(self.params["Z2"],derivative=True)
    update["W2"] = np.outer(error,self.params["X1"])

    error = np.dot(self.params["W2"].T,error) * self.ReLu(self.params["Z1"],derivative=True)
    update["W1"] = np.outer(error,self.params["X0"])

    return update

  def updateParams (self,update):
    for key,value in update.items():
      #print(key)
      self.params[key] -= self.lr * value

  def test_accuracy(self,X_test,Y_test):
    predictions = []
    for i in range(len(X_test)):
      output = self.forward(X_test[i])
      pred = np.argmax(output)
      predictions.append(pred == np.argmax(Y_test[i]))
    
    
    return np.mean(predictions)


  def train(self,X_train,Y_train):
        for epoch in range(self.epochs):
            print("epoch ",epoch)
            for i in range(len(X_train)):
                output = self.forward(X_train[i])
                update = self.backpropagation(Y_train[i],output)
                self.updateParams(update)

dnn = DNN(sizes=[784,200,50,10],epochs=10)
dnn.train(X_train,Y_train)

print("The accuracy of the numpy network on the test dataset is ",dnn.test_accuracy(X_test,Y_test))

Solution

好吧,我可以立即看出您提供的代码存在一些问题:

  1. Check the documentation for PyTorch's cross entropy loss function. If you read it, you'll notice that torch.nn.CrossEntropyLoss performs the softmax internally. That means you shouldn't apply another softmax as the output activation when your criterion is nn.CrossEntropyLoss. If for some reason you do want a softmax at the output layer, consider using nn.NLLLoss instead. If you look at the images I posted below, simply removing x = torch.softmax(x,dim=0) makes the loss go down, whereas keeping it leaves the loss flat (which is bad).

  2. You're training for too few epochs. I tried running your code for 3,000 iterations instead of 10, and the final accuracy was 0.9028 instead of the original 0.1038. You can also see that the loss drops much further compared to the original implementation (second image).

[Image: training loss curve with the softmax layer removed]

[Image: training loss curve of the original implementation]
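Point 1 can be verified with a minimal sketch (using random stand-in logits rather than the MNIST model): nn.CrossEntropyLoss applied to raw logits gives exactly the same value as log_softmax followed by nn.NLLLoss, which is why an extra softmax inside the model is redundant and harmful.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
logits = torch.randn(4, 10)           # raw network outputs, no softmax applied
targets = torch.tensor([3, 1, 0, 7])  # class indices

# nn.CrossEntropyLoss applies log_softmax internally, so it expects raw logits.
ce = nn.CrossEntropyLoss()(logits, targets)

# Equivalent formulation: log_softmax in the model, nn.NLLLoss as the criterion.
nll = nn.NLLLoss()(torch.log_softmax(logits, dim=1), targets)

print(torch.allclose(ce, nll))  # True
```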

Edit

After looking at your NumPy code, the problem becomes clearer. My second point still essentially holds: you aren't training the model enough. I used the word "epoch" somewhat loosely above; what I really meant was "steps".

If you look at your NumPy code, you'll see there are two for loops: the outer one iterates over epochs, and the inner one loops through the training data. You are effectively training with a batch size of one for ten epochs, which means you update the model parameters 600,000 times in total (60,000 training samples * 10 epochs). In your PyTorch code, you feed the entire training set in at once and train for ten epochs, so you only update the parameters 10 times.
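A common middle ground between those two extremes is mini-batching with a DataLoader. This sketch (using random tensors with the same shapes as the flattened MNIST arrays, not the real dataset) shows how many optimizer steps per epoch that yields:

```python
import torch
from torch.utils.data import TensorDataset, DataLoader

# Random stand-ins with the same shapes as the flattened MNIST data.
X = torch.randn(60000, 784)
y = torch.randint(0, 10, (60000,))

loader = DataLoader(TensorDataset(X, y), batch_size=64, shuffle=True)

# ceil(60000 / 64) = 938 parameter updates per epoch, versus 1 update
# with full-batch training and 60,000 with single-sample training.
print(len(loader))  # 938
```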

If you modify your PyTorch code to:

for epoch in range(10):
    net.train()

    for idx,_ in enumerate(X_train):
        prediction = net(X_train[idx].to(device))
        train_loss = loss(prediction.unsqueeze(0),y_train[idx].unsqueeze(0).to(device))

        optimizer.zero_grad()
        train_loss.backward()
        optimizer.step()

    net.eval()
    prediction = torch.max(net(X_test.to(device)),1)[1]
    accuracy = accuracy_score(y_test.cpu().numpy(),prediction.cpu().numpy())
    print(f"Epoch {epoch + 1} test accuracy is {accuracy}.")

You'll then notice that the model reaches 96% accuracy after only two epochs.
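One more subtlety worth flagging in the original forward pass: torch.softmax(x, dim=0) normalizes over the batch dimension rather than over the 10 classes. A small sketch with random scores (a hypothetical stand-in for the network's outputs) shows the difference:

```python
import torch

torch.manual_seed(0)
scores = torch.randn(32, 10)  # hypothetical batch of raw class scores

p_wrong = torch.softmax(scores, dim=0)  # normalizes each column over the batch
p_right = torch.softmax(scores, dim=1)  # normalizes each row over the classes

# With dim=1, each row of class probabilities sums to 1, as intended.
print(p_right.sum(dim=1)[0].item())
# With dim=0, each *column* sums to 1 over the batch, so a single row's
# "probabilities" sum to roughly 10/32 instead of 1.
print(p_wrong.sum(dim=1).mean().item())
```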
