In PyTorch, my homemade dataset and the test set seem to exhaust all RAM

I am new to PyTorch, and I wrote a ResNet program in PyTorch to experiment on MNIST.

If I use a data loader like the following, everything is fine:

import numpy as np  # np is used below for np.ceil
import torch as pt
from torch.utils.data import DataLoader,TensorDataset
import torchvision as ptv

mnist_train = ptv.datasets.MNIST(ROOT_DIR,train=True,transform=ptv.transforms.ToTensor(),download=False)
dl = pt.utils.data.DataLoader(dataset=mnist_train,batch_size=BATCH_SIZE,shuffle=True,drop_last=True)

But if I build a homemade dataset as follows and use a validation set at every iteration, the program exhausts all my RAM. The test set is not used at every iteration; it only evaluates the model at the end.

mnist_test = ptv.datasets.MNIST(ROOT_DIR,train=False,download=False)
M_TEST,PIC_H,PIC_W = mnist_test.data.shape

x_test = mnist_test.data.double() / 255.
y_test = mnist_test.targets

a = pt.randperm(M_TEST)  # ATTENTION pt.randperm
x_test = x_test[a]
y_test = y_test[a]
VAL_RATE = 0.1
M_VAL = int(np.ceil(M_TEST * VAL_RATE))
M_TEST -= M_VAL

x_test,x_val = pt.split(x_test,(M_TEST,M_VAL))
y_test,y_val = pt.split(y_test,(M_TEST,M_VAL))

x_test = x_test.view(-1,1,PIC_H,PIC_W).double()
x_val = x_val.view(-1,1,PIC_H,PIC_W).double()

dl_test = DataLoader(TensorDataset(x_test,y_test),batch_size=BATCH_SIZE)
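For reference, the same test/validation carve-out can also be expressed with `torch.utils.data.random_split`, which shuffles and splits in one step. This is only a sketch: the tensor sizes below are stand-ins for `mnist_test.data / 255.` and `mnist_test.targets`, and `VAL_RATE` mirrors the constant above.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Stand-ins for the MNIST test images and labels (hypothetical sizes)
x = torch.rand(10000, 28, 28).double()
y = torch.randint(0, 10, (10000,))

VAL_RATE = 0.1
m_val = int(VAL_RATE * len(x))  # number of validation samples
# random_split permutes indices internally, so no explicit randperm is needed
test_set, val_set = random_split(TensorDataset(x, y), [len(x) - m_val, m_val])
print(len(test_set), len(val_set))  # 9000 1000
```

Both subsets can then be wrapped in `DataLoader` exactly like `dl_test` above.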

def acc(ht,yt):
    return (pt.argmax(ht,1) == yt.long()).double().mean()
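As a quick sanity check of what `acc` computes (the fraction of rows whose argmax matches the label), here is a self-contained run with made-up logits:

```python
import torch as pt

def acc(ht, yt):
    # fraction of rows where the argmax prediction equals the label
    return (pt.argmax(ht, 1) == yt.long()).double().mean()

ht = pt.tensor([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7]])  # made-up logits
yt = pt.tensor([1, 0, 0])
print(acc(ht, yt).item())  # 2 of 3 rows match -> ~0.667
```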

# in iteration:
for epoch in range(N_EPOCHS):

    for i,(bx,by) in enumerate(dl):
        model.train(True)
        optim.zero_grad()
        bx = bx.view(-1,1,PIC_H,PIC_W).double()
        ht = model(bx)
        cost = criterion(ht,by)
        cost.backward()
        optim.step()
        model.train(False)
        accv = acc(ht,by)
        ht_val = model(x_val)
        val_cost = criterion(ht_val,y_val)
        val_acc = acc(ht_val,y_val)

So I suspected that only ptv.datasets.MNIST together with pt.utils.data.DataLoader was usable, and I removed the per-iteration use of my homemade validation set; after removing it, memory usage was normal. But even when I use only ptv.datasets.MNIST and pt.utils.data.DataLoader, the test procedure still exhausts all my RAM:

mnist_test = ptv.datasets.MNIST(ROOT_DIR,train=False,transform=ptv.transforms.ToTensor(),download=False)
dl_test = pt.utils.data.DataLoader(dataset=mnist_test,batch_size=BATCH_SIZE,shuffle=False,drop_last=True)
test_cost_avg = 0.
test_acc_avg = 0.
GROUP = int(np.ceil(M_TEST / BATCH_SIZE / 10))
for i,(bx,by) in enumerate(dl_test):
    bx = bx.view(-1,1,PIC_H,PIC_W).double()
    ht = model(bx)
    test_cost_avg += criterion(ht,by)
    test_acc_avg += acc(ht,by)
    if i % GROUP == 0:
        print(f'Testing # {i + 1}')
if i % GROUP != 0:
    print(f'Testing # {i + 1}')
test_cost_avg /= i + 1
test_acc_avg /= i + 1
print(f'Tested: cost = {test_cost_avg},acc = {test_acc_avg}')
print('Over')
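For comparison, the evaluation pattern usually recommended in PyTorch runs inference under `torch.no_grad()` and accumulates plain Python floats via `.item()`, so no autograd graph is kept alive across iterations. Whether that is the cause of the problem here is an assumption on my part; the linear model and batch below are stand-ins, not the question's actual ResNet:

```python
import torch

model = torch.nn.Linear(28 * 28, 10).double()  # stand-in for the ResNet above
criterion = torch.nn.CrossEntropyLoss()

bx = torch.rand(32, 1, 28, 28).double()        # one fake MNIST batch
by = torch.randint(0, 10, (32,))

model.eval()
test_cost_avg = 0.
with torch.no_grad():  # inference builds no autograd graph
    ht = model(bx.view(-1, 28 * 28))
    test_cost_avg += criterion(ht, by).item()  # .item(): plain float, no tensor kept
print(ht.requires_grad, type(test_cost_avg).__name__)  # False float
```

Accumulating `criterion(ht,by)` directly, as in the loop above, keeps a tensor (and anything it references) alive for the whole loop.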

Please help me. Thanks a lot!

Update:

I suspect something is wrong with my model, because a simple CNN model on the same homemade dataset built from torchvision's MNIST does not have this RAM-exhaustion problem. So I paste my model into this question for reference:

def my_conv(in_side,in_ch,out_ch,kernel,stride,padding='same'):
    if 'same' == padding:
        ps = kernel - 1
        padding = ps // 2
    else:
        padding = 0
    print(padding)  # tmp
    return pt.nn.Conv2d(in_ch,out_ch,kernel_size=kernel,stride=stride,padding=padding)


class MyResnetBlock(pt.nn.Module):

    def __init__(self,residual,in_side,in_ch,out_ch,kernel=3,stride=1,**kwargs):
        super().__init__(**kwargs)
        self.residual = residual
        self.in_side = in_side
        self.in_ch = in_ch
        self.out_ch = out_ch
        self.kernel = kernel
        self.stride = stride

        self.conv1 = my_conv(in_side,in_ch,out_ch,kernel,stride)
        self.bn1 = pt.nn.BatchNorm2d(out_ch)
        self.relu1 = pt.nn.ReLU()

        self.conv2 = my_conv(np.ceil(in_side / stride),out_ch,out_ch,kernel,1)
        self.bn2 = pt.nn.BatchNorm2d(out_ch)
        self.relu2 = pt.nn.ReLU()

        if residual:
            self.conv_down = my_conv(in_side,in_ch,out_ch,kernel,stride)

    def forward(self,input):
        x = input
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu1(x)

        x = self.conv2(x)
        x = self.bn2(x)

        if self.residual:
            res = self.conv_down(input)
        else:
            res = input
        x += res

        x = self.relu2(x)
        return x


class MyResnetByPt(pt.nn.Module):

    def __init__(self,blocks_spec_list,in_side,init_in_ch,init_out_ch,**kwargs):
        super().__init__(**kwargs)

        self.conv1 = my_conv(in_side,init_in_ch,init_out_ch,3,1)
        in_ch = out_ch = init_out_ch

        blocks = []
        for block_id,n_blocks in enumerate(blocks_spec_list):
            for layer_id in range(n_blocks):
                if layer_id == 0:
                    if block_id != 0:
                        out_ch *= 2
                    block = MyResnetBlock(True,in_side,in_ch,out_ch,stride=2)
                    in_ch = out_ch
                    in_side = int(np.ceil(in_side / 2))
                else:
                    block = MyResnetBlock(False,in_side,in_ch,out_ch,stride=1)
                blocks.append(block)
        self.blocks = pt.nn.Sequential(*blocks)
        self.final_ch = out_ch
        self.avg_pool = pt.nn.AvgPool2d(kernel_size=(in_side,in_side),stride=(1,1),padding=(0,0))
        self.fc = pt.nn.Linear(out_ch,N_CLS)

    def forward(self,input):
        x = input

        x = self.conv1(x)

        x = self.blocks(x)

        x = self.avg_pool(x)
        x = x.view(-1,self.final_ch)
        x = self.fc(x)
        return x


model = MyResnetByPt([2,2,2],PIC_H,1,16)
model = model.double()
