PyTorch: streaming training by reading tensors from a tensor file on disk

How do I stream training data in PyTorch by reading tensors from a file on disk?

I have some very large input tensors and ran into memory problems while building them, so I wrote them one at a time to a .pt file. While the script that generates and saves them runs, the file keeps growing, so I assume the tensors are being saved correctly. Here is that code:

with open(a_sync_save, "ab") as f:
    print("saved")
    torch.save(torch.unsqueeze(torch.cat(tensors, dim=0), 0), f)

I want to read a fixed number of these tensors from the file at a time, because I don't want to run into memory problems again. But when I try to read the tensors back out of the file, I only manage to get the first one:

with open(a_sync_save,"rb") as f:
    for tensor in torch.load(f):
        print(tensor.shape)

The output here is the shape of the first tensor, and then the script exits abruptly.
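The underlying issue is that each call to `torch.save` / `pickle.dump` writes one complete object, so reading the file back requires one load call per object, not a single call. A minimal sketch of that append-then-stream pattern, using `pickle` (as the solution below does) and an in-memory buffer standing in for the on-disk file:

```python
import io
import pickle

import torch

buffer = io.BytesIO()  # stands in for the append-mode file on disk

# Append several tensors one at a time, mirroring the save loop above.
for i in range(3):
    pickle.dump(torch.ones(2, 4) * i, buffer)

# Stream them back: each pickle.load consumes exactly one object, so
# looping until EOFError recovers every tensor in order.
buffer.seek(0)
shapes = []
while True:
    try:
        t = pickle.load(buffer)
        shapes.append(tuple(t.shape))
    except EOFError:
        break

print(shapes)  # [(2, 4), (2, 4), (2, 4)]
```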

Solution

Here is some of the code I used to work this out. A lot of it is specific to what I was doing, but the essence of it should be usable by anyone who runs into the same problem.

import pickle

import torch
import torch.nn as nn
import torch.optim as optim


def stream_training(filepath, epochs=100):
    """
    :param filepath: file path of the pkl file
    :param epochs: number of epochs to run
    """
    def training(train_dataloader, model_obj, criterion, optimizer):
        for j, data in enumerate(train_dataloader, start=0):
            # get the inputs; data is a list of [inputs, labels]
            inputs, labels = data
            inputs, labels = inputs.cuda(), labels.cuda()
            outputs = model_obj(inputs.float())
            outputs = torch.flatten(outputs)
            loss = criterion(outputs, labels.float())
            print(loss)
            # zero the parameter gradients
            optimizer.zero_grad()
            loss.backward()
            torch.nn.utils.clip_grad_norm_(model_obj.parameters(), max_norm=1)
            optimizer.step()

    tensors = []
    expected_values = []
    model = Model(1000, 1, 256, 1)
    model.cuda()
    criterion = nn.BCELoss()
    optimizer = optim.Adam(model.parameters(), lr=0.00001, betas=(0.9, 0.99999),
                           eps=1e-08, weight_decay=0.001, amsgrad=True)
    for i in range(epochs):
        with open(filepath, 'rb') as openfile:
            while True:
                try:
                    data_list = pickle.load(openfile)
                    tensors.append(data_list[0])
                    expected_values.append(data_list[1])
                    if len(tensors) % BATCH_SIZE == 0:
                        tensors = torch.cat(tensors, dim=0)
                        tensors = torch.reshape(tensors, (tensors.shape[0], tensors.shape[1], -1))
                        train_loader = make_dataset(tensors, expected_values)  # makes a dataloader for the incoming batch
                        training(train_loader, model, criterion, optimizer)  # performs forward and back prop
                        tensors = []  # flushes the batch to conserve memory
                        expected_values = []
                except EOFError:
                    print("This file has finished training")
                    break
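The code above references `make_dataset` and `BATCH_SIZE`, which are not shown in the answer. A minimal sketch of what `make_dataset` might look like, assuming it simply wraps one streamed batch in a `TensorDataset` (the name, the `BATCH_SIZE` value, and the dummy shapes here are assumptions, not the author's actual helper):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

BATCH_SIZE = 4  # assumed value; not given in the original answer

def make_dataset(tensors, expected_values):
    """Wrap one streamed batch in a DataLoader.

    `tensors` is the already-concatenated (N, seq_len, features) tensor,
    `expected_values` the matching list of scalar labels.
    """
    labels = torch.tensor(expected_values, dtype=torch.float32)
    dataset = TensorDataset(tensors, labels)
    return DataLoader(dataset, batch_size=BATCH_SIZE, shuffle=True)

# Quick check with dummy data.
loader = make_dataset(torch.randn(4, 10, 1000), [0.0, 1.0, 1.0, 0.0])
inputs, labels = next(iter(loader))
print(inputs.shape, labels.shape)  # torch.Size([4, 10, 1000]) torch.Size([4])
```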

And here is the model, for those interested.

class Model(nn.Module):
    def __init__(self,input_size,output_size,hidden_dim,n_layers):
        super(Model,self).__init__()
        # dimensions
        self.hidden_dim = hidden_dim
        self.n_layers = n_layers

        #Define the layers
        #GRU
        self.gru = nn.GRU(input_size, hidden_dim, n_layers, batch_first=True)  # args: (input_size, hidden_size, num_layers)
        self.fc1 = nn.Linear(hidden_dim,hidden_dim)
        self.bn1 = nn.BatchNorm1d(num_features=hidden_dim)
        self.fc2 = nn.Linear(hidden_dim,hidden_dim)
        self.bn2 = nn.BatchNorm1d(num_features=hidden_dim)
        self.fc3 = nn.Linear(hidden_dim,hidden_dim)
        self.bn3 = nn.BatchNorm1d(num_features=hidden_dim)
        self.fc4 = nn.Linear(hidden_dim,hidden_dim)
        self.bn4 = nn.BatchNorm1d(num_features=hidden_dim)
        self.fc5 = nn.Linear(hidden_dim,hidden_dim)
        self.output = nn.Linear(hidden_dim,output_size)

    def forward(self, x):
        x = x.float()
        x = F.relu(self.gru(x)[1])  # [1] is the final hidden state, shape (n_layers, batch, hidden_dim)
        x = x[-1, :, :]  # keep only the last layer's hidden state
        x = F.dropout(x, 0.5, training=self.training)
        x = F.relu(self.bn1(self.fc1(x)))
        x = F.dropout(x, 0.5, training=self.training)
        x = F.relu(self.bn2(self.fc2(x)))
        x = F.dropout(x, 0.5, training=self.training)
        x = F.relu(self.bn3(self.fc3(x)))
        x = F.dropout(x, 0.5, training=self.training)
        x = F.relu(self.bn4(self.fc4(x)))
        x = F.dropout(x, 0.5, training=self.training)
        x = F.relu(self.fc5(x))
        return torch.sigmoid(self.output(x))

    def init_hidden(self,batch_size):
        hidden = torch.zeros(self.n_layers,batch_size,self.hidden_dim)
        return hidden
