How to write a streaming function without a TensorFlow memory leak

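I'm trying to take part in Kaggle's Cornell bird call detection challenge, which comes with 23 GB of data in total, mostly mp3 audio files. As you know, 23 GB of data cannot fit into RAM on Kaggle or Google Colab. So I tried writing a data generator that loads and converts the mp3 files while the model trains, to avoid running out of memory. However, I still run out of memory after the first few epochs. Below you can find my generator and training code, where I use the del statement to explicitly deallocate objects from memory, but apparently I'm doing something wrong. Do you have any suggestions on this, or on how I could improve my code to prevent the memory leak? Calling the garbage collector makes no difference either.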

Thx

My data generator code:

import gc
import glob
import random

import numpy as np
import tensorflow as tf
from tensorflow import keras

class My_Custom_Generator(keras.utils.Sequence):
    def __init__(self,batch_size):
        files = glob.glob("../input/birdsong-recognition/train_audio/*/*.mp3")
        random.shuffle(files)
        self.files = files
        self.batch_size = batch_size
    

    def __len__(self) :
        # np.int was removed from recent NumPy releases; use the built-in int.
        return int(np.ceil(len(self.files) / self.batch_size))
  
  
    def __getitem__(self,idx) :
        gc.collect(2)
        
        batch_x = self.files[idx * self.batch_size : (idx+1) * self.batch_size]
        #batch_y = self.labels[idx * self.batch_size : (idx+1) * self.batch_size]

        train_image = []
        train_label = []

        for i in range(0,len(batch_x)):
            image,label = get_data(batch_x[i])
            image = tf.convert_to_tensor(image)
            label_matrix = get_cat_label(label)
            
            train_image.append(image)
            train_label.append(label_matrix)
            
        # Return local arrays instead of storing them on self, so the
        # generator does not keep the last batch alive between calls.
        return np.array(train_image), np.array(train_label)
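
The batching logic of the generator above can be sketched without Keras, which makes it easier to test on its own. This is only a minimal sketch under assumptions: `FileBatcher`, its `load` argument, and `fake_load` are hypothetical names that do not appear in the question, and `load` stands in for the combination of `get_data` and `get_cat_label`. It shows an integer batch count and a `__getitem__` that returns only local objects.

```python
import math
from typing import Callable, List, Sequence, Tuple


class FileBatcher:
    """Hypothetical stand-in for the Sequence subclass above (no Keras needed)."""

    def __init__(self, files: Sequence[str], batch_size: int,
                 load: Callable[[str], Tuple[object, object]]):
        self.files = list(files)
        self.batch_size = batch_size
        self.load = load  # assumed to play the role of get_data + get_cat_label

    def __len__(self) -> int:
        # Number of batches, rounded up so the last partial batch is included.
        return math.ceil(len(self.files) / self.batch_size)

    def __getitem__(self, idx: int) -> Tuple[List[object], List[object]]:
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        start = idx * self.batch_size
        paths = self.files[start:start + self.batch_size]
        samples = [self.load(p) for p in paths]
        xs = [x for x, _ in samples]
        ys = [y for _, y in samples]
        # Only locals are returned; nothing is cached on self, so the batch
        # can be freed as soon as the caller drops its reference.
        return xs, ys


def fake_load(path: str) -> Tuple[int, str]:
    # Hypothetical loader: "features" are the path length, the label is the folder.
    return len(path), path.split("/")[0]


batcher = FileBatcher(["a/1.mp3", "a/2.mp3", "b/3.mp3"], batch_size=2, load=fake_load)
```

In the real generator the same two methods would live on the `keras.utils.Sequence` subclass, and the lists would be converted to NumPy arrays before being returned.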

The training loop, which I adapted from the TensorFlow tutorial:

# Note: rerunning this cell uses the same model variables

# Keep results for plotting
train_loss_results = []
train_accuracy_results = []

num_epochs = int(len(glob.glob("../input/birdsong-recognition/train_audio/*/*.mp3")) // 8)

for epoch in range(num_epochs):
    epoch_loss_avg = tf.keras.metrics.Mean()
    epoch_accuracy = tf.keras.metrics.CategoricalAccuracy()
    
    imgs,labels = my_training_batch_generator.__getitem__(epoch)
    
    # Training loop - using batches of 32
    for i in range(1):
        # Optimize the model
        loss_value,grads = grad(xceptionModel,imgs,labels)
        optimizer.apply_gradients(zip(grads,xceptionModel.trainable_variables))

        # Track progress
        epoch_loss_avg.update_state(loss_value)  # Add current batch loss
        # Compare predicted label to actual label
        # training=True is needed only if there are layers with different
        # behavior during training versus inference (e.g. Dropout).
        epoch_accuracy.update_state(labels,xceptionModel(imgs,training=True))

    del imgs
    del labels
    # End epoch
    train_loss_results.append(epoch_loss_avg.result())
    train_accuracy_results.append(epoch_accuracy.result())

    if epoch % 2 == 0:
        print("Epoch {:03d}: Loss: {:.3f},Accuracy: {:.3%}".format(epoch,epoch_loss_avg.result(),epoch_accuracy.result()))
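
Since `del` and `gc.collect()` made no visible difference, it may help to first measure whether Python-level memory actually grows from one step to the next. The following is a rough sketch using the standard-library `tracemalloc` module; `python_memory_growth`, `leaky_step`, and `clean_step` are hypothetical names, not part of the question. Note the assumption: `tracemalloc` only sees allocations made through Python's allocator, so memory held by TensorFlow's native runtime or by an audio decoding library would not show up here.

```python
import gc
import tracemalloc
from typing import Callable, List


def python_memory_growth(step: Callable[[int], None], n_steps: int) -> int:
    """Return the change in traced Python heap size (bytes) over n_steps calls."""
    gc.collect()
    tracemalloc.start()
    try:
        before, _ = tracemalloc.get_traced_memory()
        for i in range(n_steps):
            step(i)
        gc.collect()
        after, _ = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return after - before


retained: List[bytearray] = []


def leaky_step(i: int) -> None:
    # Keeps every "batch" alive by appending it to a long-lived list.
    retained.append(bytearray(100_000))


def clean_step(i: int) -> None:
    # The "batch" is a local, so it is released when the function returns.
    batch = bytearray(100_000)
    del batch


leaky_growth = python_memory_growth(leaky_step, 10)
clean_growth = python_memory_growth(clean_step, 10)
```

To apply this to the loop above, one iteration of the training loop could be wrapped in a function and passed as `step`. If the reported growth stays flat while the process still runs out of memory, the leak is probably outside the Python heap.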
