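How to fix "CancelledError: [_Derived_] RecvAsync is cancelled" when using a Bidirectional layer
I have a problem: whenever I include a Bidirectional layer wrapper in my model, training crashes with the following error: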
CancelledError Traceback (most recent call last)
<ipython-input-7-7944b517869f> in <module>
1 history = model.fit(train_dataset,epochs=10,
2 validation_data=test_dataset,
----> 3 validation_steps=30)
D:\Python\anaconda\envs\tf-gpu\lib\site-packages\tensorflow\python\keras\engine\training.py in _method_wrapper(self,*args,**kwargs)
106 def _method_wrapper(self,**kwargs):
107 if not self._in_multi_worker_mode(): # pylint: disable=protected-access
--> 108 return method(self,**kwargs)
109
110 # Running inside `run_distribute_coordinator` already.
D:\Python\anaconda\envs\tf-gpu\lib\site-packages\tensorflow\python\keras\engine\training.py in fit(self,x,y,batch_size,epochs,verbose,callbacks,validation_split,validation_data,shuffle,class_weight,sample_weight,initial_epoch,steps_per_epoch,validation_steps,validation_batch_size,validation_freq,max_queue_size,workers,use_multiprocessing)
1096 batch_size=batch_size):
1097 callbacks.on_train_batch_begin(step)
-> 1098 tmp_logs = train_function(iterator)
1099 if data_handler.should_sync:
1100 context.async_wait()
D:\Python\anaconda\envs\tf-gpu\lib\site-packages\tensorflow\python\eager\def_function.py in __call__(self,**kwds)
778 else:
779 compiler = "nonXla"
--> 780 result = self._call(*args,**kwds)
781
782 new_tracing_count = self._get_tracing_count()
D:\Python\anaconda\envs\tf-gpu\lib\site-packages\tensorflow\python\eager\def_function.py in _call(self,**kwds)
805 # In this case we have created variables on the first call,so we run the
806 # defunned version which is guaranteed to never create variables.
--> 807 return self._stateless_fn(*args,**kwds) # pylint: disable=not-callable
808 elif self._stateful_fn is not None:
809 # Release the lock early so that multiple threads can perform the call
D:\Python\anaconda\envs\tf-gpu\lib\site-packages\tensorflow\python\eager\function.py in __call__(self,**kwargs)
2827 with self._lock:
2828 graph_function,args,kwargs = self._maybe_define_function(args,kwargs)
-> 2829 return graph_function._filtered_call(args,kwargs) # pylint: disable=protected-access
2830
2831 @property
D:\Python\anaconda\envs\tf-gpu\lib\site-packages\tensorflow\python\eager\function.py in _filtered_call(self,kwargs,cancellation_manager)
1846 resource_variable_ops.BaseResourceVariable))],
1847 captured_inputs=self.captured_inputs,
-> 1848 cancellation_manager=cancellation_manager)
1849
1850 def _call_flat(self,captured_inputs,cancellation_manager=None):
D:\Python\anaconda\envs\tf-gpu\lib\site-packages\tensorflow\python\eager\function.py in _call_flat(self,cancellation_manager)
1922 # No tape is watching; skip to running the function.
1923 return self._build_call_outputs(self._inference_function.call(
-> 1924 ctx,cancellation_manager=cancellation_manager))
1925 forward_backward = self._select_forward_and_backward_functions(
1926 args,
D:\Python\anaconda\envs\tf-gpu\lib\site-packages\tensorflow\python\eager\function.py in call(self,ctx,cancellation_manager)
548 inputs=args,
549 attrs=attrs,
--> 550 ctx=ctx)
551 else:
552 outputs = execute.execute_with_cancellation(
D:\Python\anaconda\envs\tf-gpu\lib\site-packages\tensorflow\python\eager\execute.py in quick_execute(op_name,num_outputs,inputs,attrs,name)
58 ctx.ensure_initialized()
59 tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle,device_name,op_name,
---> 60 inputs,num_outputs)
61 except core._NotOkStatusException as e:
62 if name is not None:
CancelledError: [_Derived_]RecvAsync is cancelled.
[[{{node gradient_tape/sequential/embedding/embedding_lookup/Reshape/_38}}]] [Op:__inference_train_function_5988]
Function call stack:
train_function
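I am running the exact code from the TensorFlow tutorial: https://www.tensorflow.org/tutorials/text/text_classification_rnn#prepare_the_data_for_training.
Additionally, I tried including these lines at the start of my program:
'''
physical_devices = tf.config.list_physical_devices('GPU')
tf.config.experimental.set_memory_growth(physical_devices[0], True)
'''
and I still got the same problem.
My TensorFlow version is 2.3.0, my CUDA version is 10.1.243, and my cuDNN version is 7.6.5.
Does anyone know a possible solution to this problem?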
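Solution
The tutorial above works fine for me on Google Colab.
TensorFlow 2.3.0 was built and tested against CUDA 10.1 and cuDNN 7.6, so your TensorFlow, CUDA and cuDNN versions are compatible and are not the cause of the problem.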
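To make the compatibility argument concrete, here is a small sketch that compares the reported versions against a few rows copied by hand from TensorFlow's "tested build configurations" table for GPU builds. The dictionary `TESTED_GPU_BUILDS` and the helpers `major_minor` and `is_tested_combo` are hypothetical names for illustration, and the excerpt covers only TensorFlow 2.2 to 2.4, so check the official table for any other release.

```python
# Hypothetical excerpt of TensorFlow's published "tested build configurations"
# for GPU builds. Only a few rows are copied; consult the official table for others.
TESTED_GPU_BUILDS = {
    "2.2": {"cuda": "10.1", "cudnn": "7.6"},
    "2.3": {"cuda": "10.1", "cudnn": "7.6"},
    "2.4": {"cuda": "11.0", "cudnn": "8.0"},
}


def major_minor(version: str) -> str:
    """Reduce a version string such as '10.1.243' to 'major.minor' ('10.1')."""
    return ".".join(version.split(".")[:2])


def is_tested_combo(tf_version: str, cuda_version: str, cudnn_version: str) -> bool:
    """Return True if CUDA/cuDNN match the tested row for this TensorFlow release."""
    expected = TESTED_GPU_BUILDS.get(major_minor(tf_version))
    if expected is None:
        raise KeyError(f"No tested-build entry recorded for TensorFlow {tf_version}")
    return (major_minor(cuda_version) == expected["cuda"]
            and major_minor(cudnn_version) == expected["cudnn"])


# The versions reported in the question.
print(is_tested_combo("2.3.0", "10.1.243", "7.6.5"))  # True
```

Only the major.minor parts are compared because the tested-build table lists versions at that granularity; patch releases such as CUDA 10.1.243 or cuDNN 7.6.5 fall under the same row.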
The problem is more likely related to GPU memory usage, and that is what should be fixed.
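If memory is the culprit, a common first step is to enable memory growth on every visible GPU (not only `physical_devices[0]`) and to do it before anything has run on the GPU. The sketch below wraps `tf.config.list_physical_devices` and `tf.config.experimental.set_memory_growth`, both available in TensorFlow 2.3, in a hypothetical helper `enable_gpu_memory_growth`. The optional `tf` parameter is an assumption of this sketch, added only so the helper can be exercised with a stub when no GPU or TensorFlow install is present; it is not part of the original answer.

```python
def enable_gpu_memory_growth(tf=None):
    """Ask TensorFlow to allocate GPU memory on demand for every visible GPU.

    Must run before any op touches the GPU: once a device is initialized,
    TensorFlow raises RuntimeError if you try to change this setting.
    Returns the list of GPUs that were configured.
    """
    if tf is None:
        # Imported lazily so a stand-in module can be passed for testing.
        import tensorflow as tf

    gpus = tf.config.list_physical_devices("GPU")
    for gpu in gpus:
        # Apply the same setting to every GPU so all devices behave consistently.
        tf.config.experimental.set_memory_growth(gpu, True)
    return gpus
```

Call `enable_gpu_memory_growth()` as the very first thing in the script or notebook, before the dataset pipeline or the model is created. If the notebook kernel has already used the GPU, restart it first, because the setting cannot be changed after the device is initialized.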