How to parallelize Python code with multiprocessing when a class method returns a DataFrame
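I am completely new to parallelization and multiprocessing in Python. I have been trying for several days to parallelize my code with multiprocessing, without success. The code below reads a text file that is a blob of data with a few different delimiters, and turns that blob into a clean DataFrame. At the moment the code takes about 8 minutes to run on the sample data, and I am trying to speed it up. The full code is over 150 lines, but the essential skeleton is as follows: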
import os
import re
import pandas as pd
import multiprocessing as mp
import datetime

_begin_time = datetime.datetime.now()
pool = mp.Pool(mp.cpu_count())

class convertdata:
    def __init__(self, directorypath):
        :
    def getdatatype1(self):
        For loop for row and column of df:
            few unavoidable if conditions:
        return resultdf

dir = 'path address'
if __name__ == '__main__':
    _t1 = convertdata(dir)
    _r1 = pool.apply(_t1.getdatatype1())
    pool.close()
    print(datetime.datetime.now() - _begin_time)
I get the following error:
RuntimeError:
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
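
The error message points at the likely cause: in the skeleton above, mp.Pool(...) is created at module level, outside the if __name__ == '__main__': guard. On platforms that start child processes with "spawn" (Windows, and macOS by default), each child re-imports the main module and tries to create its own pool while it is still starting up. In addition, pool.apply(_t1.getdatatype1()) calls the method in the parent process and hands its return value to apply, which expects a callable. Even when written correctly, a single apply call runs one task in one worker, so it cannot give a speedup on its own.

Below is a minimal sketch of one way to restructure this, not the original 150-line code. It assumes the input can be split into independent groups of lines. The names parse_line, parse_chunk, split_into_chunks, convert_parallel, and the delimiter set "|", ";", "," are all hypothetical stand-ins for the real per-row logic.

```python
import multiprocessing as mp
import re

# Hypothetical delimiter set; the original post does not say which ones are used.
_DELIMITERS = re.compile(r"[|;,]")


def parse_line(line):
    """Split one raw line into stripped fields (stand-in for the per-row logic)."""
    return [field.strip() for field in _DELIMITERS.split(line)]


def parse_chunk(lines):
    """Worker function. Defined at module level so child processes can import it."""
    return [parse_line(line) for line in lines if line.strip()]


def split_into_chunks(items, n_chunks):
    """Split a list into at most n_chunks contiguous pieces."""
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def convert_parallel(lines, workers=2):
    """Parse the chunks in separate processes and flatten the results in order."""
    chunks = split_into_chunks(lines, workers)
    # The pool is created inside a function that is only called from the
    # __main__ guard, so child processes never try to create a pool themselves.
    with mp.Pool(processes=workers) as pool:
        parsed = pool.map(parse_chunk, chunks)  # pass the function, do not call it
    return [row for chunk in parsed for row in chunk]


if __name__ == "__main__":
    sample = ["a|b;c", "1, 2 ,3", "", "x;y|z"]
    rows = convert_parallel(sample, workers=2)
    print(rows)
```

The workers return plain lists here so the sketch needs only the standard library. In the real code the combined rows could be passed to pd.DataFrame(rows), or each worker could return a DataFrame and the parent could combine them with pd.concat. Whether this actually beats the 8-minute baseline depends on how much work each chunk carries compared with the cost of sending data between processes, so it is worth timing on the real data.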