Multiprocessing in Python not using all cores


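I am trying to process multiple folders that each contain many rasters; in each folder, the rasters cover the same area on different dates. To save some time, I want to use the multiprocessing (or multithreading?) module to work in parallel.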

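Basically, my script does this: for each pixel, it runs some calculations on that pixel and writes the result to a NumPy array if the number is higher than the value already stored in the array; then it moves on to the next pixel. The result should be several NumPy arrays (one per folder). It works fine without multiprocessing; when I try to run it with multiprocessing, it becomes very slow and does not take advantage of all 10 cores.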

By contrast, when I run a very simple script that uses NumPy, it works well and uses 100% of the CPU on all 10 cores.

Here is the code of the script that is slow:

import os,sys,math
import numpy as np
from osgeo import gdal
from itertools import islice
from multiprocessing import Pool

#prints full size numpy array instead of extract

np.set_printoptions(threshold=sys.maxsize)

#define thresholds for dNBR,RdNBR and NDVI difference (1 - NDVIafter/NDVIbefore)

dNBRthreshold=0.15
RdNBRthreshold=0.4
NDVIdiffThreshold=0.1


def proc (path):
    #print information to a log file
    log = open(path+"\\myprog.log","a")
    sys.stdout = log

    #create a list of all files in the current directory 
    ListImages=[]
    for file in os.listdir(path):
        if file.endswith(".tif"):
                ListImages.append(os.path.join(path,file))
    #sort the list alphabetically
    ListImages.sort()
    print ("Image list: ",ListImages)

    #create empty numpy array the same size as the first image and with number of bands defined by user
    firstImage=gdal.Open(ListImages[0])
    band0 = firstImage.GetRasterBand(1)
    arrayOfFirstImage = band0.ReadAsArray()

    #create a NaN-filled stack of 30 bands,each with the same size as the first image
    arrayStack=np.full((30,)+arrayOfFirstImage.shape,np.nan,dtype=np.double)
    num_dim,num_rows,num_cols = arrayStack.shape
    listRows = list(range(num_rows))    

    #creates loop over all pixels in raster
    for row in range(num_rows):
        print("row number: ",row)
        for col in range(num_cols):
            #reset counter for band as script is working with a new pixel; cntrForBand is used to change arrayStack bands that will be written on
            cntrForBand=0
            print("col number: ",col)
            #loop for all images in list ListImages to get image 1
            #use iter() to be able to skip 7 or 44 images
            iterListImages = iter(ListImages)
            for image in iterListImages:
                #get number of image in the List of Images
                indexImage1 = ListImages.index(image)
                #get its full path
                img1Path=os.path.abspath(image)
                print ("path image 1: " + img1Path)
                print ("index Image 1: ",indexImage1)
            
                #open geotiff with gdal
                img = gdal.Open(image)
                #get first band data of image 1: NDVI value
                band1Image1=img.GetRasterBand(1)
                #get second band data of image 1: NBR value
                band2Image1 = img.GetRasterBand(2)
                               
                ## compute statistics of band 1
                if band1Image1.GetMinimum() is None or band1Image1.GetMaximum()is None:
                    band1Image1.ComputeStatistics(0)
                    print("Statistics computed.")
                    
                ## compute statistics of band 2
                if band2Image1.GetMinimum() is None or band2Image1.GetMaximum()is None:
                    band2Image1.ComputeStatistics(0)
                    print("Statistics computed.")
                    
                #converts gdal array (raster or band) into a numpy array:
                band1Image1asArray = band1Image1.ReadAsArray()
                #print ("NDVI array= ",band1Image1asArray)    
                band2Image1asArray = band2Image1.ReadAsArray()
                #Get NDVI value of pixel of interest
                itemNDVIimage1=band1Image1asArray[row][col]
                print("itemNDVIimage1: ",itemNDVIimage1)
                #Get NBR value of pixel of interest
                itemImage1=band2Image1asArray[row][col]
                print("itemImage1: ",itemImage1)
                #if pixel has no value,don´t do anything
                if itemImage1== band2Image1.GetNoDataValue() or itemImage1==-32768:
                    print("row number: ",row)
                    print("col number: ",col)
                    print ("image 1 pixel with no data value; initiating with another image")

                #if pixel has a value,proceed
                else:
                    #reset switch to False (switch is used to skip images)
                    switch1=False
                    #list of numbers for image 2: from index of image + 1 to index of image 1 + 8
                    listImg2=[indexImage1+1,indexImage1+2,indexImage1+3,indexImage1+4,indexImage1+5,indexImage1+6,indexImage1+7,indexImage1+8]
                    for indexImg2 in listImg2:
                        print("length list image: ",len(ListImages))
                        print ("Current indexImg2: ",indexImg2)
                        print("row number: ",row)
                        print("col number: ",col)
                        #if number of image 2 is above number of images in list,stop (all images have been processed)
                        if indexImg2>=len(ListImages):
                            break
                        #if not,proceed
                        else:
                            
                            #open next image in the list (next date)
                            image2=gdal.Open(ListImages[indexImg2])
                            img2Path=os.path.abspath(ListImages[indexImg2])
                            print ("path image 2: " + img2Path)
                            #get image 2 NDVI value for this pixel
                            band1Image2 = image2.GetRasterBand(1)
                            band1Image2AsArray = band1Image2.ReadAsArray()
                            itemNDVIimage2=band1Image2AsArray[row][col]
                            print("item image 2,Band 1 (NDVI): ",itemNDVIimage2)
                            #get image 2 NBR value for this pixel
                            band2Image2 = image2.GetRasterBand(2)
                            band2Image2AsArray = band2Image2.ReadAsArray()
                            #print ("Image 2,Band 2:",band2Image2AsArray)
                            itemImage2=band2Image2AsArray[row][col]
                            print("item image 2: ",itemImage2)
                            #if image 2 has no value for NBR band,stop and continue with next image 2 
                            if itemImage2== band2Image2.GetNoDataValue() or itemImage2==-32768:
                                print ("image 2 pixel with no data value; initiating with another image")
                            else:
                                #calculate dNBR,NBR and NDVI difference between the two images
                                dNBR=itemImage1-itemImage2
                                RdNBR=dNBR/(math.sqrt(abs(itemImage1)))
                                NDVIdiff=1-itemNDVIimage2/itemNDVIimage1
                                print ("dNBR: ",dNBR)
                                print ("RdNBR: ",RdNBR)
                                print ("NDVI difference: ",NDVIdiff)
                                #if dNBR equals exactly 0,it means that image 1 and image 2 were the same; stop and continue with next image
                                if dNBR==0:
                                    print("same image for image 1 and image2; initiating with another image for image 2")
                                #if dNBR,NBR or NDVI difference values are under thresholds,stop and continue with next image
                                elif dNBR<dNBRthreshold or RdNBR<RdNBRthreshold or NDVIdiff<NDVIdiffThreshold :
                                    print("dNBR or RdNBR or NDVIdiff under threshold; continue with next image for image 2")

                                else:  
                                    #open empty image and set new dNBR and RdNBR and date values in first,second and third band respectively. in ArrayStack,first number is number of band (first is zero) then row then column.
                                    #if dNBR  or RdNBR values is above value already saved in the array or if current value is empty (nan),overwrite it; else,don't overwrite it
                                    print ("current dNBR value for this cell in arrayStack: ",arrayStack[cntrForBand][row][col])
                                    if (dNBR>arrayStack[cntrForBand][row][col] and RdNBR>arrayStack[cntrForBand+1][row][col]) or (math.isnan(arrayStack[cntrForBand][row][col])):
                                        #keep dNBR,RdNBR and date value in first,second and third of the three bands (hence cntrForBand for dNBR,cntrForBand+1 for RdNBR and cntrForBand+2 for Date)
                                        arrayStack[cntrForBand][row][col]= dNBR
                                        arrayStack[cntrForBand+1][row][col]= RdNBR
                                        #arrayStack[0,0]=dNBR
                                            #date value put in second band
                                        date=int(img2Path[-15:-8])
                                        arrayStack[cntrForBand+2][row][col]= date
                                        print ("arrayStack updated: ",arrayStack)
                                        #turn switch on to skip 22 images (forest and therefore fire won't come back soon...)
                                        switch1= True
                                    else:
                                        #print(arrayStack)
                                        print ("dNBR value lower than value already in arrayStack; not changing value")
                    #if dNBR and RdNBR were above the thresholds for one of the image 2 candidates,skip ahead in the list (no new fire is expected soon)
                    #else,continue with image 1 + 7
                    if switch1==True:
                        next(islice(iterListImages,44,44),None)  # consume 44 items from the iterator
                        print("a value has been found for this set of 8 images; continuing with image 1 + 44")
                        #cntr for band increments with 3 so that next round three other bands of arrayStack get the dNBR,NBR and Date values
                        cntrForBand=cntrForBand+3
                        print ("cntrForBand=",cntrForBand)
                    else:
                        #if no high value found,go to image+7 in list
                        next(islice(iterListImages,7,7),None)
                        print("No value found for this set of 8 images; continuing with next image (+1)")
                        
    print ("done!!!!")
    print (arrayStack)
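    #np.save writes NumPy's binary .npy format (not CSV) and appends ".npy" to any other file name
    np.save(os.path.join(path,"FINAL.npy"),arrayStack)
    print("file FINAL.npy saved")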
     
#the entry point must be at module level,not inside proc()
if __name__ == '__main__':
    listFolders= [ f.path for f in os.scandir("C:\\incendios\\Temp3") if f.is_dir() ]
    print (listFolders,type(listFolders))
    cpuCount = os.cpu_count()
    print ("number of core: ",cpuCount)
    with Pool(10) as p:
        print(p.map(proc,listFolders))

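I know that NumPy can cause some issues with multiprocessing, but that does not seem to be the problem here. So I guess something in my code makes it hard to run on multiple cores. Is there anything I can do to improve it? PS: I am using Windows 10 64-bit and Python 3.5.0, and the script works fine without multiprocessing.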

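Edit: to answer Mark Stechell's question: I actually have 10 folders; each folder holds about 900 rasters covering one area per folder, one raster every 8 days from 2000 to 2020. These rasters are satellite images that I have already processed; the first band is a vegetation index (called NDVI) and the second band is a burn area index (NBR, a basic index used to identify forest fires). In this script I use these data to calculate other indices (dNBR and RdNBR; the latter is a relative index, which means I compare the NBR of two different dates to detect significant change). If these indices are high enough (the thresholds are defined at the start of the script), which means a forest fire has been detected, I save the dNBR and RdNBR values in a NumPy array together with the date. But I only compare an image with the 8 dates that follow it; if no significant value is found, the script continues with another image in the list and its 7 following images (in chronological order); if a significant value is found, the script jumps 22 images ahead in the list, because no forest fire will happen again in that area for a long time.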
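
The index calculation described above can also be written for whole arrays at once instead of pixel by pixel. The following is only a minimal sketch, not the script as posted: the names burn_indices and fire_mask and the sample values are made up for illustration, while the formulas and thresholds are the ones used in the posted script.

```python
import numpy as np

# Thresholds taken from the posted script.
DNBR_THRESHOLD = 0.15
RDNBR_THRESHOLD = 0.4
NDVI_DIFF_THRESHOLD = 0.1


def burn_indices(ndvi1, nbr1, ndvi2, nbr2):
    """Return dNBR, RdNBR and the NDVI difference for every pixel at once.

    Hypothetical helper: image 1 is the earlier date, image 2 the later one.
    """
    ndvi1 = np.asarray(ndvi1, dtype=np.float64)
    nbr1 = np.asarray(nbr1, dtype=np.float64)
    ndvi2 = np.asarray(ndvi2, dtype=np.float64)
    nbr2 = np.asarray(nbr2, dtype=np.float64)

    dnbr = nbr1 - nbr2
    # Division by zero yields inf/NaN here instead of raising an error.
    with np.errstate(divide="ignore", invalid="ignore"):
        rdnbr = dnbr / np.sqrt(np.abs(nbr1))
        ndvi_diff = 1.0 - ndvi2 / ndvi1
    return dnbr, rdnbr, ndvi_diff


def fire_mask(dnbr, rdnbr, ndvi_diff):
    """True where all three indices reach their thresholds."""
    with np.errstate(invalid="ignore"):
        return (
            (dnbr >= DNBR_THRESHOLD)
            & (rdnbr >= RDNBR_THRESHOLD)
            & (ndvi_diff >= NDVI_DIFF_THRESHOLD)
        )


# Made-up sample values for two pixels.
ndvi_before = np.array([[0.8, 0.8]])
nbr_before = np.array([[0.64, 0.64]])
ndvi_after = np.array([[0.4, 0.79]])
nbr_after = np.array([[0.04, 0.60]])

dnbr, rdnbr, ndvi_diff = burn_indices(ndvi_before, nbr_before, ndvi_after, nbr_after)
mask = fire_mask(dnbr, rdnbr, ndvi_diff)
print(mask)
```

In this sample only the first pixel passes all three thresholds. Pixels whose indices are NaN never pass, because comparisons with NaN evaluate to False.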

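Following mkrieger1's suggestion, I am trying to simplify the script as much as possible to see where the problem is. I will also try using Pool in the very simple code I mentioned to see whether it works.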
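
A Pool smoke test of the kind mentioned above might look like the sketch below. The worker busy_work is a hypothetical CPU-bound stand-in, not the simple code from the original post, and the task and worker counts are arbitrary. If a test like this keeps all cores busy while the real script does not, the slowdown most likely comes from something inside proc() rather than from Pool itself.

```python
from multiprocessing import Pool

import numpy as np


def busy_work(seed):
    """Hypothetical CPU-bound task standing in for proc(path)."""
    rng = np.random.RandomState(seed)
    data = rng.rand(200, 200)
    total = 0.0
    for _ in range(20):
        total += float(np.linalg.norm(data @ data))
    return total


def run_pool(n_tasks=4, n_workers=2):
    """Map busy_work over n_tasks seeds using n_workers processes."""
    with Pool(n_workers) as pool:
        return pool.map(busy_work, range(n_tasks))


# On Windows the guard is required: worker processes re-import this module,
# and without the guard each of them would try to start a pool of its own.
if __name__ == "__main__":
    results = run_pool()
    print(len(results), "tasks finished")
```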

Solution

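So, following mkrieger1's advice (many thanks, now I know...), I ran my script line by line to see where the problem was. It is clearly related to the GDAL library: the GetNoDataValue(), GetMinimum() and GetMaximum() functions are what cause trouble with multiprocessing here. I changed the code to use functions from other libraries instead (for example, if itemImage1==GetNoDataValue() became if math.isnan(x)). Now it works perfectly. I hope this helps others with the same problem. Many thanks!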
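
The replacement check described above could look roughly like this. It is a sketch under assumptions: is_nodata is a hypothetical helper name, the sample band is made up, and it assumes that missing pixels are stored either as NaN (which the switch to math.isnan suggests) or as the -32768 value the original script also tests for.

```python
import math

import numpy as np

# Sentinel that the posted script also compares against.
NODATA_SENTINEL = -32768


def is_nodata(value):
    """Return True for NaN or the sentinel, without calling GDAL."""
    value = float(value)
    return math.isnan(value) or value == NODATA_SENTINEL


# Made-up 2x2 band with one NaN pixel and one sentinel pixel.
band = np.array([[0.42, np.nan], [-32768.0, 0.10]])

valid = [
    (row, col)
    for row in range(band.shape[0])
    for col in range(band.shape[1])
    if not is_nodata(band[row, col])
]
print(valid)
```

Only the pixels at (0, 0) and (1, 1) are treated as valid in this sample.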
