TesseractError: 'Error opening data file \\Program Files (x86)\\Tesseract-OCR\\eng.traineddata'

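I am trying to use pytesseract in a Jupyter Notebook.

Environment: Windows 10 x64, with Jupyter Notebook (Anaconda3, Python 3.8.3) running with administrator privileges.

The working directory containing the TIFF files is on a different drive (Z:). When I run the following code: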

import cv2
import pytesseract

# Point pytesseract at the Tesseract executable
pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract.exe'
img = cv2.imread('1.png')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

hImg, wImg, _ = img.shape
conf = r'--oem 3 --psm 6 outputbase digits'
boxes = pytesseract.image_to_data(img, config=conf)
for a, b in enumerate(boxes.splitlines()):
    print(b)
    if a != 0:  # skip the TSV header row
        b = b.split()
        if len(b) == 12:  # only rows that contain recognized text
            x, y, w, h = int(b[6]), int(b[7]), int(b[8]), int(b[9])
            cv2.putText(img, b[11], (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 1, (50, 50, 255), 2)
            # Box around the word; color assumed to match the text color
            cv2.rectangle(img, (x, y), (x + w, y + h), (50, 50, 255), 2)

cv2.imshow('Img', img)
cv2.waitKey(0)

I get the following error:

TesseractError                            Traceback (most recent call last)
<ipython-input-46-ec2bc4b38a7a> in <module>
      1 hImg,wImg,_ = img.shape
      2 conf = r'--oem 3 --psm 6 outputbase digits'
----> 3 boxes = pytesseract.image_to_data(img,config=conf)
      4 # boxes = pytesseract.image_to_data(img,config=tessdata_dir_config)
      5 for a,b in enumerate(boxes.splitlines()):

~\AppData\Local\Programs\Python\Python38\lib\site-packages\pytesseract\pytesseract.py in image_to_data(image,lang,config,nice,output_type,timeout,pandas_config)
    460     args = [image,'tsv',timeout]
    461 
--> 462     return {
    463         Output.BYTES: lambda: run_and_get_output(*(args + [True])),
    464         Output.DATAFRAME: lambda: get_pandas_output(

~\AppData\Local\Programs\Python\Python38\lib\site-packages\pytesseract\pytesseract.py in <lambda>()
    466         ),
    467         Output.DICT: lambda: file_to_dict(run_and_get_output(*args),'\t',-1),
--> 468         Output.STRING: lambda: run_and_get_output(*args),
    469     }[output_type]()
    470 

~\AppData\Local\Programs\Python\Python38\lib\site-packages\pytesseract\pytesseract.py in run_and_get_output(image,extension,return_bytes)
    280         
    281 
--> 282         run_tesseract(**kwargs)
    283         filename = kwargs['output_filename_base'] + extsep + extension
    284         with open(filename,'rb') as output_file:

~\AppData\Local\Programs\Python\Python38\lib\site-packages\pytesseract\pytesseract.py in run_tesseract(input_filename,output_filename_base,timeout)
    256     with timeout_manager(proc,timeout) as error_string:
    257         if proc.returncode:
--> 258             raise TesseractError(proc.returncode,get_errors(error_string))
    259 
    260 

TesseractError: (1,'Error opening data file \\Program Files (x86)\\Tesseract-OCR\\eng.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language \'eng\' Tesseract couldn\'t load any languages! Could not initialize tesseract.')

I would like to fix this through an environment variable rather than in code, for example by setting the config argument of pytesseract.image_to_data().
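
Since the error message points at TESSDATA_PREFIX, a quick check of what the notebook process actually sees may help narrow things down. The following is only a diagnostic sketch, not a confirmed fix. The helper names `find_traineddata` and `has_drive_letter` are made up for illustration, and the sketch assumes the language files sit either directly in the directory named by TESSDATA_PREFIX or in a `tessdata` subfolder of it, since Tesseract versions differ on which of the two the variable should name. Note also that the path in the error has no drive letter; on Windows such a path is resolved against the current drive, which here would be Z: rather than C:.

```python
import os
from pathlib import Path, PureWindowsPath


def find_traineddata(prefix, lang="eng"):
    """Return the path to <lang>.traineddata under prefix, or None.

    Looks in prefix itself and in prefix/tessdata (hypothetical helper;
    both layouts are checked because the expected one is an assumption).
    """
    if not prefix:
        return None
    base = Path(prefix)
    for candidate in (base / f"{lang}.traineddata",
                      base / "tessdata" / f"{lang}.traineddata"):
        if candidate.is_file():
            return candidate
    return None


def has_drive_letter(prefix):
    """True if a Windows-style path names its drive (e.g. 'C:')."""
    return bool(PureWindowsPath(prefix).drive)


# What does this Python process see? A variable set after Jupyter was
# started will not show up here until Jupyter is restarted.
prefix = os.environ.get("TESSDATA_PREFIX")
print("TESSDATA_PREFIX =", prefix)
print("eng.traineddata found at:", find_traineddata(prefix))
if prefix:
    print("path includes a drive letter:", has_drive_letter(prefix))
```

If the variable is missing, lacks a drive letter, or points at a folder without eng.traineddata, correcting it in the Windows environment variable settings and restarting Jupyter would be the first thing to try.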
