How to resolve TesseractError: 'Error opening data file \\Program Files (x86)\\Tesseract-OCR\\eng.traineddata'
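I am trying to use pytesseract in a Jupyter Notebook.
Windows 10 x64, with Jupyter Notebook (Anaconda3, Python 3.8.3) running with administrator privileges.
The working directory containing the TIFF files is on a different drive (Z:).
When I run the following code: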
import cv2
import pytesseract

# Path to the Tesseract executable
pytesseract.pytesseract.tesseract_cmd = 'C:/Program Files (x86)/Tesseract-OCR/tesseract.exe'

img = cv2.imread('1.png')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
hImg, wImg, _ = img.shape
conf = r'--oem 3 --psm 6 outputbase digits'
boxes = pytesseract.image_to_data(img, config=conf)
for a, b in enumerate(boxes.splitlines()):
    print(b)
    if a != 0:  # skip the TSV header row
        b = b.split()
        if len(b) == 12:  # only rows that contain recognized text
            x, y, w, h = int(b[6]), int(b[7]), int(b[8]), int(b[9])
            cv2.putText(img, b[11], (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 1, (50, 50, 255), 2)
            # Box color assumed to match the text color; the original line was garbled
            cv2.rectangle(img, (x, y), (x + w, y + h), (50, 50, 255), 2)
cv2.imshow('Img', img)
cv2.waitKey(0)
I get the following error:
TesseractError Traceback (most recent call last)
<ipython-input-46-ec2bc4b38a7a> in <module>
1 hImg,_ = img.shape
2 conf = r'--oem 3 --psm 6 outputbase digits'
----> 3 boxes = pytesseract.image_to_data(img,config=conf)
4 # boxes = pytesseract.image_to_data(img,config=tessdata_dir_config)
5 for a,b in enumerate(boxes.splitlines()):
~\AppData\Local\Programs\Python\Python38\lib\site-packages\pytesseract\pytesseract.py in image_to_data(image,lang,config,nice,output_type,timeout,pandas_config)
460 args = [image,'tsv',timeout]
461
--> 462 return {
463 Output.BYTES: lambda: run_and_get_output(*(args + [True])),
464 Output.DATAFRAME: lambda: get_pandas_output(
~\AppData\Local\Programs\Python\Python38\lib\site-packages\pytesseract\pytesseract.py in <lambda>()
466 ),
467 Output.DICT: lambda: file_to_dict(run_and_get_output(*args),'\t',-1),
--> 468 Output.STRING: lambda: run_and_get_output(*args),
469 }[output_type]()
470
~\AppData\Local\Programs\Python\Python38\lib\site-packages\pytesseract\pytesseract.py in run_and_get_output(image,extension,return_bytes)
280
281
--> 282 run_tesseract(**kwargs)
283 filename = kwargs['output_filename_base'] + extsep + extension
284 with open(filename,'rb') as output_file:
~\AppData\Local\Programs\Python\Python38\lib\site-packages\pytesseract\pytesseract.py in run_tesseract(input_filename,output_filename_base,timeout)
256 with timeout_manager(proc,timeout) as error_string:
257 if proc.returncode:
--> 258 raise TesseractError(proc.returncode,get_errors(error_string))
259
260
TesseractError: (1,'Error opening data file \\Program Files (x86)\\Tesseract-OCR\\eng.traineddata Please make sure the TESSDATA_PREFIX environment variable is set to your "tessdata" directory. Failed loading language \'eng\' Tesseract couldn\'t load any languages! Could not initialize tesseract.')
I would like to fix this through an environment variable rather than in code, for example by passing a config argument to pytesseract.image_to_data().
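One thing that may be worth checking first: the path in the error message has no drive letter and no tessdata component, which suggests (though this is only an inference from the message) that TESSDATA_PREFIX is set to something like \Program Files (x86)\Tesseract-OCR. With the working directory on Z:, a path without a drive letter would be resolved against Z: rather than C:. The sketch below is a small diagnostic, not a fix. The helper names check_tessdata_prefix and has_drive are made up for illustration. Depending on the Tesseract version, the variable is expected to point either at the tessdata directory itself or at its parent, so it can help to try both candidates.

```python
import os
from pathlib import Path, PureWindowsPath


def has_drive(prefix):
    """Return True if a Windows-style path carries a drive (e.g. 'C:')."""
    return bool(PureWindowsPath(prefix).drive)


def check_tessdata_prefix(prefix, lang="eng"):
    """Return (ok, message) for whether <prefix>/<lang>.traineddata exists.

    Hypothetical helper: it only checks the file system and does not
    call Tesseract or pytesseract.
    """
    if not prefix:
        return False, "TESSDATA_PREFIX is not set"
    candidate = Path(prefix) / f"{lang}.traineddata"
    if candidate.is_file():
        return True, f"found {candidate}"
    return False, f"missing {candidate}"


if __name__ == "__main__":
    # Read the value that the notebook process actually sees.
    prefix = os.environ.get("TESSDATA_PREFIX")
    print("TESSDATA_PREFIX =", prefix)
    if prefix:
        print("drive-qualified:", has_drive(prefix))
    print(check_tessdata_prefix(prefix)[1])
```

If the check reports a missing file, setting TESSDATA_PREFIX in the Windows environment variable settings to a full path that includes the drive letter, and then restarting Jupyter so the new value is inherited, would be the environment-only route to try.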