Is there a way to remove unnecessary time delays in the Calamari-ocr text detection engine?

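I have been using the Calamari-ocr text detection engine (Calamari ocr repo). What I noticed is that it emits a lot of messages when making a prediction, and although it reports that the prediction took about 1 second, it takes about 5 seconds to actually reach that prediction. That is, if I time the predict command, it takes roughly 5 seconds even though it tells me the model needed only 1 second to predict.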

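My investigation led me to this script in the repo, which performs the prediction. From there I traced the responsibility to the MultiPredictor class (line 92). I then looked up where that class is defined and found that it is defined here. However, I could not find what causes the flood of messages and warnings that are causing the delay for me.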

Here is the code snippet I use to run the prediction and time it:

import subprocess
import time

cmd = ['calamari-predict','--checkpoint','/opt/working/base_model/model_Calamari_1.ckpt.json','--files','/opt/working/data/*.png']
start_time = time.time()
subprocess.run(cmd)
print(time.time()-start_time)

Note that I have installed the calamari package together with the dependencies required by the repo's instructions.

This is the output I get when I run the prediction:

Found 1 files in the dataset
Checkpoint version 2 is up-to-date.
2020-09-02 14:50:49.243030: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2020-09-02 14:50:49.243090: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2020-09-02 14:50:50.390703: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2020-09-02 14:50:50.390770: W tensorflow/stream_executor/cuda/cuda_driver.cc:312] failed call to cuInit: UNKNOWN ERROR (303)
2020-09-02 14:50:50.390814: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (6d0f92664ca6): /proc/driver/nvidia/version does not exist
2020-09-02 14:50:50.391069: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations,rebuild TensorFlow with the appropriate compiler flags.
2020-09-02 14:50:50.396503: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 1896005000 Hz
2020-09-02 14:50:50.396883: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55ca0736dbb0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-09-02 14:50:50.396937: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host,Default Version
Model: "functional_1"
__________________________________________________________________________________________________
Layer (type)                    Output Shape         Param #     Connected to                     
==================================================================================================
input_data (InputLayer)         [(None,None,48,1) 0                                            
__________________________________________________________________________________________________
conv2d_0 (Conv2D)               (None,40) 400         input_data[0][0]                 
__________________________________________________________________________________________________
pool2d_1 (MaxPooling2D)         (None,24,40) 0           conv2d_0[0][0]                   
__________________________________________________________________________________________________
conv2d_1 (Conv2D)               (None,60) 21660       pool2d_1[0][0]                   
__________________________________________________________________________________________________
pool2d_3 (MaxPooling2D)         (None,12,60) 0           conv2d_1[0][0]                   
__________________________________________________________________________________________________
reshape (Reshape)               (None,720)    0           pool2d_3[0][0]                   
__________________________________________________________________________________________________
bidirectional (Bidirectional)   (None,400)    1473600     reshape[0][0]                    
__________________________________________________________________________________________________
input_sequence_length (InputLay [(None,1)]          0                                            
__________________________________________________________________________________________________
dropout (Dropout)               (None,400)    0           bidirectional[0][0]              
__________________________________________________________________________________________________
tf_op_layer_FloorDiv (TensorFlo [(None,1)]          0           input_sequence_length[0][0]      
__________________________________________________________________________________________________
logits (Dense)                  (None,29)     11629       dropout[0][0]                    
__________________________________________________________________________________________________
tf_op_layer_FloorDiv_1 (TensorF [(None,1)]          0           tf_op_layer_FloorDiv[0][0]       
__________________________________________________________________________________________________
softmax (Softmax)               (None,29)     0           logits[0][0]                     
__________________________________________________________________________________________________
input_data_params (InputLayer)  [(None,1)]          0                                            
__________________________________________________________________________________________________
tf_op_layer_Cast (TensorFlowOpL [(None,1)]          0           tf_op_layer_FloorDiv_1[0][0]     
==================================================================================================
Total params: 1,507,289
Trainable params: 1,507,289
Non-trainable params: 0
__________________________________________________________________________________________________
None
Prediction:   0%|                                                                                       | 0/1 [00:00<?,?it/s]
barcode_0: '‪CLK0SU0M((141707‬'
Prediction: 100%|███████████████████████████████████████████████████████████████████████████████| 1/1 [00:01<00:00,1.52s/it]
Prediction of 1 models took 1.5640497207641602s
Average sentence confidence: 85.41%
All files written
4.881764888763428

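As you can see, the model reports that the prediction took only 1.5640497207641602 seconds, but the last line shows that the whole run actually took 4.881764888763428 seconds.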
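
One way to test whether the TensorFlow log output itself contributes to the gap is to rerun the same command with TensorFlow's C++ logging turned down. The sketch below is a minimal example under stated assumptions: `TF_CPP_MIN_LOG_LEVEL` is TensorFlow's environment variable for filtering these messages, while the helper names `quiet_env` and `timed_run` are made up for illustration. It only hides messages; whether that changes the wall-clock time is something to measure, not a confirmed fix.

```python
import os
import subprocess
import sys
import time


def quiet_env(level="2"):
    """Return a copy of the current environment with TensorFlow's C++ log level set.

    "1" hides INFO, "2" also hides WARNING, "3" also hides ERROR messages.
    """
    env = dict(os.environ)
    env["TF_CPP_MIN_LOG_LEVEL"] = level
    return env


def timed_run(cmd, env=None):
    """Run cmd as a subprocess and return (return code, wall-clock seconds)."""
    start = time.perf_counter()
    completed = subprocess.run(cmd, env=env)
    return completed.returncode, time.perf_counter() - start


if __name__ == "__main__":
    # Stand-in command so the sketch runs anywhere; replace it with the
    # calamari-predict command list from the snippet above to compare timings.
    cmd = [sys.executable, "-c", "pass"]
    code, seconds = timed_run(cmd, env=quiet_env())
    print(f"exit code {code}, wall-clock {seconds:.3f}s")
```

Running the command once with `env=None` and once with `env=quiet_env()` gives two wall-clock figures that can be compared directly.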

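To make sure that running the prediction as a subprocess is not the culprit behind the delay, I refactored the prediction code and ran it directly: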

from bidi.algorithm import get_base_level

from google.protobuf.json_format import MessageToJson

from calamari_ocr import __version__
from calamari_ocr.utils.glob import glob_all
from calamari_ocr.ocr.datasets import DataSetType,create_dataset,DataSetMode
from calamari_ocr.ocr import MultiPredictor
from calamari_ocr.ocr.voting import voter_from_proto
from calamari_ocr.proto import VoterParams,Predictions,CTCDecoderParams

import os
import cv2
import time
import logging

logger = logging.getLogger()


class PredictionAttrs:
    def __init__(self):
        """Attributes:
            List all image files that shall be processed
            --files

            Optional list of additional text files. E.g. when updating Abbyy prediction,this parameter must be used for the xml files.
            --text_files

            Used to identify the DatasetType
            --dataset

            Define the prediction extension. This parameter can be used to override ground truth files.
            --extension

            Path to the checkpoint without file extension
            --checkpoint

            Number of processes to use
            -j --processes

            The batch size during the prediction (number of lines to process in parallel)
            --batch_size

            Print additional information
            --verbose

            The voting algorithm to use. Possible values: confidence_voter_default_ctc (default),confidence_voter_fuzzy_ctc,sequence_voter
            --voter

            By default the prediction files will be written to the same directory as the given files. You can use this argument to specify a specific output dir for the prediction files.
            --output_dir

            Write: Predicted string,labels; position,probabilities and alternatives of chars to a .pred (protobuf) file
            --extended_prediction_data

            Extension format: Either pred or json. Note that json will not print logits.
            --extended_prediction_data_format

            Do not show any progress bars
            --no_progress_bars

            List of text files that will be used to create a dictionary
            --dictionary

            Number of beams when using the CTCWordBeamSearch decoder
            --beam_width

            # dataset extra args
            --dataset_pad
            --pagexml_text_index
        """
        self.files = []
        self.checkpoint = []
        self.processes = 1
        self.batch_size = 1
        self.verbose = True
        self.voter = "confidence_voter_default_ctc"
        self.output_dir = None
        self.extended_prediction_data = False
        self.extended_prediction_data_format = "json"
        self.no_progress_bars = True
        self.extension = None
        self.dataset = DataSetType.FILE
        self.text_files = None
        self.pagexml_text_index = 0
        self.beam_width = 25
        self.dictionary = []
        self.dataset_pad = None


class CalamariPredict(object):
    def __init__(self,checkpoint_path,file):
        """This class utilises the attributes of PredictionAttrs class and
         uses the specified checkpoint to predict the text for the input file.
         For predicting the method run() is called.

        Arguments:
            checkpoint_path: Path to checkpoint files
        """
        self.args = PredictionAttrs()
        self.args.checkpoint = checkpoint_path
        self.args.files = file

        # add json as extension,resolve wildcard,expand user,... and remove .json again
        self.args.checkpoint = [(cp if cp.endswith(".json") else cp + ".json") for cp in self.args.checkpoint]
        self.args.checkpoint = glob_all(self.args.checkpoint)
        self.args.checkpoint = [cp[:-5] for cp in self.args.checkpoint]

        self.args.extension = (
            self.args.extension if self.args.extension else DataSetType.pred_extension(self.args.dataset)
        )

        # create ctc decoder
        ctc_decoder_params = self.create_ctc_decoder_params(self.args)

        # create voter
        voter_params = VoterParams()
        voter_params.type = VoterParams.Type.Value(self.args.voter.upper())
        self.voter = voter_from_proto(voter_params)

        # predict for all models
        self.predictor = MultiPredictor(
            checkpoints=self.args.checkpoint,batch_size=self.args.batch_size,processes=self.args.processes,ctc_decoder_params=ctc_decoder_params,)
        # checks
        if self.args.extended_prediction_data_format not in ["pred","json"]:
            raise Exception("Only 'pred' and 'json' are allowed extended prediction data formats")
        # load files
        if self.args.dataset == DataSetType.FILE:
            input_image_files = glob_all(self.args.files)
            if self.args.text_files:
                self.args.text_files = glob_all(self.args.text_files)

        elif self.args.dataset == DataSetType.RAW:
            input_image_files = file

        # skip invalid files and remove them,there wont be predictions of invalid files
        self.dataset = create_dataset(
            self.args.dataset,DataSetMode.PREDICT,input_image_files,self.args.text_files,skip_invalid=True,remove_invalid=True,args={"text_index": self.args.pagexml_text_index,"pad": self.args.dataset_pad,},)

        if len(self.dataset) == 0:
            raise Exception(
                "Empty dataset provided. Check your files argument (got {})!".format(self.args.files)
            )

        do_prediction = self.predictor.predict_dataset(
            self.dataset,progress_bar=not self.args.no_progress_bars
        )

        avg_sentence_confidence = 0
        n_predictions = 0

        self.dataset.prepare_store()

        # output the voted results to the appropriate files
        for result,sample in do_prediction:
            n_predictions += 1
            for i,p in enumerate(result):
                p.prediction.id = "fold_{}".format(i)

            # vote the results (if only one model is given,this will just return the sentences)
            prediction = self.voter.vote_prediction_result(result)
            prediction.id = "voted"
            sentence = prediction.sentence
            avg_sentence_confidence += prediction.avg_char_probability

        print(sentence)

    def create_ctc_decoder_params(self,args):
        """Method for returning the ctc decoder params

        Arguments:
            args: args from an instance of class PredictionAttrs

        Returns:
            params -- ctc decoder params.
        """
        params = CTCDecoderParams()
        params.beam_width = args.beam_width
        params.word_separator = " "

        if args.dictionary and len(args.dictionary) > 0:
            dictionary = set()
            logger.info(f"Creating dictionary")
            for path in glob_all(args.dictionary):
                with open(path,"r") as f:
                    dictionary = dictionary.union({word for word in f.read().split()})

            params.dictionary[:] = dictionary
            logger.info("Dictionary with {} unique words successfully created.".format(len(dictionary)))
        else:
            args.dictionary = None

        if args.dictionary:
            logger.info(
                f"USING A LANGUAGE MODEL IS CURRENTLY EXPERIMENTAL ONLY. NOTE: THE PREDICTION IS VERY SLOW!"
            )
            params.type = CTCDecoderParams.CTC_WORD_BEAM_SEARCH

        return params


if __name__ == "__main__":
    start_time = time.time()
    pred = CalamariPredict(['/opt/working/base_model/model_Calamari_1.ckpt.json'],'/opt/working/data/*.png')
    print(time.time()-start_time)

But this is what I still get:

Checkpoint version 2 is up-to-date.
2020-09-02 15:23:00.328806: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcudart.so.10.1'; dlerror: libcudart.so.10.1: cannot open shared object file: No such file or directory
2020-09-02 15:23:00.328876: I tensorflow/stream_executor/cuda/cudart_stub.cc:29] Ignore above cudart dlerror if you do not have a GPU set up on your machine.
2020-09-02 15:23:01.536020: W tensorflow/stream_executor/platform/default/dso_loader.cc:59] Could not load dynamic library 'libcuda.so.1'; dlerror: libcuda.so.1: cannot open shared object file: No such file or directory
2020-09-02 15:23:01.536096: W tensorflow/stream_executor/cuda/cuda_driver.cc:312] failed call to cuInit: UNKNOWN ERROR (303)
2020-09-02 15:23:01.536122: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (6d0f92664ca6): /proc/driver/nvidia/version does not exist
2020-09-02 15:23:01.536450: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations:  AVX2 FMA
To enable them in other operations,rebuild TensorFlow with the appropriate compiler flags.
2020-09-02 15:23:01.542147: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 1896005000 Hz
2020-09-02 15:23:01.542953: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x560df22dbb30 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-09-02 15:23:01.543009: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host,Default Version
...
Trainable params: 1,507,289
Non-trainable params: 0
__________________________________________________________________________________________________
None
Prediction of 1 models took 1.556354284286499s
CLK0SU0M((141707
3.4778225421905518

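As you can see, the measured time (about 3.48 seconds) is still roughly twice the reported prediction time (about 1.56 seconds).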
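
The gap between the two numbers suggests that the reported figure covers only the prediction step, while the outer timer also includes everything that runs before it, such as imports and model loading. That attribution is an assumption; the logs above do not confirm it. A small phase timer can make the split visible. In this sketch `PhaseTimer` and the phase names are hypothetical, and `time.sleep` stands in for the real work:

```python
import time
from contextlib import contextmanager


class PhaseTimer:
    """Collect wall-clock durations for named phases of a script."""

    def __init__(self):
        self.durations = {}

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Accumulate so a phase entered several times is summed.
            self.durations[name] = self.durations.get(name, 0.0) + (
                time.perf_counter() - start
            )

    def report(self):
        total = sum(self.durations.values())
        lines = [f"{name}: {secs:.3f}s" for name, secs in self.durations.items()]
        lines.append(f"total: {total:.3f}s")
        return "\n".join(lines)


if __name__ == "__main__":
    timer = PhaseTimer()
    with timer.phase("imports"):
        time.sleep(0.02)  # placeholder for importing calamari_ocr / TensorFlow
    with timer.phase("model setup"):
        time.sleep(0.03)  # placeholder for constructing the predictor
    with timer.phase("prediction"):
        time.sleep(0.01)  # placeholder for iterating over the predictions
    print(timer.report())
```

In the refactored script, the same pattern could wrap the import block, the `MultiPredictor(...)` construction, and the loop over `do_prediction`, which would show where the time outside the reported prediction figure goes.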
