Prediction of the Optical Character Recognition Accuracy based on the Combined Assessment of Image Binarization Results
This paper discusses the problem of reliably evaluating image binarization results in terms of their effect on recognition accuracy. The accuracy of Optical Character Recognition (OCR) methods, typically applied to document images acquired by cameras or scanners, depends strongly on the results of image binarization. Unfortunately, the metrics typically used to evaluate binarization results, such as Peak Signal to Noise Ratio, Distance Reciprocal Distortion, or Misclassification Penalty Metric, are not always well correlated with the recognition accuracy of individual characters. Therefore, a novel approach based on a combined metric for the assessment of binarization results is proposed and verified on binary images obtained by applying some popular histogram-based methods to original images of degraded quality. For the experimental prediction of character recognition accuracy, Tesseract, the popular open source engine supported by Google, was used.
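To make the evaluation setting concrete, the following minimal sketch binarizes a grayscale image with a histogram-based threshold and scores the result against a reference binary image using Peak Signal to Noise Ratio. It rests on several assumptions: Otsu's method is used only as one widely known histogram-based technique (the abstract does not name the methods tested), images are represented as flat lists of 8-bit gray levels, and the function names `otsu_threshold`, `binarize`, and `psnr` are hypothetical. The combined metric proposed in the paper is not defined in this abstract and is therefore not reproduced here.

```python
import math


def otsu_threshold(pixels):
    """Return the gray level that maximizes the between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels with gray level <= t
    sum0 = 0    # sum of gray levels of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mu0 - mu1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def binarize(pixels, threshold):
    """Map pixels at or below the threshold to 0 (text) and the rest to 255."""
    return [0 if p <= threshold else 255 for p in pixels]


def psnr(binary, reference):
    """Peak Signal to Noise Ratio in dB between two equally sized images."""
    mse = sum((a - b) ** 2 for a, b in zip(binary, reference)) / len(binary)
    if mse == 0:
        return math.inf
    return 10 * math.log10(255 ** 2 / mse)


# Toy example: dark "text" pixels mixed with bright background pixels.
gray = [10, 12, 11, 200, 210, 205, 13, 198]
reference = [0, 0, 0, 255, 255, 255, 0, 0]  # hypothetical ground truth

result = binarize(gray, otsu_threshold(gray))
score = psnr(result, reference)
```

A pixel-level score of this kind counts every misclassified pixel equally, regardless of whether it damages a character stroke or falls in the empty background, which is one reason such metrics may not track the recognition accuracy reported by an OCR engine.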
The copyright for the paper in this journal is retained by the author(s) with the first publication right granted to the journal. The authors agree to the Creative Commons Attribution 4.0 (CC BY 4.0) agreement under which the paper in the Journal is licensed.
By virtue of their appearance in this open access journal, papers are free to use with proper attribution in educational and other non-commercial settings with an acknowledgement of the initial publication in the journal.