Prediction of the Optical Character Recognition Accuracy based on the Combined Assessment of Image Binarization Results

Authors

  • Piotr Lech, West Pomeranian University of Technology
  • Krzysztof Okarma

DOI:

https://doi.org/10.5755/j01.eee.21.6.13764

Abstract

This paper addresses the reliable evaluation of image binarization results with respect to image recognition accuracy. Optical Character Recognition (OCR) methods are typically applied to document images captured by cameras or scanners, and their accuracy depends strongly on the results of image binarization. Unfortunately, the metrics commonly used to evaluate binarization results, such as Peak Signal to Noise Ratio, Distance Reciprocal Distortion, or Misclassification Penalty Metric, are not always well correlated with the recognition accuracy of individual characters. Therefore, a novel approach based on a combined metric for the assessment of binarization results is proposed. It is verified on binary images obtained with some popular histogram-based methods from original images of degraded quality. For the experimental prediction of character recognition accuracy, Tesseract, the popular open source engine supported by Google, has been used.
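
As an illustration of one of the metrics named above, the following minimal sketch computes the Peak Signal to Noise Ratio between a binarized image and a reference binary image. It assumes pixel values of 0 and 1 stored as nested lists, so the mean squared error reduces to the fraction of mismatched pixels. The function name binary_psnr and the toy images are hypothetical, and the sketch does not reproduce the combined metric proposed in the paper.

```python
import math


def binary_psnr(result, reference):
    """PSNR in dB between two equally sized binary images (pixels are 0 or 1).

    Assumption: the peak value is 1, so PSNR = 10 * log10(1 / MSE), where
    MSE is the fraction of pixels that differ between the two images.
    """
    if len(result) != len(reference) or any(
        len(r) != len(g) for r, g in zip(result, reference)
    ):
        raise ValueError("images must have the same shape")

    total = sum(len(row) for row in reference)
    if total == 0:
        raise ValueError("images must not be empty")

    # For 0/1 pixels the squared error of a pixel is 1 on mismatch, else 0.
    mismatches = sum(
        p != q for r, g in zip(result, reference) for p, q in zip(r, g)
    )
    if mismatches == 0:
        return math.inf  # identical images: no distortion

    mse = mismatches / total
    return 10.0 * math.log10(1.0 / mse)


# Toy example (not data from the paper): one of eight pixels differs.
reference = [[0, 1, 1, 0],
             [0, 1, 1, 0]]
binarized = [[0, 1, 1, 0],
             [0, 1, 0, 0]]
psnr_db = binary_psnr(binarized, reference)
```

A pixel-count measure of this kind weights every misclassified pixel equally, regardless of whether it changes the shape of a character, which is one plausible reason such a metric may not track character recognition accuracy closely.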

Published

2015-12-04

How to Cite

Lech, P., & Okarma, K. (2015). Prediction of the Optical Character Recognition Accuracy based on the Combined Assessment of Image Binarization Results. Elektronika Ir Elektrotechnika, 21(6), 62-65. https://doi.org/10.5755/j01.eee.21.6.13764

Section

SYSTEM ENGINEERING, COMPUTER TECHNOLOGY