Approach to the Improvement of the Text Line Segmentation by Oriented Anisotropic Gaussian Kernel

D. Brodic, University of Belgrade, Technical Faculty Bor, V.J.12, 19210 Bor, Serbia, phone: +381 30 424 555, e-mail: dbrodic@tf.bor.ac.rs
Z. N. Milivojevic, Technical College Nis, Aleksandra Medvedeva 20, 18000 Nis, Serbia, phone: +381 18 588 211, e-mail: zoran.milivojevic@jotel.co.rs
D. R. Milivojevic, Mining and Metallurgy Institute, Department of Informatics, Zeleni bulevar bb, 19210 Bor, Serbia, phone: +381 30 435 109, e-mail: dragan.milivojevic@irmbor.co.rs


Introduction
Text line segmentation is a major step in document analysis. It is a prerequisite for valid optical character recognition (OCR). In addition, text line segmentation and character recognition are mutually dependent tasks [1].
There are a few successful techniques for printed text line segmentation. However, processing of handwritten documents has remained a key problem in OCR [2,3]. Most text line segmentation methods assume that the distance between neighboring text lines is sufficiently large and that text lines are reasonably straight. These assumptions are not always valid for handwritten documents. Hence, text line segmentation remains a leading challenge in OCR.
Related work on text line segmentation can be categorized into several directions [1]: projection-based methods, Hough transform methods, smearing methods, grouping methods, methods for processing overlapping and touching components, stochastic methods, and others.
Conventionally, text is written along the horizontal axis. Smearing methods exploit this property: they smear consecutive black pixels representing text along the horizontal direction. If the run of white pixels between two black pixels is shorter than a predefined threshold, it is filled with black pixels. The bounding boxes of the connected components in the smeared image, which represents the control image, are considered as text lines.
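A minimal sketch of the horizontal smearing step, written in Python since the paper gives no code. It assumes the binary convention used later in this paper (0 = black, 1 = white) and a hypothetical `threshold` parameter giving the longest white gap that is filled; gaps at the start or end of a row are left untouched. It follows the spirit of run-length smoothing and is not taken from any of the cited works.

```python
def smear_horizontal(image, threshold):
    """Horizontal smearing of a binary image (0 = black text, 1 = white).

    A run of white pixels lying between two black pixels on the same row
    is filled with black when its length does not exceed `threshold`.
    """
    smeared = [row[:] for row in image]  # leave the input untouched
    for row in smeared:
        last_black = None  # column of the most recent black pixel in this row
        for j, value in enumerate(row):
            if value == 0:
                gap = 0 if last_black is None else j - last_black - 1
                if last_black is not None and 0 < gap <= threshold:
                    for k in range(last_black + 1, j):
                        row[k] = 0  # fill the short white gap
                last_black = j
    return smeared
```

For example, with `threshold = 2` the row `0 1 1 0 1 1 1 1 0` becomes `0 0 0 0 1 1 1 1 0`: the two-pixel gap is filled, the four-pixel gap is not.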
The algorithm proposed in [4] models text line detection as an image segmentation problem: it enhances the text line structure using a Gaussian kernel and evolves the text line boundaries with the level set method. First, it converts the binary image to gray scale using a Gaussian window, which enhances text line structures. Then, text lines are extracted by evolving an initial estimate with the level set method. The method proved robust for different languages under reasonable variation of skew angle, character size, and noise. Still, rotating the text by 10° or more affects the reference line hit rate. A similar approach to text image segmentation is partly exploited in [5], some theoretical extensions in the form of the Gabor transformation are given in [6], and the original method is investigated in depth in [7].
In this paper, an approach based on the oriented anisotropic Gaussian kernel is proposed. First, preprocessing creates a bounding box around each text object. The skew of each text object is then evaluated by moments. These values determine the text orientation and are the inputs to an oriented Gaussian kernel based algorithm. Furthermore, the benefits of the rotated over the non-rotated Gaussian kernel for text line segmentation are investigated, using printed and handwritten text samples. Finally, an optimization of the algorithm parameters related to the Gaussian kernel is proposed.
The article is organized as follows. Section 2 describes the algorithms, i.e. the algorithms based on the non-rotated and the rotated Gaussian kernel. Section 3 defines the testing process and shows the measurement results. Section 4 presents a comparative analysis. Section 5 draws conclusions.

Proposed algorithm
Document Text Image. The process starts from a continuous image; the document text image is obtained by scanning it. The document text image is a digital image represented by a matrix D with M rows, N columns, and L discrete levels of gray (here L = 256, i.e. the levels 0, …, 255). Hence, $D(i,j) \in \{0, \dots, 255\}$, where i = 1, …, M and j = 1, …, N.
After intensity segmentation by binarization, the intensity function is converted into the binary intensity function

$$B(i,j)=\begin{cases}1, & D(i,j)\ge D_{th}\\ 0, & D(i,j)<D_{th}\end{cases}\qquad(1)$$

where the threshold $D_{th}$ is given by the Otsu algorithm [8] or an equivalent algorithm [9–11]. It represents the threshold sensitivity decision value. The document text image is thus represented as a binary matrix B with M rows and N columns. It consists only of black and white pixels, where the value 0 represents a black pixel and the value 1 a white pixel. An example of a handwritten text binary image is shown in Fig. 1.
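The binarization in (1) can be sketched as follows, assuming 8-bit gray levels and Otsu's method [8] for $D_{th}$. The helper names `otsu_threshold` and `binarize` are hypothetical; the sketch returns $D_{th}$ as the lowest gray level assigned to the white class, so that (1) applies directly. Image-processing libraries such as scikit-image provide tuned Otsu thresholding; this version uses only the standard library to stay self-contained.

```python
def otsu_threshold(gray):
    """Otsu threshold D_th for a 2-D list of gray levels in 0..255.

    Returns the lowest level of the bright (white) class, so that
    B(i, j) = 1 exactly when D(i, j) >= D_th, as in equation (1).
    """
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for t in range(256):
        weight_bg += hist[t]              # pixels with level <= t (dark class)
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg     # pixels with level > t (bright class)
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        between_var = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between_var > best_var:        # maximize between-class variance
            best_var, best_t = between_var, t
    return best_t + 1


def binarize(gray, d_th):
    """Equation (1): 1 (white) where D(i, j) >= D_th, otherwise 0 (black)."""
    return [[1 if value >= d_th else 0 for value in row] for row in gray]
```

For the gray image `[[10, 12, 200], [11, 210, 220]]` the sketch selects $D_{th} = 13$, so the three dark pixels become black (0) and the three bright ones white (1).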

Fig. 1. Binary document image
Anisotropic Gaussian Filter. Establishing distinct regions that separate text lines from one another is the primary task of text line segmentation in optical character recognition (OCR).
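In this paper, the algorithm is based on an analogy with the Gaussian probability density function (PDF), given by [12]

$$G(\mathbf{x})=\frac{1}{2\pi\,|\Sigma|^{1/2}}\exp\!\left(-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{T}\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\right)\qquad(2)$$

where $\mathbf{x}$ and $\boldsymbol{\mu}$ are column vectors and $\Sigma$ is the covariance matrix. In 2-D,

$$\mathbf{x}=\begin{bmatrix}x\\ y\end{bmatrix},\qquad \boldsymbol{\mu}=\begin{bmatrix}\mu_x\\ \mu_y\end{bmatrix},$$

the covariance matrix $\Sigma$ is

$$\Sigma=\begin{bmatrix}\sigma_x^{2} & 0\\ 0 & \sigma_y^{2}\end{bmatrix},$$

and its determinant is $|\Sigma|=\sigma_x^{2}\sigma_y^{2}$, so that (2) becomes

$$G(x,y)=\frac{1}{2\pi\sigma_x\sigma_y}\exp\!\left(-\frac{(x-\mu_x)^{2}}{2\sigma_x^{2}}-\frac{(y-\mu_y)^{2}}{2\sigma_y^{2}}\right).$$

Here $\sigma_x$ and $\sigma_y$ are the standard deviations, which define the spread of the curve, and $\mu_x$ and $\mu_y$ are the means in the x and y directions, respectively. If $\sigma_x = \sigma_y$, the Gaussian PDF is isotropic; otherwise it is anisotropic. Equation (2) is the starting point for creating the anisotropic kernel: converting the Gaussian PDF into a point-spread function (PSF) yields the anisotropic Gaussian kernel.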
The idea of Gaussian smoothing is to use this 2-D distribution as a PSF. Since the image is stored as a collection of discrete pixels, a discrete approximation G(i,j) of the Gaussian function G(x) must be produced before the convolution. The Gaussian distribution is non-zero everywhere, which would require an infinitely large convolution kernel. In practice, however, it is effectively zero beyond about $3\sigma_x$ and $3\sigma_y$ from the mean in the x and y directions, respectively [7]. These values represent the Gaussian threshold sensitivity levels $L_{gtsx}$ and $L_{gtsy}$. Truncating the kernel at $3\sigma_x$ in the x direction and $3\sigma_y$ in the y direction gives it an elliptical support. All pixels inside the ellipse, i.e. those with a level higher than $L_{gtsx}$ or $L_{gtsy}$, form the same region. Hence, the anisotropic Gaussian kernel G(i,j) has dimensions 2P+1 in the x direction and 2R+1 in the y direction.
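A sketch of the discrete kernel G(i,j). It assumes (our reading of the 3σ truncation, not values stated in the paper) that $\sigma_x = P/3$ and $\sigma_y = R/3$, that x runs along image columns, and that the kernel is centred ($\boldsymbol{\mu} = 0$) and normalized to sum 1, which makes the constant $1/(2\pi\sigma_x\sigma_y)$ irrelevant. Entries outside the 3σ ellipse are set to zero. The function name is hypothetical.

```python
import math


def anisotropic_gaussian_kernel(P, R):
    """Discrete anisotropic Gaussian kernel of size (2R+1) rows x (2P+1) columns.

    Illustrative assumptions: sigma_x = P / 3 and sigma_y = R / 3, so the
    3-sigma ellipse touches the kernel border; entries outside that ellipse
    are zero, and the kernel is normalized so that its entries sum to 1.
    """
    sigma_x, sigma_y = P / 3.0, R / 3.0
    kernel = []
    for y in range(-R, R + 1):          # row offset from the centre
        row = []
        for x in range(-P, P + 1):      # column offset from the centre
            inside = (x / P) ** 2 + (y / R) ** 2 <= 1.0
            value = math.exp(-0.5 * ((x / sigma_x) ** 2 + (y / sigma_y) ** 2))
            row.append(value if inside else 0.0)
        kernel.append(row)
    total = sum(map(sum, kernel))
    return [[v / total for v in row] for row in kernel]
```

For P = 6 and R = 2 this gives a 5 × 13 kernel whose corners lie outside the ellipse and are therefore zero.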
All these pixels are converted into the same regions, thus forming boundary-growing regions. They represent a control image with distinct objects, which are a prerequisite for text line segmentation. Matrix X is created by convolving the anisotropic Gaussian kernel G with the image represented by the binary matrix B [13]:

$$X(i,j)=(G*B)(i,j),$$

where i = P, …, M−P and j = R, …, N−R. The elements of matrix X are then binarized to obtain the control image.

Oriented Anisotropic Gaussian Filter. The oriented anisotropic Gaussian kernel forms an extended anisotropic Gaussian kernel E, obtained by rotating the anisotropic Gaussian kernel G by the angle θ. Because of the rotation, the extent of the kernel in the x and y directions changes. This can be described as

$$E(\mathbf{x})=G(\mathbf{T}\mathbf{x}),$$

where $\mathbf{T}$ is the transformation (rotation) matrix [14]:

$$\mathbf{T}=\begin{bmatrix}\cos\theta & \sin\theta\\ -\sin\theta & \cos\theta\end{bmatrix}.$$

The angle θ is obtained from binary moments [15,16].
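The rotation $E(\mathbf{x}) = G(\mathbf{T}\mathbf{x})$ can be sketched by evaluating the Gaussian at the back-rotated coordinates of every grid point. The sketch assumes $\sigma_x = P/3$ and $\sigma_y = R/3$, support limited to the rotated 3σ ellipse, and normalization to sum 1; it also assumes that the half-sizes S and T are the half-width and half-height of the rotated ellipse's bounding box, rounded up, since the paper does not state how they are chosen. The function name is hypothetical. Because the row index grows downward, a positive θ appears as a clockwise rotation on screen.

```python
import math


def rotated_gaussian_kernel(P, R, theta):
    """Oriented anisotropic Gaussian kernel E(x) = G(T x), T = [[c, s], [-s, c]].

    Illustrative assumptions: sigma_x = P / 3, sigma_y = R / 3, support is
    the rotated 3-sigma ellipse, and S, T are the half-width and half-height
    of that ellipse's bounding box. Returns (kernel, S, T), where kernel has
    (2T+1) rows and (2S+1) columns and sums to 1.
    """
    c, s = math.cos(theta), math.sin(theta)
    S = math.ceil(math.sqrt((P * c) ** 2 + (R * s) ** 2))
    T = math.ceil(math.sqrt((P * s) ** 2 + (R * c) ** 2))
    sigma_x, sigma_y = P / 3.0, R / 3.0
    kernel = []
    for y in range(-T, T + 1):            # row offset (y grows downward)
        row = []
        for x in range(-S, S + 1):        # column offset
            u = c * x + s * y             # coordinates in the unrotated frame
            v = -s * x + c * y
            if (u / P) ** 2 + (v / R) ** 2 <= 1.0:
                row.append(math.exp(-0.5 * ((u / sigma_x) ** 2 + (v / sigma_y) ** 2)))
            else:
                row.append(0.0)
        kernel.append(row)
    total = sum(map(sum, kernel))
    return [[w / total for w in row] for row in kernel], S, T
```

With θ = 0 the sketch reproduces the unrotated (2R+1) × (2P+1) kernel; with θ = π/2 the half-sizes swap.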
First, to localize the orientation of the text objects, each text object is enclosed in its own bounding box [17]. In this way, the text image is split into small text objects with different attributes. For each text object, the binary moment representing its orientation is then calculated, giving the evaluated local orientation of that object. This preprocessing procedure is illustrated in Fig. 2. The new kernel dimensions are 2S+1 in the x direction and 2T+1 in the y direction. The difference between the two kernels is shown in Fig. 3.
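The per-object orientation can be sketched with central binary moments. The formula θ = ½·atan2(2·mu11, mu20 − mu02) is the standard moment-based orientation estimate and is our assumption for the exact form used in [15,16]. The sketch treats each 8-connected group of black pixels as one text object, a simplification of the object separation in [17], and returns each object's bounding box with its angle. The function name is hypothetical; angles are in radians with y pointing down, so a positive angle means a stroke descending to the right.

```python
import math
from collections import deque


def text_objects(binary):
    """Split a binary image (0 = black) into 8-connected text objects.

    Returns a list of (bounding_box, theta) pairs in scan order, where
    bounding_box is (top, left, bottom, right) and theta is the orientation
    from the central moments: theta = 0.5 * atan2(2*mu11, mu20 - mu02).
    """
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    objects = []
    for i0 in range(rows):
        for j0 in range(cols):
            if binary[i0][j0] != 0 or seen[i0][j0]:
                continue
            # Breadth-first search collects all pixels of one object.
            pixels, queue = [], deque([(i0, j0)])
            seen[i0][j0] = True
            while queue:
                i, j = queue.popleft()
                pixels.append((i, j))
                for ni in range(max(i - 1, 0), min(i + 2, rows)):
                    for nj in range(max(j - 1, 0), min(j + 2, cols)):
                        if binary[ni][nj] == 0 and not seen[ni][nj]:
                            seen[ni][nj] = True
                            queue.append((ni, nj))
            ys = [p[0] for p in pixels]
            xs = [p[1] for p in pixels]
            xc, yc = sum(xs) / len(xs), sum(ys) / len(ys)
            mu20 = sum((x - xc) ** 2 for x in xs)
            mu02 = sum((y - yc) ** 2 for y in ys)
            mu11 = sum((x - xc) * (y - yc) for x, y in zip(xs, ys))
            theta = 0.5 * math.atan2(2 * mu11, mu20 - mu02)
            objects.append(((min(ys), min(xs), max(ys), max(xs)), theta))
    return objects
```

A horizontal run of black pixels yields θ = 0, while a stroke descending one row per column yields θ = π/4.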

Fig. 3. Anisotropic and rotated Gaussian kernel
The main difference between the original algorithm [4] and our approach lies in the text segmentation step. Matrix Y is defined by convolving the rotated anisotropic Gaussian kernel E with the matrix B [13]:

$$Y(i,j)=(E*B)(i,j),$$

where i = S, …, M−S and j = T, …, N−T.
The elements of matrix Y are then binarized to obtain the control image. The text line segmentation procedure is illustrated in Fig. 4.
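The segmentation step can be sketched as a convolution of B with a kernel (G, or its rotated version E) followed by binarization and counting of the resulting objects. The binarization rule is our assumption, since the paper's equation is not reproduced here: a pixel becomes black when the normalized response drops below 1 (up to a rounding tolerance `eps`), i.e. when at least one black pixel of B falls under a non-zero kernel weight. Pixels outside the image are treated as white, so the output keeps the size of B instead of using the index ranges i = S, …, M−S and j = T, …, N−T. The kernel is assumed to sum to 1, and all function names are hypothetical.

```python
from collections import deque


def convolve(binary, kernel):
    """2-D convolution (G * B or E * B); pixels outside the image count as white (1)."""
    rows, cols = len(binary), len(binary[0])
    kh, kw = len(kernel) // 2, len(kernel[0]) // 2
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for di in range(-kh, kh + 1):
                for dj in range(-kw, kw + 1):
                    ii, jj = i - di, j - dj  # flipped indices: true convolution
                    pixel = binary[ii][jj] if 0 <= ii < rows and 0 <= jj < cols else 1
                    acc += kernel[di + kh][dj + kw] * pixel
            out[i][j] = acc
    return out


def control_image(binary, kernel, eps=1e-9):
    """Assumed rule: black (0) where the kernel response falls below 1 - eps."""
    return [[0 if v < 1.0 - eps else 1 for v in row] for row in convolve(binary, kernel)]


def count_objects(binary):
    """Number of 8-connected black (0) objects in a binary image."""
    rows, cols = len(binary), len(binary[0])
    seen = [[False] * cols for _ in range(rows)]
    count = 0
    for i0 in range(rows):
        for j0 in range(cols):
            if binary[i0][j0] == 0 and not seen[i0][j0]:
                count += 1
                seen[i0][j0] = True
                queue = deque([(i0, j0)])
                while queue:
                    i, j = queue.popleft()
                    for ni in range(max(i - 1, 0), min(i + 2, rows)):
                        for nj in range(max(j - 1, 0), min(j + 2, cols)):
                            if binary[ni][nj] == 0 and not seen[ni][nj]:
                                seen[ni][nj] = True
                                queue.append((ni, nj))
    return count
```

For two short "text lines" (rows 1 and 5 of a 7 × 12 image, each with two three-pixel "words"), the initial image has four objects. A uniform 3 × 7 kernel merges the words of each line and yields two objects, whereas a taller 5 × 7 kernel bridges the gap between the lines and yields one object, which illustrates how an oversized kernel leads to under-segmentation.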

Experiments and Evaluation
Evaluation of the text line segmentation uses test samples based on the IAM handwriting database [18], extended by our own test framework [19]. Testing consists of applying the algorithm to these text samples. As a result, new growing regions arise around the text, which leads to a new configuration of text objects. Ideally, the number of newly arranged objects equals the correct number of text lines. A valid evaluation requires the definition of the following text elements [20]: the initial, detected, and referent numbers of objects, and three types of segmentation error. Split lines error represents text lines that the algorithm wrongly divides into two or more components, i.e. text objects; this is known as over-segmentation. Joined lines error corresponds to the situation where a sequence of n consecutive lines is treated by the algorithm as a single line; this is called under-segmentation. Lines including outlier words are lines containing words that are incorrectly assigned to two adjacent lines. All of these cases are illustrated in Fig. 5.
The efficiency of the algorithm refers to the evaluation of the text line segmentation it produces: the closer the number of detected objects is to the number of referent objects, the more efficient the algorithm. The following measures are introduced to evaluate efficiency: segmentation line hit rate (SLHR), over-segmentation line hit rate (OSLHR), under-segmentation line hit rate (USLHR), mixed line hit rate (MSLHR), and segmentation root mean square error ($\mathrm{RMSE}_{seg}$) [19]. SLHR is the ratio of the number of correctly segmented text lines to the total number of text lines in the referent sample text:

$$\mathrm{SLHR}=\frac{O_{corrlindet}}{O_{ref}}\times 100\%.$$

Over-segmentation increases the number of objects per text line: the boundary growing area created by the algorithm has not merged all objects of the text line into one.
OSLHR is the ratio of the number of over-segmented text lines to the total number of text lines in the referent sample text:

$$\mathrm{OSLHR}=\frac{O_{overlindet}}{O_{ref}}\times 100\%.$$

Under-segmentation yields fewer objects than text lines, because two or more consecutive text lines are treated as a single one. USLHR is the ratio of the number of under-segmented text lines to the total number of text lines in the referent sample text:

$$\mathrm{USLHR}=\frac{O_{underlindet}}{O_{ref}}\times 100\%.$$

When objects from different text lines are mutually injected, mixed text lines result. MSLHR is the ratio of the number of mixed text lines to the total number of text lines in the referent sample text:

$$\mathrm{MSLHR}=\frac{O_{mixedlindet}}{O_{ref}}\times 100\%.$$

Finally, the numbers of detected and referent text objects are compared for each text line; the number of referent text objects per line is 1. The variance is evaluated by $\mathrm{RMSE}_{seg}$ [19]:

$$\mathrm{RMSE}_{seg}=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(O_{i,ref}-O_{i,est}\right)^{2}},$$

where N is the total number of lines in the referent sample text, $O_{i,ref}$ is the number of referent objects in text line i (equal to 1 for each line), and $O_{i,est}$ is the number of detected objects in text line i.
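The measures above reduce to a few lines of code. The function names are hypothetical, and the counts of correct, over-, under-, and mixed-segmented lines are assumed to come from comparing the detected objects with the referent lines, a step not automated here. Rates are returned in percent.

```python
import math


def hit_rates(n_correct, n_over, n_under, n_mixed, n_ref):
    """SLHR, OSLHR, USLHR and MSLHR in percent of the n_ref referent text lines."""
    return {
        "SLHR": 100.0 * n_correct / n_ref,
        "OSLHR": 100.0 * n_over / n_ref,
        "USLHR": 100.0 * n_under / n_ref,
        "MSLHR": 100.0 * n_mixed / n_ref,
    }


def rmse_seg(detected_per_line, referent_per_line=None):
    """RMSE_seg = sqrt(mean((O_i,ref - O_i,est)^2)) over the N referent lines.

    By default every referent line holds exactly one object (O_i,ref = 1).
    """
    n = len(detected_per_line)
    if referent_per_line is None:
        referent_per_line = [1] * n
    squared = sum((r - d) ** 2 for r, d in zip(referent_per_line, detected_per_line))
    return math.sqrt(squared / n)
```

For example, four lines with detected object counts 1, 1, 2, 1 give $\mathrm{RMSE}_{seg} = \sqrt{1/4} = 0.5$.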

Results and comparative analysis
In [7], an optimized parameter set for the text attributes given in [22] is proposed. The set consists of two parameters, P and λ, where 2P+1 is the x dimension of the Gaussian kernel and λ is the ratio of the y and x dimensions of the kernel, i.e. λ = R/P. In addition, the kernel is rotated by the angle θ. The results for the test framework [18, 19], which consists of 576 text lines in different scripts (Roman Latin, Cyrillic, Glagolitic, and Bangla), are given in Tables 1–4. Figs. 6 and 7 show the improvement in USLHR and SLHR, respectively. Under-segmentation is undesirable and should be avoided wherever possible. Larger values of the parameter P trigger it; hence, P must be selected carefully.
Consequently, the choice of P is critical for the efficiency of the text line segmentation algorithm. However, careful examination of the results presented in Fig. 7 leads to the conclusion that using the oriented Gaussian kernel reduces this effect. Accordingly, any rotation of the kernel toward the referent angle θ is beneficial.

Conclusions
In this paper, an approach to the Gaussian kernel algorithm for text line segmentation is presented. The proposed improvement creates boundary growing regions around the text with a Gaussian kernel algorithm extended to incorporate the local orientation. These growing regions form a control image with distinct objects, which are a prerequisite for text line segmentation. The quality and robustness of the algorithm are examined on a combined test framework, and the results are evaluated with the proposed measures. All results are presented and compared with those of the non-oriented anisotropic Gaussian kernel method, followed by a comparative analysis and discussion. The strength of this approach in the text segmentation domain is evident: its improvement is based on expanding the growing regions around the text at the specified angle. Still, the parameter values must be chosen carefully.

Fig. 2. Bounding boxes over each separated text object linked with its evaluated local orientation (red lines)

Fig. 4. Text segmentation by algorithm based on oriented anisotropic Gaussian kernel.
The following elements are defined: the initial number of objects $O_{init}$, the detected number of objects $O_{det}$, and the referent number of objects $O_{ref}$. The initial objects $O_{init}$ represent the starting number of objects in the referent sample text; this number is obtained by counting the text objects in the starting sample text. After the algorithm is applied to the sample text, the number of text objects changes, because many text objects are merged by the text segmentation algorithm. The resulting number of text objects is the number of detected objects $O_{det}$. The number of detected objects should be less than or equal to the number of initial objects, i.e. $O_{det} \le O_{init}$. The initial sample text contains a distinct number of text lines, and the task of the text segmentation algorithm is to hit this number. Hence, the number of real text lines is the target number in the referent sample text, called the referent number of objects $O_{ref}$. The efficiency of the algorithm is evaluated by comparing the referent and detected numbers of objects for each text line. If a text line contains exactly one text object, then $O_{det} = O_{ref}$ for that line, and it is correctly segmented. The number of correctly detected text lines in the sample text is denoted $O_{corrlindet}$; all other lines count as errors. Segmentation errors occur in the following cases: over-segmented text lines $O_{overlindet}$ (split lines error, SLE [21]); under-segmented text lines $O_{underlindet}$ (joined lines error, JLE [21]); and text lines with mutually inserted words from different text lines $O_{mixedlindet}$ (lines including outlier words, LIOW [21]).

Fig. 5. Text line segmentation: (a) original text, (b) original text with referent objects, i.e. correctly segmented text lines, (c) over-segmented text lines, (d) under-segmented text lines, and (e) text lines with mutually inserted words from different text lines.

Fig. 7. Application of the non-oriented (red line) and rotated Gaussian kernel (green and blue line)

Tables 1–4. $P_1$, $P_2$ and $P_3$ ($P_1 < P_2 < P_3$) represent different values of parameter P, while the kernel is rotated by the angles 0, θ/2 and θ. Furthermore, the SLHR improvement for the kernel rotation angles θ/2 and θ compared to the non-oriented kernel is given in