Amber Gemstones Sorting By Colour

The objective of this study is to create computer vision algorithms for autonomous multiclass identification of amber nuggets by their colour. Using the proposed methods, an automated production sorting system has been developed. This system can be used, for example, in combination with conveyor systems, and in any other case that requires distinguishing objects of many classes in a high-rate flow of objects. To achieve this, algorithms for colour feature selection, classifier training, grouping of classifiers into collectives, and voting with a reject option have been developed. The developed system has been used in an automated amber sorting line to increase the quantity of sorted amber nuggets. The highest accuracy achieved by the applied algorithms, 88.21 %, was obtained on an expert database of amber nuggets consisting of 30 classes. DOI: http://dx.doi.org/10.5755/j01.eie.23.2.17993


I. INTRODUCTION
Amber is a natural mineral formed from fossil tree resin. Nowadays, amber from the Baltic Sea is still extracted and used to create impressive jewellery, souvenirs, mosaics and fine art crafts. To provide craftsmen with adequate raw materials, amber nuggets are selected according to their hue, transparency and size (Fig. 1). The present work aims to develop and study image analysis algorithms that identify the class to which an amber nugget on a conveyor belongs. The expected result of the multiclass identification is either that the presented object belongs to an identified class or that it cannot be identified with acceptable accuracy. For this reason, the focus of this development is to ensure high identification accuracy. Based on samples provided by experts, several algorithms for identifying multiclass items by their visual properties have been created.

Manuscript received 9 October, 2016; accepted 2 March, 2017.
Several researchers have achieved good results by extracting visual features and using them to classify and sort objects into a small number of categories. First- and second-order statistical features were used [1] for visual evaluation of surfaces. Sorters are usually used to separate objects in the food industry [2], in waste recycling operations [3] and for inspection [4]. Such systems are based on the optical properties of object surfaces and use different types of sensors, for example CCD cameras, spectroscopy, stereo vision and infrared light. Optical sensors can acquire colour [5], shape, texture and other optical features [6], and in many cases the task is a multiclass [7] identification problem. Optical properties depend on lighting conditions, so isolating objects from the environment and implementing an artificial light source are among the key points for the system to work properly. Such systems also have strict performance requirements.
The classification of colour images plays an important role in ensuring the quality of high-speed automation in industry. By employing histogram thresholding techniques, pixel counting, different types of lighting, and exclusion of edges or colour tones of the searched objects, it is possible to obtain good classification accuracy [8]. However, classifying objects of a single type is a more complicated task than identifying similar objects in the same image.

II. THEORETICAL BACKGROUND
The practical problem of the study involves finding the pixels that belong to the object and extracting its edges for further analysis by means of segmentation methods.
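The segmentation method itself is not named here. As one illustration, the sketch below swaps in Otsu's histogram threshold on a single 8-bit channel, with the image flattened to a list of pixel values. The helper names `otsu_threshold` and `object_mask` are hypothetical, and the assumption that the nugget is darker than the white conveyor in the chosen channel is ours, not the authors'.

```python
def otsu_threshold(values: list[int]) -> int:
    """Return the 8-bit threshold t that maximises the between-class variance,
    with class 0 = values <= t and class 1 = values > t."""
    hist = [0] * 256
    for v in values:
        hist[v] += 1  # values are assumed to be integers in 0..255
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_score = 0, -1.0
    w0 = 0    # number of pixels at or below t
    sum0 = 0  # intensity sum of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty: this threshold separates nothing
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        score = w0 * w1 * (mu0 - mu1) ** 2  # proportional to between-class variance
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def object_mask(values: list[int]) -> list[bool]:
    """Mark pixels at or below the threshold as object pixels
    (assumes the nugget is darker than the white conveyor)."""
    t = otsu_threshold(values)
    return [v <= t for v in values]
```

On real images the mask would still need hole filling and removal of small blobs before edges are traced, and whitish nuggets may contrast poorly with a white belt in brightness, so the channel choice matters.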
After the objects are separated from the background, object features can be extracted. This includes measuring the object's colour characteristics, which can be used for model training or object identification. Colour descriptors may be divided [9] into three main groups: global descriptors, descriptors based on fixed-size regions, and segmentation-based descriptors. Global descriptors describe the colour of the image as a whole. Fixed-size region descriptors divide the image into fixed-size cells (regions) and extract colour information from each region separately. Rao [10] has identified five potential methods for forming regions (Fig. 2): overall image, rectangular, angular, circular and hybrid. Segmentation-based descriptors divide the image into regions whose size and number depend on the image. This study uses supervised classifiers. In general, this type of classifier identifies objects by assigning them to one of a finite set of classes. Classification is carried out by comparing the measured features of new objects with objects or criteria already known and then deciding to which category the new object belongs. The classifier used for solving the multiclass problem in the sorting system is the decision tree (DT) [11].
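To make the region types concrete, here is a minimal sketch that maps a pixel's offset from the object centre to a fixed-size region key for the overall, rectangular, angular and circular cases listed by Rao [10]; the hybrid type is left out. The names `region_key` and `group_pixels`, and the default cell size and sector angle, are hypothetical.

```python
import math


def region_key(x: float, y: float, kind: str,
               cell: float = 10.0, angle_deg: float = 45.0) -> tuple[int, ...]:
    """Fixed-size region of a pixel at offset (x, y) from the object centre.

    kind: "overall" (one region for the whole object), "rectangular" (square
    cells of side `cell`), "angular" (sectors of `angle_deg` degrees) or
    "circular" (rings of width `cell`). Defaults are illustrative only.
    """
    if kind == "overall":
        return (0,)
    if kind == "rectangular":
        return (math.floor(x / cell), math.floor(y / cell))
    if kind == "angular":
        theta = math.degrees(math.atan2(y, x)) % 360.0  # angle in [0, 360)
        if theta >= 360.0:
            theta = 0.0  # guard against rounding of tiny negative angles
        return (int(theta // angle_deg),)
    if kind == "circular":
        return (int(math.hypot(x, y) // cell),)
    raise ValueError(f"unsupported region kind: {kind}")


def group_pixels(offsets: list[tuple[float, float]], kind: str,
                 **params: float) -> dict[tuple[int, ...], list[int]]:
    """Group pixel indices by region so colour features can be computed per region."""
    groups: dict[tuple[int, ...], list[int]] = {}
    for i, (x, y) in enumerate(offsets):
        groups.setdefault(region_key(x, y, kind, **params), []).append(i)
    return groups
```

Because the cell size or angle is fixed, the number of pixels per region varies with the object's shape; this is the property the fixed-area methods in Section V change.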
The best model is selected from several available models. The selection criteria are the complexity and the generalization capacity of the model. These indicators are expressed through the confusion matrix.
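Since the confusion matrix is named as the basis for comparison, the sketch below covers only the accuracy part of model selection: it builds a confusion matrix from 0-based labels and picks the model whose matrix has the highest overall accuracy. Model complexity is not scored here, and all function names are hypothetical.

```python
def confusion_matrix(true_labels: list[int], predicted: list[int],
                     n_classes: int) -> list[list[int]]:
    """cm[i][j] counts objects of true class i predicted as class j (labels 0-based)."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(true_labels, predicted):
        cm[t][p] += 1
    return cm


def accuracy(cm: list[list[int]]) -> float:
    """Share of objects on the diagonal, i.e. correctly identified."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total if total else 0.0


def pick_best(models: dict[str, list[list[int]]]) -> str:
    """Name of the model whose confusion matrix has the highest accuracy."""
    return max(models, key=lambda name: accuracy(models[name]))
```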
Usually, higher identification accuracy is achieved by combining models into groups and introducing a class of rejectables. Instead of choosing a single model, several models are combined into a collective that performs better than any model separately, for example by the Half&Half method or random forests [12]. To reduce the number of false identifications, a class of rejectables is often introduced in automated decision-making systems: if an identification result is considered insufficiently reliable, it is rejected. This is known as identification with reject option [13].

III. EXPERIMENTAL SETUP
Figure 3 shows the main components of the experimental base. Amber nuggets fall from the vibrating bowl feeder onto a white conveyor. A laser fork detects each amber nugget that interrupts the laser beam and sends a signal to the digital camera (model FFMV-03MTC, manufactured by Point Grey, Canada), which captures a top-view image of the nugget on the conveyor. The original number of images is 9,008; after rejecting defective samples, 8,479 images remained in the database. Images are normalized before feature extraction. The database consists of 30 classes (Fig. 4). Experts tagged the samples in the database by colour into five main groups: clear and semi-clear (classes 1-8), transparent with a yellow hue and some acceptable opacity; opaque (classes 9-17), opaque with small transparent areas; crystallized (classes 18-22), mainly opaque with many bright-coloured spots; whitish (classes 23-26), whitish with black, brown or yellow intrusions; darkish (classes 27-30), black or dark-coloured with small bright areas.
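The grouping of the 30 classes can be written as a small lookup, which is handy when errors are summarised per group rather than per class. The dictionary and helper names are hypothetical; the class ranges and image counts are taken from the paragraph above.

```python
# Expert colour groups and their class ranges (classes are numbered 1..30).
GROUPS = {
    "clear and semi-clear": range(1, 9),
    "opaque": range(9, 18),
    "crystallized": range(18, 23),
    "whitish": range(23, 27),
    "darkish": range(27, 31),
}

ORIGINAL_IMAGES = 9008
KEPT_IMAGES = 8479
REJECTED_IMAGES = ORIGINAL_IMAGES - KEPT_IMAGES  # defective samples removed


def group_of(class_id: int) -> str:
    """Return the expert colour group of a class number."""
    for name, ids in GROUPS.items():
        if class_id in ids:
            return name
    raise ValueError(f"class {class_id} is outside 1..30")
```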

IV. METHODS
Features are extracted from regions obtained by dividing the object image into concentric rings of equal area, starting from the object centre (Fig. 5). Because the rings are centred on the object, the identification is independent of translation, rotation angle and reflection. For this database, four statistics are computed from each concentric ring: mean μ, standard deviation σ, kurtosis k and skewness s. These statistics are calculated separately for each component of the HSV colour space (H, S and V) and for two transforms (the range filter of the V component and the contours of the S component). This enables evaluation of both colour and texture features. One object may have from one to several hundred regions. The feature vector of each region is stored as one line of the matrix

$$X_{\text{colour}} = \begin{bmatrix} \mu(H_1) & \sigma(H_1) & k(H_1) & s(H_1) & \cdots & \mu(SK_1) & \sigma(SK_1) & k(SK_1) & s(SK_1) \\ \vdots & \vdots & \vdots & \vdots & & \vdots & \vdots & \vdots & \vdots \\ \mu(H_N) & \sigma(H_N) & k(H_N) & s(H_N) & \cdots & \mu(SK_N) & \sigma(SK_N) & k(SK_N) & s(SK_N) \end{bmatrix}, \qquad (1)$$

where the number of lines corresponds to the number of rings on the object and N is the number of regions; H, S and V are the HSV colour space channel matrices of a region, VDF is the matrix obtained by applying a range filter to the V component of the HSV colour space, and SK is the matrix obtained by computing contours from the S component of the HSV colour space. Each feature type is computed for each of these five matrices separately; therefore, each line of $X_{\text{colour}}$ contains 20 values.

The training algorithm goes as follows:
1. For each object in the training set, features were extracted as matrices $X_1, X_2, \ldots$

The index with the highest value in the vote vector VC represents the item number of the winning class. The class of rejectables is formed using the following rule: an object is rejected when

$$\frac{\max_{i} VC_i}{\sum_{i=1}^{V} VC_i} \cdot 100 < th,$$

where th is the threshold percentage, i.e. the minimum share of votes a class needs to become the winner. If the number of votes is insufficient, the object is marked as unidentified. The threshold is determined empirically; it corresponds to the minimal total number of identification and rejection errors.
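The per-region lines of matrix (1) can be sketched as follows, assuming the H, S, V, VDF and SK values of each region have already been collected into flat lists (the range filter and contour steps themselves are not shown). The use of population moments, the non-excess kurtosis m4/m2², and the column order are assumptions, since the text does not state which variants were used; `fos` and `feature_matrix` are hypothetical names.

```python
import math

CHANNELS = ("H", "S", "V", "VDF", "SK")  # column order assumed for matrix (1)


def fos(values: list[float]) -> list[float]:
    """First order statistics [mean, std, kurtosis, skewness] of one region/channel."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        return [mean, 0.0, 0.0, 0.0]  # flat region: higher moments undefined, use 0
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return [mean, math.sqrt(m2), m4 / m2 ** 2, m3 / m2 ** 1.5]


def feature_matrix(regions: list[dict[str, list[float]]]) -> list[list[float]]:
    """One line per region: 4 statistics x 5 channels = 20 values."""
    rows = []
    for region in regions:
        row: list[float] = []
        for ch in CHANNELS:
            row.extend(fos(region[ch]))
        rows.append(row)
    return rows
```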

V. RESULTS
As a result of sequential feature selection (SFS) [14], the most useful solution proved to be a combination of global feature systems: First Order Statistics (FOS), Contour First Order Statistics (CFOS) and Local Range Filter (LRF). The FOS feature types selected by SFS are mean, standard deviation, kurtosis and skewness. These feature types are computed for all components of the HSV colour space, for the range filter of the V component and for the contours of the S component. Combining these features increases accuracy to 71.53 %, which is higher than the value achieved with any feature system alone. These features are used in the subsequent experiments with different regionalization methods and are referred to as selected features.
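SFS is assumed here to be the forward variant, which can be sketched as a greedy loop: starting from an empty set, it repeatedly adds the feature system whose addition gives the best score and stops when no addition improves it. The `score` callback is a placeholder; in this setting it would be something like the validation accuracy of a classifier trained on the chosen feature systems (for example FOS, CFOS and LRF), which is an assumption rather than the authors' exact procedure.

```python
from typing import Callable


def sequential_forward_selection(
        candidates: list[str],
        score: Callable[[frozenset[str]], float]) -> tuple[frozenset[str], float]:
    """Greedy forward selection of feature systems.

    Starts from the empty set and, at each step, adds the candidate whose
    addition gives the highest score; stops when no addition improves it.
    """
    selected: frozenset[str] = frozenset()
    best_score = float("-inf")
    remaining = list(candidates)
    while remaining:
        trials = [(score(selected | {c}), c) for c in remaining]
        trial_score, chosen = max(trials)  # ties broken by name, for determinism
        if trial_score <= best_score:
            break  # no remaining feature system helps any more
        selected = selected | {chosen}
        best_score = trial_score
        remaining.remove(chosen)
    return selected, best_score
```

With a toy additive score (weights A = 0.5, B = 0.25, C = -0.125), the loop selects {A, B} with score 0.75 and stops, because adding C would lower the score.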
Accuracy is improved by regionalization (Fig. 6), in which features are computed from each region separately rather than from all object pixels. Fixed-size regionalization means that certain size parameters of the region are fixed: for example, the width for strips, the angle for angular regions, and the width and length for rectangular regions. The surface area of the regions obtained by these methods may therefore differ. The results show that increasing the number of regions and reducing their surface area yields around 10 % higher identification performance; circular regionalization showed the best result, 80.06 %. In this case, the selected features (FOS, CFOS and LRF) are computed for each region separately instead of for the complete object.
Three fixed-area regionalization methods were selected (Fig. 7). The first takes pixels successively and breaks the amber nugget surface into strips. The second uses a mesh of equally sized squares positioned in all directions from the centre. The third uses equal-area concentric rings around the same centre, which is placed at the sample's centre of gravity. The results show that circular fixed-area regionalization gives the highest accuracy for amber nugget identification. Compared with fixed-size regionalization, the fixed-area method performs better in all cases: rectangular, 73.87 % instead of 71.59 %; strips, 79.25 % instead of 77.00 %; circular, 81.67 % instead of 80.06 %.
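The fixed-area idea can be sketched by ordering the object's pixels and cutting the sequence into chunks of equal pixel count: ordering by distance from the centre of gravity gives concentric rings, and raster order gives strips. The 400-pixel default follows the ring area reported for the concentric ring method; putting leftover pixels into a smaller last region is our assumption, the square-mesh variant is omitted, and the function names are hypothetical.

```python
import math


def _chunks(order: list[int], area: int) -> list[list[int]]:
    """Cut an ordered list of pixel indices into consecutive pieces of `area` pixels."""
    return [order[k:k + area] for k in range(0, len(order), area)]


def fixed_area_rings(pixels: list[tuple[int, int]], area: int = 400) -> list[list[int]]:
    """Equal-area concentric rings around the centre of gravity of the object."""
    if not pixels:
        raise ValueError("object has no pixels")
    cx = sum(x for x, _ in pixels) / len(pixels)
    cy = sum(y for _, y in pixels) / len(pixels)
    # nearest pixels first, so consecutive chunks form rings growing outward
    order = sorted(range(len(pixels)),
                   key=lambda i: math.hypot(pixels[i][0] - cx, pixels[i][1] - cy))
    return _chunks(order, area)


def fixed_area_strips(pixels: list[tuple[int, int]], area: int = 400) -> list[list[int]]:
    """Strips: take pixels successively in raster order (row by row)."""
    order = sorted(range(len(pixels)), key=lambda i: (pixels[i][1], pixels[i][0]))
    return _chunks(order, area)
```

Unlike the fixed-size methods, every region except possibly the last holds the same number of pixels, so each region's statistics are estimated from equally sized samples.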
When only FOS features computed from the HSV colour space components are used with fixed-area concentric rings, identification accuracy is 78.35 % (Fig. 8(a)); however, a significant number of errors appear among opaque (classes 9-17) and crystallized (classes 18-22) amber nuggets. The reason is that statistical colour features cannot evaluate these textures: crystallized amber nuggets have small bright dots, while opaque nuggets have bright areas. Features computed from the range filter of the V component and the contours of the S component give an accuracy of only 44.01 % (Fig. 8(b)), but they distinguish classes 9-22 significantly better. Combining the colour space features with the features of their transforms raises accuracy to 81.67 %, a relative gain of more than 4 %; moreover, opaque and crystallized amber nuggets are distinguished significantly better (Fig. 9(a)).
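Taking the FOS-only result of 78.35 % as the baseline, the "more than 4 %" figure is a relative gain; in absolute terms the improvement is smaller:

```latex
\Delta_{\mathrm{abs}} = 81.67\,\% - 78.35\,\% = 3.32 \text{ percentage points}, \qquad
\Delta_{\mathrm{rel}} = \frac{81.67 - 78.35}{78.35} = \frac{3.32}{78.35} \approx 4.24\,\%.
```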
Classifier types were compared using features computed for the complete amber segment. DT and pruned DT showed the highest performance, with accuracies of 71.96 % and 72.88 %, respectively. To improve on individual classifiers, they were combined into a collective that takes a common decision. The highest accuracies were achieved with DT (H&H) (77.90 %) and random forest (79.53 %) collectives. The accuracy of the collective increases further when the amber nugget is regionalized: a feature vector is computed for each region separately, and the regions then vote. The best result, 84.67 % (Fig. 9(b)), was achieved with a collective of 15 DT classifiers trained by the H&H method. For the concentric ring method, the region area is fixed at 400 pixels.

Introducing a rejection threshold increases overall accuracy (Fig. 10). The rejection threshold is computed as the percentage of votes collected by the winning class relative to the total number of votes. The total number of identification and rejection errors serves as a compromise function, and the threshold is found by minimising it, because both identification errors and false rejections are undesirable. The results show that accuracy rises from 84.67 % to 88.21 % at a threshold of 35 %.
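The threshold search can be sketched as a sweep over candidate thresholds on validation results, counting accepted misidentifications plus false rejections (correct answers that would be rejected) and keeping the first threshold with the lowest total. The inputs are hypothetical: per-object winning-class vote shares in percent, and flags saying whether the winner was the true class.

```python
def choose_threshold(shares: list[float], correct: list[bool],
                     candidates: range = range(0, 101)) -> tuple[int, int]:
    """Return (th, cost) minimising misidentifications + false rejections.

    shares[i]  : vote share of the winning class for object i, in percent
    correct[i] : whether that winning class is the true class of object i
    An object is rejected when its share is below th.
    """
    best_th, best_cost = None, None
    for th in candidates:
        accepted_errors = sum(1 for s, ok in zip(shares, correct) if s >= th and not ok)
        false_rejections = sum(1 for s, ok in zip(shares, correct) if s < th and ok)
        cost = accepted_errors + false_rejections
        if best_cost is None or cost < best_cost:
            best_th, best_cost = th, cost
    return best_th, best_cost
```

For instance, with toy shares [90, 80, 30, 25, 60] and correctness [True, True, False, False, True] (illustrative values, not the paper's data), the sweep returns (31, 0): at 31 % both wrong answers are rejected and no correct one is.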

Fig. 1. Examples of art articles made from amber.
object classes;
2. All feature matrices are combined vertically into the matrix $Z_{\text{colour}}$, whose number of lines equals the total number of regions in all examples from the training set. The class labels are combined into the vector GR;
3. The combined feature matrix and the group vector are used to train the collective of classifiers.

The classification algorithm goes as follows:
1. The model is tested using the test set. First, the $X_{\text{colour}}$ features of the object being tested are extracted;
2. The classifiers of the collective vote for each region;
3. This results in vote vectors $SC^{j} = [SC^{j}_{1}, SC^{j}_{2}, \ldots, SC^{j}_{P}]$, where the indices 1..P indicate the region item number, P is the total number of regions, and the value $SC^{j}_{i}$ is the class label assigned to the i-th region by the j-th classifier;
4. All votes are added into a vector $VC = [VC_1, VC_2, \ldots, VC_V]$, which holds the total number of regions labelled by all the classifiers as belonging to each class: the indices 1..V show the class item number, and $VC_i$ is the number of votes for the i-th class;
5. The winning class C is determined according to the principle of majority voting:

$$C = \arg\max_{i = 1, \ldots, V} VC_i.$$
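The voting and reject steps can be sketched together: the vote vectors $SC^{j}$ from each classifier are summed into VC, the winner is the argmax, and the object is rejected when the winner's share of votes falls below th. Representing the votes as nested lists and breaking ties toward the lower class number are assumptions; `classify_with_reject` is a hypothetical name.

```python
from typing import Optional


def classify_with_reject(region_votes: list[list[int]], n_classes: int,
                         th: float) -> Optional[int]:
    """Majority voting over regions and classifiers with a reject option.

    region_votes[j] is the vote vector SC^j of the j-th classifier: entry i is
    the class label (1..n_classes) it assigns to region i.
    Returns the winning class C, or None when the object is rejected.
    """
    vc = [0] * (n_classes + 1)  # vc[c] = votes for class c; index 0 unused
    for sc in region_votes:
        for label in sc:
            vc[label] += 1
    total = sum(vc)
    if total == 0:
        return None  # no votes at all: nothing to decide
    # argmax over classes; max() keeps the first (lowest) class on ties
    winner = max(range(1, n_classes + 1), key=lambda c: vc[c])
    share = 100.0 * vc[winner] / total
    return None if share < th else winner
```

With three classifiers voting on four regions as [[1, 1, 2, 3], [1, 2, 2, 1], [1, 1, 1, 3]], class 1 collects 7 of 12 votes (about 58 %), so it wins at th = 35 and the object is rejected at th = 60.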