Influence of Image Enhancement Techniques on Effectiveness of Unconstrained Face Detection and Identification

In a criminal investigation, along with processing forensic evidence, different investigative techniques are used to identify the perpetrator of the crime. These include collecting and analyzing unconstrained face images, mostly of low resolution and varying quality, which makes identification difficult. Since police organizations have limited resources, in this paper we propose a novel method that utilizes off-the-shelf solutions (the Dlib library Histogram of Oriented Gradients (HOG) face detector and the ResNet face feature vector extractor) to provide practical assistance in unconstrained face identification. Our experiment aimed to establish which of the basic image enhancement techniques, if any, should be applied to increase effectiveness. Results obtained from three publicly available databases and one created for this research (simulating a police investigators' database) showed that resizing the image (especially with a resolution lower than 150 pixels) should always precede enhancement to improve face detection accuracy. The best results in determining whether images show the same person or different persons were obtained by applying sharpening with a high-pass filter, whereas normalization gives the highest classification scores when a single weight value is applied to data from all four databases.


I. INTRODUCTION
The face is one of the main biometric traits, and face recognition is of increasing importance in video surveillance, access control, human-computer interaction, border control surveillance, and crime investigation [1]-[3]. Its application in criminal investigations to identify the perpetrators of crimes, which is of particular interest for this paper, stands out primarily because of the emphasis on security in modern society [4]. There are two decisive reasons why face recognition is important in criminal investigations. First, the human face carries various information about identity, age, gender, race, emotions, and mental states [1]. Second, face recognition does not require the cooperation of a person and can be performed unobtrusively. Both reasons give this biometric method an advantage in analysing video and photo surveillance when searching for known perpetrators, for analytical and forensic purposes in detecting and prosecuting defendants, and in finding missing persons, primarily children [5]. However, misinformation obtained from a face recognition system can directly harm citizens' lives and human rights. Therefore, the criminal investigator has a decisive role in the process of identifying persons; information systems and software tools that process large amounts of data can only give suggestions. Criminal investigators expect such systems and tools to provide a comprehensive analysis of the gathered evidence that meets the essential technical requirements, with suggestions that are as accurate as possible. In addition, investigators expect the systems and tools to help them make the final decision by providing the enhanced images on which each suggestion is based (images more suitable for visual inspection by the investigator).
Information systems and tools base face recognition on general facial patterns and achieve the best results with frontal face images. This is a challenge in unconstrained conditions, where the recognition rate of standard technologies can drop dramatically [6]. The problems stem from the following factors [1], [2], [7], [8]:
• Factors that cause variation in appearance, which can be intrinsic (depending on the face's physical structure, such as expression, age, and occlusion) or extrinsic (depending on illumination, scale/resolution, pose, noise, and blur);
• Factors that increase the similarity in face appearance between different individuals, such as kinship and face manipulation;
• Factors related to the technical characteristics of the sensors, such as low resolution.

Unconstrained conditions are the main characteristic of the video and photo material that criminal investigators usually process to identify the perpetrators of criminal acts. Today, images used as digital evidence may come from covert police surveillance, digital forensic reports on seized mobile phones and computers, and open-source intelligence (OSINT) work. In this research, we experimentally determined the impact of visual image enhancement on face recognition, i.e., identification. Face identification compares a face with all feature templates stored in the database (a one-to-many similarity computation) to find, among several possibilities, the identity that matches best [2], [9]. We perform the experiment using the Dlib implementation of the Histograms of Oriented Gradients (HOG) descriptor for face detection and ResNet to create a 128-dimensional face feature vector. Dlib is a C++ toolkit that includes machine learning algorithms for face detection and a feature point algorithm [10]. It uses ResNet, a deep residual network first proposed by Kaiming He, to extract the 128-dimensional face feature vector. This experimental setup allows face identification with minimal processing and storage requirements. The main reason for this choice lies in the practical application of such systems and tools in criminal investigators' environments, which commonly face limited resources. The 128-element feature vector can be stored in a database and used for comparison during identification. It is important to emphasize that the number of comparisons grows in proportion to the size of the database.
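The one-to-many comparison described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the 128-element feature vectors have already been extracted (for example with Dlib) and are held in memory as plain Python lists, and the names `identify` and `gallery` are hypothetical. It uses cosine similarity, the measure applied later in the experiment, and makes explicit that the number of comparisons grows linearly with the size of the database.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def identify(probe, gallery):
    """One-to-many identification (hypothetical helper).

    probe: feature vector of the queried face.
    gallery: dict mapping an identity label to a stored feature vector.
    Returns (label, similarity) pairs, best match first. The number of
    comparisons equals len(gallery), i.e. it grows linearly with the database.
    """
    scores = [(label, cosine_similarity(probe, vector))
              for label, vector in gallery.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Toy 3-element vectors stand in for 128-element face descriptors.
gallery = {"A": [1.0, 0.0, 0.0], "B": [0.0, 1.0, 0.0], "C": [1.0, 1.0, 0.0]}
ranked = identify([1.0, 0.1, 0.0], gallery)
```

In a real deployment the gallery would come from the database of stored templates; the linear scan is the simplest possible search and is what makes the cost proportional to the database size.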
Many papers deal with image preprocessing in face recognition, though not necessarily with visual enhancement. The following examples represent groups of such research. In one group, preprocessing is tailored to the technology applied; for example, color photographs are converted to grayscale because Gabor filters are used to extract the features that represent the face [4]. Many comparative studies also measure the impact of different illumination preprocessing techniques on face recognition performance, concluding that such techniques give almost perfect results under controlled lighting variations and that better visual preprocessing results do not guarantee better recognition accuracy [11]. Another approach to preprocessing is to replace missing image information with existing information, e.g., by creating an approximately symmetric virtual face image from the least degraded half of the original face image (for example, half of a non-shaded face makes an entire face) [7]. In this way, preprocessing reduces the negative effects of heterogeneous lighting. It also creates a set of samples for face recognition instead of using the original photographs, which addresses the shortage of samples of appropriate quality for matching. Similarly, methods that use existing information and trained intelligent and expert systems have been proposed to normalize pose variations [12]. In some research, image preprocessing is suggested, yet the improvement in recognition rate is achieved later in the process, for example, by combining two descriptors that are merged at the classifier level [9], [13]. This paper's contribution lies in determining whether face detection and identification effectiveness increases when specific enhancement techniques are applied to unconstrained images (with emphasis on those obtained in police investigation work).
To simulate the material produced by forensic reports and covert police surveillance, we created a dedicated dataset for this paper, which is an additional contribution. When comparing the effects of various image enhancements on face identification, we considered unbalanced sets of positive and negative comparison results. Such unbalanced sets correspond to the actual conditions under which criminal information systems and tools would be applied. As a result, the findings are also of broader value for applying HOG descriptors, face feature vectors, and other face recognition methods outside the domain of criminal investigations. The rest of the paper is organized as follows. The next section describes the materials and methods used in the experiment. The experimental setup is then explained, followed by a discussion of the obtained results and our conclusions.

II. MATERIALS AND METHODS
A. Databases with Labelled Face Images
The experiment was conducted on four labeled image databases, three of which are publicly available.
The fourth database was created solely for this research (internally called SSIM) to simulate the material obtained by digital forensics units or in covert police surveillance operations. It contains 6,027 face images of 328 different persons. For most of these persons, the photos span a period of more than 15 years, resulting in significant changes in physical appearance due to aging, as shown in Fig. 1. Face image resolution ranges from 31×31 to 2385×2385 pixels. Image quality varies, which, in addition to the varying resolution, is another essential feature of pictures collected for criminal investigation purposes. Apart from a few professional photos, all images were created with amateur or consumer equipment. The lighting is mainly artificial and frontal; because of the low intensity of the light source, the faces are in shadow. Image sharpness is generally sufficient for visual recognition, although images that are more or less out of focus predominate. Although front-facing images dominate, there are various poses, with many semi-profile and profile photos.
The IMDB-WIKI database [14] contains 523,051 photos of celebrities collected by crawling the IMDb and Wikipedia websites. The database is interesting for our research because it contains photographs of the same persons at significantly different ages (from childhood to adulthood). Because the faces belong primarily to actors, changes in appearance due to fashion trends or movie roles often make recognition challenging. The images include film scenes that can be said to simulate unconstrained conditions. In addition, the pictures vary in resolution. The authors of this database downloaded the name, date of birth, gender, and all images related to a person from celebrities' IMDb profiles; each picture's caption on the site contains the date when the picture was taken. They assumed that when there is one photo on a specific person's profile, it probably shows the person whose profile it is, whereas for pictures with more than one person, they used only images in which the second strongest face detection is below a threshold. Our analysis of the set revealed wrongly assigned faces and incorrect years of image capture, which required manual correction. To work with the most accurate IMDB-WIKI data, we used only photographs that we could confirm showed the correct person.
The Extended Yale Face Database B is a set that initially contained 16,128 photos of the faces of 28 different people under nine poses and 64 illumination conditions. The authors created a database for the systematic testing of face recognition methods under extensive variations in illumination and poses, without the influence of different facial expressions, aging, and occlusion [15]. All face photos in the database have a resolution higher than 150 pixels.
The Labeled Faces in the Wild (LFW) database provides a large number of face images taken under various conditions, originating from newspaper photographs that depict people in different poses, with various facial expressions, and under different lighting [16], [17]. The resolution of the photos is mainly below 150 pixels, which corresponds to the images police investigators collect through OSINT. The database contains 13,233 images of 5,749 different people, of whom 1,680 have two or more pictures.

B. Image Enhancement
For the experiment, to meet the operational needs of police work, we used basic enhancement operations: contrast stretching (normalization), sharpening, high-pass filtering, speckle reduction, and resizing, as well as combinations of these operations. Enhancement is automated using the convert application from the ImageMagick 7.0 suite of tools. ImageMagick is free software that, through a command-line interface, creates, edits, composes, and converts raster graphics in many formats, using multiple threads to increase performance [18]. Every enhancement starts from the unprocessed images (UNP) of the four databases.
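As an illustration of how such automation could be scripted, the sketch below builds `convert` argument lists for combined enhancement labels such as RS or RN. The mapping is an assumption: `-resize`, `-normalize`, `-unsharp` and `-despeckle` are real ImageMagick operators and the `0x5` unsharp setting follows the radius and sigma given in the next paragraph, but the `150x150<` geometry (enlarge only images smaller than 150 pixels) and treating RD as a single `-despeckle` step are hypothetical choices. The HP operator chain is not given in the text and is omitted.

```python
import subprocess  # only needed to actually run the commands

# Hypothetical mapping from enhancement codes to ImageMagick `convert` operators.
OPERATORS = {
    "R": ["-resize", "150x150<"],  # assumption: enlarge only images smaller than 150 px
    "N": ["-normalize"],           # contrast stretching
    "S": ["-unsharp", "0x5"],      # radius 0 (chosen automatically), sigma 5
    "D": ["-despeckle"],           # assumption: RD reduced to a single despeckle step
}

def convert_command(label, src, dst):
    """Build the argument list for one enhanced copy.

    Codes are applied left to right, so "RS" resizes first and then sharpens.
    HP is not supported in this sketch.
    """
    args = ["convert", src]
    for code in label:
        args.extend(OPERATORS[code])
    args.append(dst)
    return args

def enhance(label, src, dst):
    """Run convert on one image (requires ImageMagick on the PATH)."""
    subprocess.run(convert_command(label, src, dst), check=True)
```

Passing the argument list to `subprocess.run` without a shell means the `<` in the resize geometry is not interpreted as a shell redirection, so no quoting is needed.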
As previously stated, one of the main characteristics of the databases used is varying image resolution, whose distribution across the databases is shown in Fig. 2. The number of images with a resolution lower than 150 pixels is uneven across the databases, but in total such images are numerous enough for the influence of resizing to be significant. We used the convert application and its -resize option to enlarge all face images smaller than 150 pixels. Images processed in this way are labeled R; for a combination of preprocessing methods, R is placed at the beginning of the label. Normalization (N) increases the contrast of an image by stretching the range of intensity values, which requires determining the lower and upper pixel-value limits. With the operator's default settings, at most 2 % of the pixels are mapped to black and at most 1 % to white, so the stretched range runs approximately from the 2nd to the 99th percentile of intensity values. The operator uses histogram bins to determine the range of color values that needs to be stretched. Figure 3 and Figure 4 show the raw image and the image after resizing and normalization are applied. The -unsharp operator (S) of the convert application sharpens an image by convolving it with a Gaussian operator of the given radius and standard deviation (sigma) [18]. The selected standard deviation is 5; larger values produce a sharper image. The radius is 0, which means that the convert application chooses a suitable radius for the given sigma value [18]. The high-pass filter (HP) and resize-and-despeckle (RD) enhancements each combine several operators of the convert application, to sharpen images and to remove noise by blurring, respectively.
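To make the normalization step concrete, the following sketch applies percentile-based contrast stretching to a flat list of 8-bit gray levels. It approximates the idea behind `-normalize` (clip roughly the darkest 2 % and the brightest 1 % of pixels, then stretch the rest linearly); it is not ImageMagick's actual histogram-based implementation, and the function name is hypothetical.

```python
def contrast_stretch(pixels, black_pct=2.0, white_pct=1.0, max_value=255):
    """Percentile-based contrast stretch of a flat list of gray levels (0..max_value).

    Roughly the darkest black_pct % of pixels end up at 0 and the brightest
    white_pct % at max_value; values in between are stretched linearly.
    """
    ordered = sorted(pixels)
    n = len(ordered)
    low = ordered[min(n - 1, int(n * black_pct / 100))]
    high = ordered[min(n - 1, int(n * (100 - white_pct) / 100))]
    if high <= low:
        return list(pixels)  # (nearly) flat image: nothing to stretch
    scale = max_value / (high - low)
    return [min(max_value, max(0, round((p - low) * scale))) for p in pixels]

stretched = contrast_stretch(list(range(100, 200)))  # a low-contrast gradient
```

The low-contrast input, which only uses gray levels 100 to 199, is mapped onto the full 0 to 255 range while keeping the order of intensities.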

C. Face Detection and Feature Vector Extraction
The face recognition system generally includes: Face Detection and Extraction, Feature Extraction and Representation, and Feature Matching [2], [19]. One of the popular methods for face detection is Histograms of oriented gradients (HOG), proposed by Dalal and Triggs [20].
The HOG is a feature descriptor that is invariant to 2D rotations and scaling [2]. The idea behind the HOG descriptor is that the distribution of intensity gradients can characterize the shape of objects in photographs. It is sensitive to the slightest changes in shape unless the whole object is consistent [21].
To obtain a HOG descriptor, the image is divided into small regions of pixels, called cells, and a histogram of edge orientations (gradient directions) is calculated for each cell. Normalization is then performed over blocks of cells, which yields a descriptor for each block and, after concatenation, the feature vector of the whole image. This feature vector is then used for face detection and recognition [21]. The first step is to calculate the gradients of each pixel of the image I(x, y) in the horizontal and vertical directions, from which the gradient magnitude |G| and the orientation angle γ are obtained [13], [22], [23].
The gradients, their magnitude, and the orientation are calculated as follows:
$$G_x(x, y) = I(x + 1, y) - I(x - 1, y), \qquad G_y(x, y) = I(x, y + 1) - I(x, y - 1)$$
$$|G(x, y)| = \sqrt{G_x(x, y)^2 + G_y(x, y)^2}, \qquad \gamma(x, y) = \arctan\frac{G_y(x, y)}{G_x(x, y)}$$
By splitting the orientation range into k bins, computing the histogram within each cell (H_C), and then combining b_1 × b_2 cells into a block, we obtain the block histogram H_B [22]:
$$H_C = [h_1, h_2, \ldots, h_k], \qquad H_B = [H_{C_1}, H_{C_2}, \ldots, H_{C_{b_1 \times b_2}}]$$
The orientation histogram of each cell is constructed by accumulating the gradient magnitudes of its pixels into the bin (class interval) that corresponds to each pixel's orientation angle [13]. In addition, the blocks overlap, so each cell contributes to more than one block [24].
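The gradient and cell-histogram steps can be sketched in pure Python as below. This is a simplified illustration under stated assumptions, not Dlib's implementation: it uses centered differences, unsigned orientations folded into [0°, 180°), k = 9 bins, and hard assignment of each pixel's magnitude to a single bin (Dalal and Triggs additionally interpolate between bins). Border pixels are left with zero gradient, and the function names are hypothetical.

```python
import math

def gradients(img):
    """Centered-difference gradients of a gray image given as a list of rows.

    Returns per-pixel magnitude |G| and unsigned orientation gamma in degrees,
    folded into [0, 180). Border pixels are left at zero for simplicity.
    """
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    ang = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]  # G_x
            gy = img[y + 1][x] - img[y - 1][x]  # G_y
            mag[y][x] = math.hypot(gx, gy)
            ang[y][x] = math.degrees(math.atan2(gy, gx)) % 180.0
    return mag, ang

def cell_histogram(mag, ang, y0, x0, cell, k=9):
    """Histogram H_C of one cell: magnitudes accumulated into k orientation bins."""
    hist = [0.0] * k
    width = 180.0 / k
    for y in range(y0, y0 + cell):
        for x in range(x0, x0 + cell):
            hist[int(ang[y][x] // width) % k] += mag[y][x]
    return hist

# A vertical edge: every interior pixel has a purely horizontal gradient (gamma = 0).
img = [[0, 0, 10, 10] for _ in range(4)]
mag, ang = gradients(img)
hist = cell_histogram(mag, ang, 1, 1, 2)
```

For the vertical edge, all four interior pixels have magnitude 10 and orientation 0°, so the whole cell mass (40) lands in the first bin.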
Because of variations in the image, such as in brightness or in the contrast between foreground and background, the gradient values also vary significantly, which is why L2-norm block normalization is applied [22]-[25]:
$$NH_{B_j} = \frac{H_{B_j}}{\sqrt{\lVert H_{B_j} \rVert_2^2 + \varepsilon^2}}$$
where NH_{B_j} is the j-th normalized block and ε is a small constant that prevents division by zero. Concatenating the normalized blocks builds the HOG descriptor. In this paper, we used the Dlib implementations of the HOG face detector and ResNet. Although no longer state of the art, Dlib's HOG face detector is an off-the-shelf solution that achieves reasonable precision and processing speed [26], [27]. Figure 3 and Figure 4 also show a visual representation of the HOG, created with the skimage Python library [28], before and after enhancement. After normalization is applied, the face contours carry more information.
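Block normalization and the overlapping block layout can be sketched in the same style. The block size b1 = b2 = 2, the one-cell stride, and ε = 1e-3 are illustrative assumptions; the input is assumed to be a grid of per-cell orientation histograms such as those produced by a cell-histogram step. This is a sketch of the technique, not Dlib's code.

```python
import math

def l2_normalize_block(block, eps=1e-3):
    """L2-norm block normalization: v / sqrt(||v||_2^2 + eps^2)."""
    norm = math.sqrt(sum(v * v for v in block) + eps * eps)
    return [v / norm for v in block]

def hog_descriptor(cells, b1=2, b2=2, eps=1e-3):
    """Concatenate normalized, overlapping blocks of b1 x b2 cells.

    cells: 2-D grid (list of rows) of per-cell orientation histograms H_C.
    Blocks slide by one cell, so interior cells contribute to several blocks.
    """
    rows, cols = len(cells), len(cells[0])
    descriptor = []
    for r in range(rows - b1 + 1):
        for c in range(cols - b2 + 1):
            block = []  # H_B: the b1 x b2 cell histograms laid end to end
            for i in range(r, r + b1):
                for j in range(c, c + b2):
                    block.extend(cells[i][j])
            descriptor.extend(l2_normalize_block(block, eps))
    return descriptor

# 3 x 3 cells with 9 bins each -> 2 x 2 blocks of 4 cells -> 144 values.
cells = [[[1.0] * 9 for _ in range(3)] for _ in range(3)]
descriptor = hog_descriptor(cells)
```

Because of the one-cell stride, the central cell of the 3 × 3 grid appears in all four blocks, which is the overlap the text refers to.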
Descriptor extraction transforms multidimensional arrays of discrete spatial units (pixels) into a lower-dimensional representation that retains enough data to represent the information. The face is represented by a vector that contains the unique features of the face [2].

III. EXPERIMENT
In everyday work, police investigators routinely use various image processing software to obtain enhanced images that are more suitable for visual inspection and decision-making regarding identity resolution. This is tedious work that requires much effort and time, which they often do not have. State-of-the-art solutions based on machine learning algorithms trained on huge databases provide high face recognition effectiveness, but they are expensive and not available to most police organizations.
Therefore, the basic idea behind our work was to design a novel system for collecting, storing, and analysing unconstrained face images in real-time with limited resources (regarding processing power and storage capacity) that will assist in police investigation work. To simulate limited resources, we used a consumer laptop with an Intel i3-5005U dual-core 2.0 GHz base frequency processor and 4 GB of RAM in our experiment.
The first step we took was to conduct an experiment to establish which one (if any) of the basic image enhancement techniques should be applied prior to the unconstrained face detection and identification to increase the effectiveness.
At the beginning of the experiment, for each photo from the databases we create enhanced copies (sharpened, normalized, resized, etc.). Next, HOG-based face detection is performed on the unprocessed images and on all created copies, followed by feature vector extraction. The last step of the recognition process consists of creating a working set representing the intersection of the sets of all obtained feature vectors, which will be considered in the matching process.
Only the faces detected after applying every enhancement technique were taken into account and further analysed during the experiment. In addition, the databases with significantly more images were reduced to the size of the SSIM database for a more uniform comparison. As a result, the SSIM database was reduced to 4,243 images, the IMDB-WIKI database to 3,011 images (330 different persons), the Yale B database to 3,069 images, and the LFW database to 3,803 photographs (showing 119 people). Figure 5 shows the course of the experiment. Images from each database are first processed with the different enhancement methods described previously. Subsequently, the HOG and ResNet functions from the Dlib library are used to perform face detection and face feature vector extraction. The working set V_WS is determined by comparing the vectors from the sets of images obtained after the different enhancements, and its size equals that of the reduced databases. In the rest of the experiment, we use only the vectors of faces detected after all the performed enhancements, whereas the others are excluded. This part of the experiment is presented in more detail in Algorithm 1. Face matching is performed using cosine similarity, a measure based on the angle between two vectors rather than their magnitudes. We chose this similarity measure between two non-zero vectors because it is naturally normalized: it is bounded in [−1, 1] in general and in [0, 1] when the vector components are non-negative.
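The working-set step (the intersection of faces detected under every enhancement) can be sketched as follows. This is a hypothetical reconstruction from the description above, not Algorithm 1 itself: it assumes that, for each enhancement label, the detected faces are stored as a mapping from an image identifier to its feature vector, and the function name is illustrative.

```python
def working_set(vectors_by_method):
    """Keep only faces detected under every enhancement method.

    vectors_by_method: {method label: {image id: feature vector}}, containing
    only the images in which a face was detected.
    Returns the sorted common image ids and, per method, the vectors restricted
    to those ids (the working set V_WS).
    """
    methods = list(vectors_by_method)
    common = set(vectors_by_method[methods[0]])
    for method in methods[1:]:
        common &= set(vectors_by_method[method])
    ids = sorted(common)
    restricted = {m: {i: vectors_by_method[m][i] for i in ids} for m in methods}
    return ids, restricted

# Toy detections: image "b" is missed after N, image "c" is missed in UNP.
detections = {
    "UNP": {"a": [0.1], "b": [0.2]},
    "N": {"a": [0.3], "c": [0.4]},
    "S": {"a": [0.5], "b": [0.6], "c": [0.7]},
}
ids, v_ws = working_set(detections)
```

Restricting every method to the same image identifiers ensures that later comparisons between enhancement methods are made on exactly the same faces.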
In the experiment, we used the confusion matrix to determine the relationship between the actual values and the predictions obtained by selecting an appropriate threshold [29]. Figure 6 shows the confusion matrix for a two-class classification. During the analysis of the experimental data, we used histograms such as the examples shown in Fig. 7 and Fig. 8. The bin width h of a histogram is the difference between the previously defined maximum and minimum similarity limits, divided by k, the chosen number of bins:
$$h = \frac{\mathrm{sim}_{\max} - \mathrm{sim}_{\min}}{k} \qquad (14)$$
In Fig. 7, the uncertainty interval is shown by bins 2 to 11, whereas in bin 1 the TNrate is 100 %. In Fig. 8, where the focus is on true positives, the uncertainty interval covers bins 1 to 10, whereas in bin 11 the TPrate is 100 %. The unique threshold, or weight value w, for the set of similarities related to an enhancement applied over all four databases was calculated using the mean value and the standard deviation (s).
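The binning used for these histograms can be sketched as follows, using the bin width h from (14). As an assumption for illustration, each comparison is given as a similarity value plus a Boolean flag that is true when both images show the same person. In the sense used above, a bin whose share of same-person pairs is 0 has a TNrate of 100 %, and a bin whose share is 1 has a TPrate of 100 %. The function name is hypothetical.

```python
def bin_rates(similarities, same_person, sim_min, sim_max, k):
    """Histogram of comparisons with bin width h = (sim_max - sim_min) / k.

    Returns, per bin, (number of comparisons, share of same-person pairs);
    the share is None for an empty bin.
    """
    h = (sim_max - sim_min) / k
    counts = [0] * k
    positives = [0] * k
    for s, same in zip(similarities, same_person):
        b = min(max(int((s - sim_min) / h), 0), k - 1)  # clamp to the first/last bin
        counts[b] += 1
        positives[b] += int(same)
    return [(c, p / c if c else None) for c, p in zip(counts, positives)]

rates = bin_rates([0.1, 0.15, 0.5, 0.55, 0.9],
                  [False, False, True, False, True], 0.0, 1.0, 2)
```

Here the first bin contains only different-person comparisons (share 0.0), while the second bin mixes both classes and therefore belongs to the uncertainty interval.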

Classification accuracy (Acc) is defined as:
$$Acc = \frac{TP + TN}{TP + FP + TN + FN}$$
In unbalanced sets, where one class has significantly more samples than another, accuracy can no longer be considered a sufficiently reliable measure, because it estimates the classifier's performance too optimistically for the majority class [31]. When a large number of persons needs to be identified during a criminal investigation under actual conditions, comparisons of different persons are expected to greatly outnumber comparisons of the same person. Therefore, other classification quality measures were also applied in the paper, namely the F1 score and the Matthews correlation coefficient (MCC). A standard measure for unbalanced sets is F1, the harmonic mean of precision and recall, which has the following form [31]:
$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} = \frac{2TP}{2TP + FP + FN}, \qquad \mathrm{Precision} = \frac{TP}{TP + FP}, \quad \mathrm{Recall} = \frac{TP}{TP + FN}$$
The minimum value of the measure, F1 = 0, is obtained when the number of correctly classified positive samples is zero (TP = 0). In contrast, perfect classification, F1 = 1, is achieved when the numbers of incorrect negative and positive classifications are both zero (FN = FP = 0). Notably, the F1 measure does not consider correctly classified negatives (TN).
Additional verification of the classification quality was performed with the MCC, which considers all confusion matrix elements:
$$MCC = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$
If MCC = 0, the classification is no better than random selection. A positive value indicates correct classification (a coefficient of +1 represents perfect classification). In contrast, negative values indicate classification worse than random selection, where a coefficient of −1 represents total disagreement between the prediction and the actual classes. The MCC is the only binary classification measure that generates a high score only when a model classifies successfully regardless of which class is dominant [31].
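The three measures can be computed directly from the confusion-matrix counts. The sketch below is a straightforward transcription of the formulas for Acc, F1 and MCC; returning 0 when a denominator vanishes is a common convention that we assume here. With hypothetical counts TP = 5, FN = 5, FP = 5 and TN = 985, accuracy is 0.99 while F1 is only 0.5 and MCC is about 0.49, which illustrates why accuracy is too optimistic on unbalanced sets.

```python
import math

def classification_scores(tp, fp, tn, fn):
    """Return (Acc, F1, MCC) for one confusion matrix.

    By the convention assumed here, F1 and MCC are 0 when their denominators vanish.
    """
    acc = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return acc, f1, mcc

# Hypothetical, heavily unbalanced counts: few same-person pairs, many different-person pairs.
acc, f1, mcc = classification_scores(tp=5, fp=5, tn=985, fn=5)
```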

IV. RESULTS AND DISCUSSION
The experimental results suggest that the application of different types of image enhancement impacts the total number of detected faces.
By counting the feature vectors from each set of enhanced images, we obtained the data shown in Table I and Fig. 9, which represent the impact of enhancement on face detection effectiveness. In Table I, an up arrow (↑) marks an increase in the number of detected faces after a visual enhancement method is applied, compared with the unprocessed images; a down arrow (↓) marks a decrease, and an equals sign (=) means that the number of detected faces did not change. Based on the markings in Table I, we conclude that enhancement has a significant positive effect on the number of detected faces. We achieved the best results by applying sharpening after enlarging images with a resolution below 150 pixels. Figure 9 gives a clearer view of the overall impact on face detection effectiveness across the samples from all databases used in the paper; it shows the mean number of detected persons over all databases after min-max normalization. Photo sharpening gives the best results both on its own (S, RS) and in combination with normalization (NS). Comparing the results of the same enhancement method with and without resizing in Fig. 9 shows that resizing images with a resolution below 150 pixels improves face detection.
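The aggregation behind such a chart can be sketched as follows. We assume, for illustration, that min-max normalization is applied per database across the enhancement methods before averaging over databases; the function name is hypothetical, and the counts in the example are invented toy values, not the values from Table I.

```python
def min_max_mean(counts_by_db):
    """Average, over databases, of min-max normalized counts per enhancement method.

    counts_by_db: {database: {method label: number of detected faces}};
    every database is assumed to list the same methods.
    """
    methods = sorted(next(iter(counts_by_db.values())))
    totals = {m: 0.0 for m in methods}
    for counts in counts_by_db.values():
        lo, hi = min(counts.values()), max(counts.values())
        span = hi - lo
        for m in methods:
            totals[m] += (counts[m] - lo) / span if span else 0.0
    return {m: totals[m] / len(counts_by_db) for m in methods}

# Hypothetical counts, for illustration only.
scores = min_max_mean({
    "db1": {"UNP": 10, "S": 20, "N": 15},
    "db2": {"UNP": 100, "S": 110, "N": 120},
})
```

Normalizing per database first keeps a large database from dominating the average simply because its raw counts are larger.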
The differences in the number of detected faces in Table I emphasize the need to measure the impact of image enhancement on the identification process's effectiveness based on a previously created working set.
In the simulation of face identification applied in the experiment, we performed binary classification by comparing the face from one image with all other faces from the working set. The comparisons are made within the working set obtained with a specific image enhancement method in the same database. The procedure is then repeated for each subsequent image. Figure 10 and Figure 11 show the result of the first step of this face identification process for the same face image, before and after enhancement, respectively. Similarity values indicating the same person are shown as dark dots in Fig. 10 and Fig. 11, whereas light dots represent comparisons with different people's faces.
Fig. 9. Impact of enhancement on the total number of detected persons.
To determine the weight value w on which the binary classification is based, the optimal similarity threshold must be found. The problem is visually presented as shaded areas in Fig. 10 and Fig. 11, which we call the "uncertainty intervals". An uncertainty interval is bounded by the threshold values above which all comparisons indicate the same face (all dark dots) and below which all faces can be claimed to be different (all light dots). In addition, comparing Fig. 10 and Fig. 11 shows how the uncertainty interval changes after image enhancement; in this particular case, the advantage of normalization over unprocessed images is noticeable. Continuing the identification with the other face images produces different intervals within the same set. The goal is to make the overall interval as narrow as possible, with as few similarity values within it as possible.
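The uncertainty interval for one set of comparisons can be computed from the labeled similarities. This is a sketch under the definition given above: the lower bound is the smallest similarity of any same-person pair and the upper bound is the largest similarity of any different-person pair. The function name is hypothetical, and both classes are assumed to be present in the input.

```python
def uncertainty_interval(similarities, same_person):
    """Return ((lower, upper), number of similarities inside) for one set of comparisons.

    lower: smallest similarity of a same-person pair (below it, all pairs are different)
    upper: largest similarity of a different-person pair (above it, all pairs are the same)
    If lower > upper, the classes are perfectly separated and (None, 0) is returned.
    Both classes are assumed to be present.
    """
    pos = [s for s, same in zip(similarities, same_person) if same]
    neg = [s for s, same in zip(similarities, same_person) if not same]
    lower, upper = min(pos), max(neg)
    if lower > upper:
        return None, 0
    inside = sum(1 for s in similarities if lower <= s <= upper)
    return (lower, upper), inside

interval, inside = uncertainty_interval([0.2, 0.4, 0.6, 0.7, 0.9],
                                        [False, False, True, False, True])
```

Both outputs matter for the comparison of enhancement methods: a narrower interval and fewer similarities inside it mean fewer ambiguous comparisons.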
By measuring the face matching results for all sets obtained with the different image enhancement methods, and then normalizing the values for a unified representation on the chart, we obtained Fig. 12. The chart shows the relative size of the uncertainty interval and the number of similarities within it, i.e., values that can indicate either the same or different persons. The length of the interval affects the number of similarities within it. The best results are obtained by applying high-pass filters with and without resizing (RHP and HP, respectively). Across all enhancement methods, combining them with resizing affects the results less than the choice of the enhancement method itself.
In actual applications related to criminal investigations, face identification is characterized by a disproportionate number of comparisons that yield positives and negatives. We analysed the results from both perspectives separately, since it is sometimes crucial to be able to exclude possible suspects (TN) from further investigation and thereby free up resources. The histograms shown in Fig. 7 and Fig. 8 were used for this purpose. For example, in Fig. 7, data from the bin in which TNrate equals 100 % were used, i.e., comparisons that correctly indicate different persons. In addition, the data from the first eight bins indicate, with a certainty of over 99 %, that the matched images show different persons. We normalized the data within each database and then calculated the mean value over all four databases for each set of applied image enhancements. Figure 13 shows the impact of enhancement on effectiveness in the identification process when claiming that the persons are different. The best results in claiming, with 100 % certainty, that the images show different persons are obtained by sharpening with a high-pass filter (HP, RHP). The ranking of the enhancement methods at a TNrate of 100 % fully corresponds to their ranking by uncertainty-interval length. For a certainty greater than 99 % that the persons are different, which covers most of the comparisons, the best results were still obtained by sharpening (RS, HP) and normalization.
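The cumulative certainty used above (e.g., whether the first eight bins still give over 99 % different-person comparisons) can be sketched as a running share over bins ordered from the lowest similarity upward. The input format, a list of (total, different-person) counts per bin, and both function names are assumptions for illustration.

```python
def cumulative_tn_share(bins):
    """Running share of different-person comparisons over bins 1..j.

    bins: list of (comparisons in bin, different-person comparisons in bin),
    ordered from the lowest similarity upward.
    """
    shares, total, negatives = [], 0, 0
    for count, neg in bins:
        total += count
        negatives += neg
        shares.append(negatives / total if total else None)
    return shares

def leading_bins_above(shares, level=0.99):
    """How many leading bins keep the cumulative share above `level`."""
    n = 0
    for share in shares:
        if share is not None and share <= level:
            break
        n += 1
    return n

shares = cumulative_tn_share([(10, 10), (10, 9), (5, 1)])
```

With these toy counts the cumulative share drops from 1.0 to 0.95 and then to 0.8, so only the first two bins stay above a 90 % certainty level.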
Similarly, we compared the impact of different image enhancement methods on effectiveness in determining that it is most likely the same person (shown in Fig. 14).
In the classification where it can be claimed with complete certainty that the matched faces belong to the same person, we obtained the best results by applying sharpening combined with resizing face images smaller than 150 pixels, followed by normalization. The largest number of comparisons that can be claimed with a certainty of over 90 % to show the same person, which includes most of the correctly classified positives, is obtained by applying normalization (RN, N). Finally, we classified the data from all databases using the appropriate unique weight value w for each image enhancement method. The scores for all three measures of effectiveness (Acc, F1, and MCC) are shown in Table II. Figure 15 shows the relative effect of the applied image enhancement methods after the result data were normalized. We obtained the best score by applying normalization, first without and then with resizing. Although the unprocessed images also receive a high score at the applied weight value w, resizing gives better results than the purely unprocessed images. The ranking of the methods is the same regardless of the evaluation measure.

V. CONCLUSIONS
In a criminal investigation, it is imperative to identify the perpetrator, but it is equally (if not more) important to be able to exclude possible suspects with high certainty. Wrongly accusing a citizen can decrease support for police work and create a sense of insecurity in society. With that in mind, we conducted our research to establish whether it is possible to develop systems that assist criminal investigations with identity resolution in unconstrained conditions. This paper has experimentally demonstrated the influence of image enhancement on unconstrained face detection and identification. The databases we used are labeled, which made it possible to identify persons and measure the effects of image enhancement. The selection of databases simulates the actual conditions under which criminal investigators analyse unconstrained face images.
The main conclusions that can be drawn from the presented work are:
• A novel method that utilizes off-the-shelf solutions (the Dlib library HOG face detector and the ResNet face feature vector extractor), along with different image enhancement techniques, is presented to provide practical assistance to police investigators in unconstrained face identification;
• This method improves the effectiveness of the Dlib HOG functions used;
• The weight value w (classification threshold) was determined experimentally to give optimal results, with the length of the uncertainty interval decreasing when normalization is applied (F1 score increased by 1.5 percent, MCC increased by 1.3 percent);
• Experiments on all four databases (images of different quality, size, and resolution) lead us to conclude that resizing the image (especially with a resolution lower than 150 pixels) and sharpening should always precede the enhancement to improve face detection accuracy (the increase in accuracy ranges from 0.8 to 5.9 percent);
• Classification accuracy greater than 99 % is comparable to state-of-the-art solutions [32]. The advantage of the proposed method is that it can be applied (and further improved) relying on the existing (limited) hardware and software resources within the police organization;
• In addition, the proposed enhancements provide images of improved quality, which makes it easier for criminal investigators to perform a visual inspection and make a decision based on the system's suggestion (prediction).
Our research has shown that even if the police cannot afford state-of-the-art solutions, off-the-shelf solutions combined with experimental research on the application of various image processing techniques can provide a real system that greatly assists police investigators.
In future work, we will analyse possible effectiveness improvements from choosing the enhancement method according to the specific state of each image. This is expected to further reduce the uncertainty interval.

ACKNOWLEDGMENT
We thank our colleagues from the European Union's Horizon 2020 SPIRIT project consortium (Grant No. 786993) who provided insight and expertise that greatly assisted the research.

CONFLICTS OF INTEREST
The authors declare that they have no conflicts of interest.