An Application of Artificial Neural Networks for Prediction and Comparison with Statistical Methods

Abstract: The unemployment rate is a measure of the prevalence of unemployment and can be a good indicator of a country's economic situation. It is thought that it may vary with people's educational level. This study seeks to model and predict unemployment rates with respect to people's educational level in Turkey. For this purpose, an Artificial Neural Network model is proposed, and long-term prediction (up to the year 2019) is performed using Artificial Neural Networks, the Box-Jenkins Method and the Regression Analysis Method. The methods are then compared with each other, and the study concludes that Artificial Neural Networks are more appropriate and consistent than the Box-Jenkins Method or the Regression Analysis Method for this prediction.


I. INTRODUCTION
Unemployment and recruitment are key indicators of the level of economic development of national economies. In many countries, including Turkey, unemployment is considered a major source of social problems that must be solved urgently. A productive economy and greater competitiveness are regarded as the primary condition for becoming a developed country. Countries suffering from unemployment must improve their human resources to achieve economic development [1]. Predicting unemployment rates is therefore very useful for countries.
Statistical methods such as the Box-Jenkins Method (BJM) [2] and the Regression Analysis Method (RAM) can be used for prediction. However, previous research on the prediction and modelling of non-linear systems has found BJM and RAM to be insufficient [3]. Artificial intelligence techniques are well suited to overcoming the disadvantages of BJM and RAM [4], [5] and are viewed as the most promising methods towards intelligent systems [6]. Artificial Neural Networks (ANN) is one such approach, applied through computer technology in many fields, and is often preferred to other methods. ANN is used for prediction in various applications. Seyhan et al. [7] employed an ANN model to predict the effects of the thermoplastic binder concentration. Karunasinghea and Liong [8] performed chaotic time series prediction with artificial neural networks. Bezerra et al. [9] predicted kinetic parameters of carbon reinforced fiber composites by using ANN. Karazi et al. [10] used an ANN approach for the prediction of laser-machined micro-channel dimensions. Brown and Moshiri [11] applied two ANN models, a back-propagation model and a generalized regression neural network model, to estimate and forecast unemployment rates. Ioana and Atsalakis [12] presented an ANN and a fuzzy inference system to forecast the Greek unemployment rate. Wang and Zheng [13] used a back-propagation neural network to predict the unemployment rate of China.
This study investigates how the unemployment rate changes with respect to people's educational level by using artificial neural networks. To this end, data sets for the years 1992 to 2008 were taken from the Turkish Statistical Institute [14]. An ANN model was then proposed, and the BJM, RAM and ANN methods were each applied to predict unemployment rates with respect to people's educational level.
Manuscript received February 13, 2012; accepted June 27, 2012.

II. ARTIFICIAL NEURAL NETWORKS
ANN is designed along the lines of the human brain: it is a system consisting of a number of processing units operating in parallel. The main principle of ANN is to find coefficients between the inputs and outputs of a problem, to make connections between the input and output layers, and to do all of this through a learning system [15]. An ANN can establish the relationship between input and output data that is determined by the parameters of a system [16].
One advantage of ANN is that it does not need detailed information about the system. ANN can also capture complicated relations within the system. Recently, ANN has been applied to a number of different problems involving classification, prediction, control systems, optimization and decision making [1], [17]. Another advantage of ANN is that it is independent of the statistical distribution of the data [18]. An ANN basically consists of three layers, as shown in Fig. 1. The first layer consists of the processing units (cells or neurons) related to the input variables and is called the input layer. Its function is to transmit the input variables to the hidden layer that follows it in the network. The final layer, consisting of the output variables, is called the output layer. The layer of processing units between the input layer and the output layer is called the hidden layer. The presence of the hidden layer is useful in modelling complex relations.
The number of neurons in each layer varies with the complexity of the problem. Each link leaving a neuron of the input layer has a weight; the weight connecting input unit $x_j$ to hidden unit $z_h$ is labelled $w_{hj}$. Each hidden-layer neuron calculates the weighted sum of the input-layer neurons, as given in (1):
$$\sum_j w_{hj} x_j \tag{1}$$
The outputs of these layers are obtained by applying a function known as the activation or transfer function to the inputs. An ANN with randomly set weights is "dull" but can be trained by successive repetitions of the same problem. A network "learns" by iteratively correcting the weights (the only adjustable parameters) so as to produce the previously specified output values (target sets) for as many input sets as possible.
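The weighted sum and activation step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a single input, a hyperbolic tangent activation in the hidden layer and a linear output neuron (the configuration reported later in Section III.A), plus bias terms, which the text does not mention. The function name `forward` and all numeric values are hypothetical.

```python
import math

def forward(x, hidden_weights, hidden_biases, output_weights, output_bias):
    """Forward pass of a one-input, one-hidden-layer, one-output network."""
    # Each hidden neuron forms the weighted sum of its input (plus an assumed
    # bias term) and passes it through the tanh activation function.
    hidden = [math.tanh(w * x + b) for w, b in zip(hidden_weights, hidden_biases)]
    # The output neuron is linear: a weighted sum of the hidden activations.
    y = sum(v * z for v, z in zip(output_weights, hidden)) + output_bias
    return hidden, y

if __name__ == "__main__":
    # Hypothetical weights for a network with two hidden neurons.
    hidden, y = forward(0.5, [0.4, -0.3], [0.0, 0.1], [1.0, 2.0], 0.2)
    print(hidden, y)
```

With randomly chosen weights such a network produces arbitrary outputs; training consists of adjusting the weight lists passed to `forward`.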
If the difference between the network outputs and the target values is at the desired level, the algorithm terminates; otherwise, the weights are updated so that the difference between these values is minimized. This algorithm is called back-propagation, and it is the most commonly used learning algorithm for ANN [15], [17], [19].
A commonly used equation for calculating the error of a network is given in (2),
$$E = \frac{1}{2} \sum_j (t_j - o_j)^2 \tag{2}$$
where $t_j$ and $o_j$ are the desired (target) and actual values of unit $j$ in the output layer. The weights are updated according to the so-called delta rule of learning,
$$\Delta w_{ij} = \eta \, \delta_i \, o_j \tag{3}$$
where $\eta > 0$ is the learning rate, $\delta_i$ is a correction term, and $o_j$ is the output of unit $j$ in the previous layer. The value of $\delta_i$ should be proportional to the output error. In the back-propagation algorithm the correction term is obtained by applying the so-called gradient descent method, which for a logistic activation function leads to the following expression for the delta term of an output unit:
$$\delta_i = (t_i - o_i) \, o_i (1 - o_i) \tag{4}$$
The following recursive formula is applied to calculate the correction term for a hidden unit,
$$\delta_i = o_i (1 - o_i) \sum_k \delta_k w_{ki} \tag{5}$$
where the sum runs over the units $k$ of the following layer.
It has been found that the performance and the stability of a training process are greatly enhanced if a so-called momentum term is added to the learning rule,
$$\Delta w_{ij}(n+1) = \eta \, \delta_i \, o_j + \mu \, \Delta w_{ij}(n) \tag{6}$$
where $0 < \mu < 1$ is a constant called the momentum and $\Delta w_{ij}(n)$ is the adjustment to the same weight in the previous iteration cycle [15], [17], [19].
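The delta rule, the correction terms and the momentum term described above can be illustrated with a minimal Python sketch. It assumes units with a logistic activation function, for which the derivative of the output is o(1 - o); the function names are hypothetical and the numbers in the usage example are arbitrary.

```python
def output_delta(target, output):
    # Correction term of a logistic output unit: (t - o) * o * (1 - o).
    return (target - output) * output * (1.0 - output)

def hidden_delta(output, downstream_deltas, downstream_weights):
    # Recursive correction term of a logistic hidden unit:
    # o * (1 - o) * sum_k(delta_k * w_k), summed over the units it feeds.
    back = sum(d * w for d, w in zip(downstream_deltas, downstream_weights))
    return output * (1.0 - output) * back

def weight_change(eta, delta, prev_output, mu=0.0, prev_change=0.0):
    # Delta rule with an optional momentum term:
    # eta * delta * o_prev + mu * (previous change of the same weight).
    return eta * delta * prev_output + mu * prev_change

if __name__ == "__main__":
    d_out = output_delta(target=1.0, output=0.5)
    d_hid = hidden_delta(0.5, [d_out], [2.0])
    print(d_out, d_hid, weight_change(0.5, d_out, 1.0, mu=0.5, prev_change=0.25))
```

Setting `mu=0.0` recovers the plain delta rule, so the momentum term can be switched on and off to compare the two update rules.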

III. PREDICTION OF THE INVESTIGATED DATA
In this study, the rates of unemployment during the period of 2002-2008 are analyzed in connection with people's educational level. To this end, ANN, BJM and RAM are used to predict the rates of unemployment, and the results are compared. For each method, the data for the period from 1992 to 2002 are used as the training set and the data for the period from 2002 to 2008 as the test set. MATLAB, EVIEWS and S-PLUS are used for the analysis.
A different model is used for each educational level. The models are formed as follows:
1) Model 1. The year is chosen as the input variable, and the output variable is the unemployment rate for primary school graduates.
2) Model 2. The year is chosen as the input variable, and the output variable is the unemployment rate for secondary (high school) graduates.
3) Model 3. The year is chosen as the input variable, and the output variable is the unemployment rate for university graduates.

A. Results for ANN
The structure of the ANN proposed for the models is as follows. The input layer has 1 neuron and there is 1 hidden layer; the hidden layer has 2 neurons for Models 1 and 2 and 3 neurons for Model 3. The output layer has 1 neuron in all models.
The activation function between the input layer and the hidden layer is the hyperbolic tangent, and the activation function between the hidden layer and the output layer is linear, as shown in Fig. 2. The learning rate and the momentum rate are both set to 0.5, and the Batch Gradient Descent algorithm is used for learning. Training is completed in 1000 iteration steps. The estimated values obtained by the ANN method for the test data set (2005-2008) for Models 1-3 are shown in Table I. Finally, the long-term estimation of the unemployment rates by level of education for the period 2009-2019 is calculated; these results are shown in Table IV and presented graphically in Fig. 3. The rates fluctuate for Models 2-3 and are stable for Model 1.
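The network configuration described above (one input, a tanh hidden layer, a linear output neuron, learning rate 0.5, momentum 0.5, 1000 batch iterations) can be sketched in plain Python as follows. This is an illustrative sketch and not the MATLAB implementation used in the study: the bias terms, the starting weights, the averaging of the gradient over the batch, and the scaling of inputs and targets to the range [0, 1] are all assumptions, and the data in the usage example are made up.

```python
import math

def train(xs, ts, n_hidden=2, eta=0.5, mu=0.5, epochs=1000):
    """Batch gradient descent with momentum for a 1-n_hidden-1 network."""
    # Small deterministic starting weights (an assumption; not given in the text).
    w = [0.1 * (h + 1) * (-1) ** h for h in range(n_hidden)]  # input -> hidden
    b = [0.0] * n_hidden                                      # hidden biases
    v = [0.1 * (-1) ** h for h in range(n_hidden)]            # hidden -> output
    c = 0.0                                                   # output bias
    prev = [0.0] * (3 * n_hidden + 1)                         # previous weight changes
    losses = []
    n = len(xs)
    for _ in range(epochs):
        gw, gb, gv = [0.0] * n_hidden, [0.0] * n_hidden, [0.0] * n_hidden
        gc, loss = 0.0, 0.0
        for x, t in zip(xs, ts):
            # Forward pass: tanh hidden layer, linear output.
            z = [math.tanh(w[h] * x + b[h]) for h in range(n_hidden)]
            y = sum(v[h] * z[h] for h in range(n_hidden)) + c
            err = t - y
            loss += 0.5 * err * err / n
            # Accumulate the descent direction, averaged over the whole batch.
            gc += err / n
            for h in range(n_hidden):
                dh = err * v[h] * (1.0 - z[h] ** 2)  # error passed back through tanh
                gv[h] += err * z[h] / n
                gw[h] += dh * x / n
                gb[h] += dh / n
        losses.append(loss)
        # Weight change = eta * direction + mu * previous change (momentum).
        change = [eta * g + mu * p for g, p in zip(gw + gb + gv + [gc], prev)]
        for h in range(n_hidden):
            w[h] += change[h]
            b[h] += change[n_hidden + h]
            v[h] += change[2 * n_hidden + h]
        c += change[-1]
        prev = change
    return (w, b, v, c), losses

if __name__ == "__main__":
    xs = [i / 4 for i in range(5)]   # hypothetical scaled years
    ts = [0.2, 0.3, 0.5, 0.6, 0.8]   # hypothetical scaled rates
    params, losses = train(xs, ts)
    print(losses[0], losses[-1])
```

Calling `train(xs, ts, n_hidden=3)` gives the three-neuron hidden layer used for Model 3. Scaling the year and the rate before training keeps a learning rate as large as 0.5 from destabilising the updates in this sketch.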

B. Results for RAM
The regression equations obtained for Models 1-3 by the regression analysis technique are given in (7)-(9), respectively. The estimated values obtained by regression analysis for 2005-2008 are given in Table V. The results of the long-term estimation up to 2019 are given in Table VII and presented graphically in Fig. 4. The rates show a declining trend for each model.

C. Results for BJM
The Box-Jenkins models obtained for Models 1-3 are as follows. When the first differences are taken for Model 1, the series tends to become stationary. Its correlogram shows only one significant spike in the Autocorrelation Function (ACF) graph, as shown in Fig. 5, and no significant spike in the sinusoidally decaying Partial Autocorrelation Function (PACF) graph, as shown in Fig. 6. ARIMA(0,1,1) and ARIMA(0,2,1) are therefore considered as the candidate models. Since MAD is smallest for the ARIMA(0,2,1) model, this model is used.
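The differencing and correlogram inspection described for Model 1 can be sketched in Python as follows. This is a minimal illustration, not the EVIEWS or S-PLUS procedure used in the study: the function names are hypothetical, and the significance bound of 1.96 divided by the square root of the series length is a common rule of thumb assumed here rather than taken from the text.

```python
import math

def difference(series, order=1):
    """Return the series differenced `order` times."""
    for _ in range(order):
        series = [b - a for a, b in zip(series, series[1:])]
    return series

def autocorrelation(series, lag):
    """Sample autocorrelation at the given lag (series must not be constant)."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((x - mean) ** 2 for x in series)
    num = sum((series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag))
    return num / denom

def significant_lags(series, max_lag):
    """Lags whose autocorrelation exceeds the approximate 95% bound."""
    bound = 1.96 / math.sqrt(len(series))
    return [k for k in range(1, max_lag + 1)
            if abs(autocorrelation(series, k)) > bound]

if __name__ == "__main__":
    rates = [8.1, 8.4, 7.9, 7.2, 6.5, 6.9, 7.3, 7.8]  # made-up series
    first_diff = difference(rates)
    print(first_diff, significant_lags(first_diff, 3))
```

A single significant lag in the ACF of the differenced series, together with a decaying PACF, is the pattern that points to a moving-average term of order one.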
The first differences are taken for Model 2, and the series is observed to become stationary. Its correlogram shows one significant spike in the ACF graph and no significant spike in the sinusoidally decaying PACF graph. Therefore, ARIMA(0,1,1) and ARIMA(0,2,1) are considered as the candidate models. Since MAD is smallest for the ARIMA(0,1,1) model, this model is used.
After the first differences are taken for Model 3, the series is also observed to become stationary. Its correlogram shows one significant spike in the ACF graph and no significant spike in the sinusoidally decaying PACF graph. ARIMA(0,1,1) and ARIMA(0,2,1) are again considered as the candidate models. Since MAD is smallest for the ARIMA(0,1,1) model, this model is used.
The estimated values obtained by this technique for 2005-2008 are presented in Table VIII, and the AD and MAD values in Table IX. Finally, the long-term estimation results for 2009-2019 are presented in Table X and shown graphically in Fig. 7. The rates are stable for Models 2-3 and declining for Model 1.

D. Comparison of the methods
The minimum mean absolute error values for each method are given in Table XI and presented graphically in Fig. 8. According to these results, ANN has lower error values than BJM and RAM for each model, and the values estimated by ANN are closest to the actual values. Prediction using ANN is therefore more appropriate than prediction using the other methods.

IV. CONCLUSIONS
This study investigates the development of artificial neural networks for prediction and how the unemployment rate changes with respect to people's educational level. In the long-term prediction, unemployment rates remain stationary for Models 1 and 3 but rise for Model 2. For the test data set, the values estimated by ANN are closest to the actual values for Models 1-3. ANN is shown to be quite robust, since it can recognize complex relations among a great number of variables better than the conventional statistical methods BJM and RAM.
The disadvantages of BJM and RAM include problems with stationarity, the need to satisfy certain assumptions, and inconsistent results when data sets are fairly small. In contrast, ANN can work with small data sets, and it is preferred to other methods because of significant and specific attributes such as generalization, learning from data, working with an unlimited number of variables, and no need for prior information about the problem.

TABLE I. ACTUAL AND ESTIMATED VALUES FOR MODEL 1-3.

TABLE II. AD AND MAD VALUES.

TABLE IV. THE RESULTS OF LONG-TERM ESTIMATION FOR 2009-2019.

TABLE V. ACTUAL AND ESTIMATED VALUES FOR MODEL 1-3.

TABLE VI. AD AND MAD VALUES.

TABLE VII. THE RESULTS OF LONG-TERM ESTIMATION FOR 2009-2019.

TABLE VIII. ACTUAL AND ESTIMATED VALUES FOR MODEL 1-3.

TABLE IX. AD AND MAD VALUES.

TABLE X. THE RESULTS OF LONG-TERM ESTIMATION FOR 2009-2019.

TABLE XI. MAD VALUES FOR THREE METHODS.

Comparison of the methods for Model 1-3.
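The AD and MAD values listed in the tables, and the comparison of the three methods by MAD, can be computed along the following lines. The sketch assumes that AD denotes the absolute deviation between an actual and an estimated value and that MAD is the mean of those deviations; the text does not spell the abbreviations out. The function names and all numbers are hypothetical and are not the paper's results.

```python
def absolute_deviations(actual, estimated):
    # AD: absolute difference between each actual and estimated value.
    return [abs(a - e) for a, e in zip(actual, estimated)]

def mean_absolute_deviation(actual, estimated):
    # MAD: mean of the absolute deviations over the test years.
    ad = absolute_deviations(actual, estimated)
    return sum(ad) / len(ad)

def best_method(actual, estimates_by_method):
    # Compute MAD per method and return the method with the smallest MAD.
    mads = {name: mean_absolute_deviation(actual, est)
            for name, est in estimates_by_method.items()}
    return min(mads, key=mads.get), mads

if __name__ == "__main__":
    actual = [10.0, 12.0]  # made-up test values
    estimates = {"A": [9.0, 13.0], "B": [10.5, 12.0]}
    print(best_method(actual, estimates))
```

The same routine applies to each of Models 1-3 in turn, with one entry per method in `estimates_by_method`.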