Recognition of Road Type and Quality for Advanced Driver Assistance Systems with Deep Learning

To develop effective advanced driving assistance systems, it is important to accurately recognize current driving environments and make critical decisions about driving processes. Preventing accidents through the interaction between the driving assistance systems and the environment and ensuring optimum driving dynamics are the main topics in this field. Vehicles need to recognize the road type and quality with high accuracy to ensure the most suitable driving for the road type. It is also important to use both uncomplicated and cost-effective systems when performing this detection. In this study, a deep learning-based approach that can be used in vehicle driver assistance systems is proposed to automatically recognize road type and quality. Using this approach, it is possible to determine the road type and the quality of the road using only driving images as the input data. A new convolutional neural network model is designed for classification of the driving images. Driving images obtained from Google Street View are used to evaluate the recognition system for an actual driving environment. With the proposed approach, the road types were determined with an accuracy of 91.41 %, and the pothole road-smooth road distinction was successful at 91.07 %. It can be said that the proposed method is an effective structure that can be used for advanced driving support systems, V2I communications systems, and similar intelligent transportation systems. DOI: http://dx.doi.org/10.5755/j01.eie.24.6.22293

implement an early warning system or to intervene directly in the driving policy. To provide these operations, ADAS needs to classify road types and detect disturbances with high performance.
Modern cloud technologies have been developed, and the widespread use of smart cities is being investigated in the field of vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communication systems. Intelligent Transportation Systems (ITS) have been developed with new studies on these technologies. For ADAS investigated under intelligent traffic methods, sensor technologies and snapshot retrieval and processing techniques are used to detect the status of the road, its objects, the status of the driver, and the like [9].
Establishing road classification and making decisions for support system vehicles according to these road types is considered a difficult task. At present, many methods have been developed for the classification of images. Deep learning is at the forefront of the modern methods used for this purpose [10], [11]. Deep learning uses advanced technology, such as multilayer neural networks, to create systems that can identify properties with feature extraction methods from unlabeled training data, especially in large quantities.
A deep learning architecture analyses the main features of the input data, such as lines, edges, color, and texture, at the first stage. It then combines the results of this process and creates an object model [12]. The most commonly used structures of deep learning are Deep Neural Networks (DNN), Autoencoders (AE), and Restricted Boltzmann Machines (RBM), which are variants of Boltzmann machines [13]. Recently, the convolutional neural network (CNN) method has been widely used in the field of deep learning; it achieves high accuracy, separates classes or features by retrieving the attributes of the data, and processes the whole image.
In this study, a deep learning-based road type and quality detection system that can be used for ADAS or for vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) systems is proposed. It is an effective system in terms of both cost and usage because it recognizes road type and quality using only camera images. The remainder of the article is organized as follows. Section II discusses the general literature related to this study. Section III presents the proposed structure of the deep learning-based recognition system. Section IV contains the experimental results for the classification studies of both road type and quality. Section V discusses the results of the study.

II. RELATED WORKS
In the related literature, there are generally studies on road detection and lane detection. There have also been studies on the classification of road quality, which is an application of this work. One of the first studies on road following was ALVINN (Autonomous Land Vehicle In a Neural Network) in 1989 [14]. In that study, a three-layer back-propagation artificial neural network model designed for road following was used. In the ALVINN system, images obtained by a camera and a laser distance finder were taken as inputs, and the output was the direction in which the vehicle should move. The system was trained using simulated road views.
Studies on road type detection are limited, however. Deep learning and artificial neural network methods have been used to classify the road type [15]-[18]. In one of the first studies on the classification and interpretation of highways, Taylor et al. [19] attempted to classify road types in the UK using images obtained from devices on the Controller Area Network (CAN) buses. Whether or not the roads were marked, along with any colored signs (white, blue, and green), was examined under four classes. The researchers used the J48 decision tree algorithm and obtained different performance ratios according to road types.
Tang and Breckon (2011) [20] performed a study between 2007 and 2008 with data from Cranfield University. They classified the images according to the road environment by using the k-nearest neighbor (k-NN) algorithm. In this study, the classification process was applied according to the roadside and environment rather than the road itself. In the two-class (not road, road) case, the success rate at the 200th epoch was 97.00 % with different models and k parameters. In the four-class (not road, city center, main road, and expressway) case, the maximum performance was 86 %.
Danti et al. [21] suggested using the k-means clustering method on the pothole images obtained using the segmentation method in the road images. The classification performance of that method was 70 %.
In another study [22], a simulation system was developed by integrating V2V and V2I communication modules with the detection of intersections on the road in order to improve traffic flow and reduce traffic accidents. By using different algorithms for the simulation environment, the system reduced traffic accidents. Slavkovikj et al. [23] produced 20,000 texture images from road images of paved and unpaved road types that were obtained from Google Street View. They classified the dataset with a two-class (paved, unpaved) Support Vector Machine (SVM) algorithm, and the maximum accuracy of this study was 85.30 %.
Seeger et al. [24] used SVM and CNN to classify the road type, providing several approaches for recognizing whether the vehicle is driving on a motorway, on a highway, in a parking area, or on an urban road. They used 700 images for each class, of which 150 were used for testing and the remainder for training. In addition to this image dataset, data from Lidar and long- and mid-range radar systems were added as an intermediate dataset, and the scenes were categorized by extracting the path and object map of the region.
In another road detection study, proposed by Kong et al. [25], the Vanishing Point (VP) method with Gabor filters was used to determine the road end points. In addition, some edge detection algorithms were used to extract the road bounds. The road detection process was performed using various combinations of these features.

III. THE PROPOSED ROAD TYPE AND QUALITY RECOGNITION SYSTEM
In this study, a new model based on CNN, which is one of the deep learning methods, was developed for both road type and disturbance detection. We refer to the developed architecture as the Road Type and Quality Recognition System (RTQ-CNN). Images from Google Street View were used to represent real-world driving environments in the study. Since these images were obtained with a camera placed on a vehicle, they are well suited as input data for this work. The steps of the proposed RTQ-CNN system are shown in Fig. 1. The raw camera image contains a significant amount of data beyond the road itself. A pre-processing step was therefore developed to retain only the road data. For this purpose, the VP method [26] was applied to the input images to determine the cut-off points. Then, the images were cropped with reference to these cut-off points.

A. Road Image Data Set
In this study, real driving views were used to classify the road type and road quality. For this purpose, Google Street View [27] images, usually consisting of RGB channels, were taken in a variety of environments and in different weather conditions. Since images taken on the Google Street View system do not have a fixed angle, the images were downloaded after the best angle was set, and a database was created from these data. Fig. 2 shows examples from the data set that we created. The data set is openly accessible online (https://github.com/vedattumen/RT-Q-CNN). Table I shows the data distributions in the dataset used. In total, 640 images were used to determine the road type in the first experimental study. For the second experimental study, a new dataset was created by taking 138 images covering all road types and 139 distorted road images. The 640 images were used in every road type classification application, and 277 images were used for road quality detection.
As can be seen in Fig. 2

B. Pre-Processing
The VP method was used in the pre-processing of the image data obtained from the Google Street View application. Kong [25], [26] proposed a confidence-weighted Gabor filtering technique that computes the dominant texture orientation at each pixel. Rasmussen [28], [29] proposed a VP method that applies the Hough transform in addition to the Gauss filtering technique. Taking these methods into consideration and adapting them to our data, a new adaptive road edge and road vanishing point detection system based on a lost-point and restricted edge detection technique was prepared. The CNN model prepared in this study was run on both the original data and the images obtained after the VP filter. Figure 3 shows the pre-processing steps performed on the driving images. The VP of every image in the dataset was determined according to the method in Fig. 3, and the resulting new dataset was presented as input data to the designed CNN model.
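The cropping step that follows VP detection can be illustrated with a minimal sketch. It assumes that the VP detector has already returned a row index (the hypothetical parameter `vp_row`) and that the road region is the part of the image at or below that row; the detection itself and the exact cropping rule of the study are not reproduced here. Plain Python lists are used to keep the sketch dependency-free.

```python
def crop_below_vanishing_point(image, vp_row):
    """Keep only the rows at or below the vanishing-point row.

    image  : list of rows, each row a list of pixel values
    vp_row : hypothetical row index returned by a VP detector
    """
    if not image:
        raise ValueError("image must contain at least one row")
    # Clamp so that at least one row always survives the crop.
    vp_row = max(0, min(vp_row, len(image) - 1))
    return [row[:] for row in image[vp_row:]]


def resize_nearest(image, out_h, out_w):
    """Nearest-neighbour resize of a 2-D list image to out_h x out_w."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


# Toy 6 x 4 "image" whose pixel value equals its row index.
toy = [[r] * 4 for r in range(6)]
road = crop_below_vanishing_point(toy, vp_row=2)  # rows 2..5 remain
small = resize_nearest(road, 2, 2)                # stand-in for the 128 x 128 resize
```

In a real pipeline the same two calls would be applied per color channel, with `out_h = out_w = 128` to match the input size used in the experiments.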

C. CNN Models
Convolutional networks were used to classify the input images in the RTQ-CNN model. Feature extraction methods are complex and time consuming. An alternative to a separate feature extraction step is an end-to-end procedure that performs the classification directly on raw image pixels. Recently, multi-layered deep learning structures have begun to be used for automatic recognition of image data. The main advantage of these structures is the hierarchical acquisition of features at various levels of the image. In the first layers, low-level features such as corners are obtained, and higher-level properties such as more complex shapes are detected in subsequent layers.

Large-scale projects in the field of image processing have mostly focused on CNN-based work and its variations. Projects such as LeNet [30], AlexNet [31], GoogLeNet [32], VGGNet [33], and ResNet [34] have achieved great success in image processing and classification by developing applications based on CNN. CNN, which was proposed by Yann LeCun in 1990 and is increasingly used today, is a special kind of feed-forward multi-layer neural network [35]. CNN was developed to process pixel images with minimal preprocessing and to extract visual patterns directly. One of the reasons CNNs perform better than other classification algorithms is that they have feature-enhancing and summarization layers. With the help of these layers, CNNs better classify and recognize the available data. A standard deep CNN architecture comprises five main layers: the convolution layer, activation layer, sub-sampling or pooling layer, fully connected layer, and SoftMax layer.

1) Convolutional Layer:
This layer is basically composed of filters and feature maps. Filters represent the neurons of the layer. Random numbers are generated to produce the initial weights. With these weights, the output values are obtained by convolving the filter with the pixel values of the image. These operations proceed step by step at regular intervals (the stride) and continue until all regions of the images entering the system are scanned.
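A minimal sketch of this sliding-window operation on a single channel is given below. As in most CNN frameworks, the kernel is applied without flipping (cross-correlation), and no padding is assumed; the function names, the weight range of `random_kernel`, and the toy data are illustrative only and are not taken from the study.

```python
import random


def conv2d(image, kernel, stride=1):
    """Valid (no padding) 2-D sliding-window filter over a single channel."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = (len(image) - k_h) // stride + 1
    out_w = (len(image[0]) - k_w) // stride + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            top, left = i * stride, j * stride
            # Weighted sum of the image patch currently under the kernel.
            acc = 0.0
            for a in range(k_h):
                for b in range(k_w):
                    acc += image[top + a][left + b] * kernel[a][b]
            row.append(acc)
        feature_map.append(row)
    return feature_map


def random_kernel(size, seed=0):
    """Small random initial weights (the range is an assumption)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(size)] for _ in range(size)]


image = [[1, 2, 3, 0],
         [4, 5, 6, 0],
         [7, 8, 9, 0],
         [0, 0, 0, 0]]
# A fixed kernel that sums the main diagonal, so the result is easy to verify.
diag = [[1, 0, 0],
        [0, 1, 0],
        [0, 0, 1]]
fmap = conv2d(image, diag, stride=1)              # [[15.0, 8.0], [12.0, 14.0]]
strided = conv2d(image, random_kernel(3), stride=2)  # a single 1 x 1 output
```

During training, the random initial weights would be updated by back-propagation; only the forward scan is shown here.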

2) Pooling Layer:
After the convolution layer, deeper features are extracted by passing the acquired properties through further processing. The sub-sampling layer is used for this purpose; it reduces the input data to a lower-dimensional representation. Various techniques are used in this process. For example, maximum and mean pooling are commonly used sub-sampling techniques.
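The two sub-sampling techniques mentioned above can be sketched as follows, assuming non-overlapping windows (stride equal to the window size), which is a common choice but is not stated explicitly in the text.

```python
def pool2d(feature_map, size=2, mode="max"):
    """Non-overlapping pooling over size x size windows (stride = size)."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    pooled = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Collect the values inside the current window.
            window = [
                feature_map[i * size + a][j * size + b]
                for a in range(size)
                for b in range(size)
            ]
            if mode == "max":
                row.append(max(window))
            elif mode == "mean":
                row.append(sum(window) / len(window))
            else:
                raise ValueError("mode must be 'max' or 'mean'")
        pooled.append(row)
    return pooled


fmap = [[1, 3, 2, 0],
        [5, 7, 4, 2],
        [0, 1, 9, 8],
        [2, 1, 6, 5]]
max_pooled = pool2d(fmap, size=2, mode="max")    # [[7, 4], [2, 9]]
mean_pooled = pool2d(fmap, size=2, mode="mean")  # [[4.0, 2.0], [1.0, 7.0]]
```

Either mode halves each spatial dimension of the 4 x 4 map, which is the dimensionality reduction the layer is used for.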

3) Activation Layer:
The activation process is achieved by following a linear filter with a non-linear activation function applied to each component of a feature map. The most common method used in CNN models is ReLU, known as zero-thresholding. This method produces faster results when compared to the sigmoid and hyperbolic tangent functions. This allows for a significant speed increase, especially for computers with limited processing capacity [34], [36]. Equation (1) shows the ReLU activation function

$$f(x) = \begin{cases} 0, & x < 0, \\ x, & x \geq 0. \end{cases} \tag{1}$$

In addition to the ReLU method, there are many activation functions. Some of those include the following:

Maximum and average activation functions:

$$f(x) = \max(0, x).$$

Sigmoid activation function:

$$f(x) = \frac{1}{1 + e^{-x}}.$$

Hyperbolic tangent activation function:

$$f(x) = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}.$$
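For illustration, the ReLU, sigmoid, and hyperbolic tangent activations named in this subsection can be written as scalar functions. This is a minimal sketch using only the Python standard library; the branch in `sigmoid` is an implementation detail added here to avoid floating-point overflow for large negative inputs.

```python
import math


def relu(x):
    """Zero-thresholding: negative inputs become 0, others pass through."""
    return x if x > 0 else 0.0


def sigmoid(x):
    """Logistic function, mapping any real input into (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # Equivalent form that avoids overflow of exp(-x) for very negative x.
    e = math.exp(x)
    return e / (1.0 + e)


def tanh(x):
    """Hyperbolic tangent, mapping any real input into (-1, 1)."""
    return math.tanh(x)


values = [-2.0, 0.0, 3.0]
relu_out = [relu(v) for v in values]  # [0.0, 0.0, 3.0]
```

ReLU needs only a comparison per element, whereas the other two functions require an exponential, which is consistent with the speed advantage noted above.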

4) Fully Connected Layer:
This layer connects to all the neurons of the feature maps produced by the convolution, activation, and pooling layers and merges them into one dimension. The neurons are matched according to the cross value given in this layer. Generally, no convolution is performed in this layer, and the merged neurons formed in this layer are passed on to the classification layer.
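The merge into one dimension followed by a fully connected mapping can be sketched as below. The weights, biases, and feature maps are toy values chosen for readability, not parameters of the RTQ-CNN model.

```python
def flatten(feature_maps):
    """Merge a list of 2-D feature maps into a single 1-D vector."""
    return [v for fmap in feature_maps for row in fmap for v in row]


def dense(inputs, weights, biases):
    """Fully connected layer: every output neuron sees every input value."""
    return [
        sum(w * x for w, x in zip(w_row, inputs)) + b
        for w_row, b in zip(weights, biases)
    ]


maps = [[[1, 2], [3, 4]], [[0, 1], [1, 0]]]   # two toy 2 x 2 feature maps
vector = flatten(maps)                         # 8 values in one dimension
weights = [[1] * 8, [0, 0, 0, 1, 0, 0, 0, 0]]  # one weight row per output neuron
biases = [0.5, -1.0]
scores = dense(vector, weights, biases)        # [12.5, 3.0]
```

The resulting scores are what a classification layer such as SoftMax would receive as input.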

5) SoftMax Layer:
This is a successful classification method for multi-class data. It is especially used for data with a high number of classes. Even though the number of classes in a normal classifier is very low, SoftMax can classify over
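As a minimal sketch, the SoftMax function converts the class scores of the previous layer into probabilities that sum to one. The five scores below are hypothetical values standing in for the five road-type classes; subtracting the maximum score is a standard numerical-stability measure and does not change the result.

```python
import math


def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class_scores = [2.0, 1.0, 0.1, -1.0, 0.5]  # hypothetical scores for five classes
probs = softmax(class_scores)
predicted = probs.index(max(probs))        # index of the most probable class
```

The predicted class is simply the index with the largest probability, here the first one.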

IV. EXPERIMENTAL RESULTS
In this article, two different experimental studies were performed. The first was an automatic classification of the road type. In this experimental study, image data of five classes (Class-A-B-C-D-E) were used. The second experimental study was to determine the road quality. In the second experimental study, a Class-F data set containing distorted road images and a mixture of non-distorted Class-A-B-C-D-E images was used. Furthermore, two different datasets were used in each experimental study. The first of these includes images without any preprocessing; the images in this dataset were resized to 128 × 128 and passed through the model, and the performance was evaluated. In the second dataset, the images were passed through a filter designed using the VP method and were cropped horizontally at the optimum vanishing point. A single CNN model was used in all experimental studies. This CNN model is a new model designed for this work and is called RTQ-CNN, as mentioned previously. This model was applied to the dataset, and the classification performance was evaluated. A detailed block representation of the parameters of the designed CNN model is shown in Fig. 4.
The designed CNN model has 7 convolution, 7 pooling, and 7 ReLU layers in total. Our model consists of two blocks in which the sequence of convolution, pooling, and ReLU layers repeats three times in succession. In the last block, the number of kernels of the first convolution layer was 128, and 32 kernels of 3 × 3 dimensions were applied to the image data with a stride of 2. The max pooling layer takes the maximum values over 2 × 2 regions of the feature maps obtained from this layer, and the ReLU layer produces the activation map. All layers in the block continue in a similar way, and the final fully connected and SoftMax layers produce the class scores. In the second application, in addition to applying the VP method at the input layer of the designed CNN model, the kernel size of the first convolution layer was changed to 16.
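To make the effect of the stated kernel, stride, and pooling sizes concrete, the spatial size after one convolution and one pooling step can be computed with the standard output-size formula. This is only a sketch: zero padding and a pooling stride equal to the window size are assumptions, since the text does not state them, so the resulting sizes need not match Fig. 4.

```python
def conv_output_size(n, kernel, stride=1, padding=0):
    """Spatial size after a convolution: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1


def pool_output_size(n, window):
    """Spatial size after non-overlapping pooling with the given window."""
    return n // window


# 128 x 128 input, 3 x 3 kernels with stride 2, then 2 x 2 max pooling.
after_conv = conv_output_size(128, kernel=3, stride=2)  # 63
after_pool = pool_output_size(after_conv, window=2)     # 31
```

Under these assumptions, a single convolution and pooling pair already reduces each 128 × 128 channel to 31 × 31.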

A. Experiment 1: Road Type Classifications
In this experimental study, recognition was performed on a database of 640 images in five classes, with 80 % of the images in each class used for training and 20 % for testing. For this purpose, images with dimensions of 128 × 128 × 3 (width × height × color) were used as the inputs for the five road types. Across the 5 classes, 512 images were used for training and 128 images for testing. The performance graphs of the learning processes of the RTQ-CNN model are shown in Fig. 5. When Fig. 5 is examined, it can be seen that the data pre-processed with the VP method yield a higher recognition performance. In the study, the LeNet_5, Vgg_5, Vgg_6, and Vgg_8 CNN models, widely known in the literature, were applied to the same data sets. Despite many trials and weight adjustments for these models, their performance was observed to be lower than that of our model. Table II gives the classification results achieved by the CNN models on the data set. Table II shows that the RTQ-CNN model that we recommend achieved the highest performance on both the raw dataset and the dataset pre-processed with the VP method. The confusion matrix for the 91.41 % success achieved with the VP method is shown in Fig. 6.
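A per-class 80/20 split of this kind can be sketched as follows. The class sizes used here are hypothetical: they were chosen only so that the totals reproduce the 640/512/128 counts given above, and they are not the actual distribution of Table I. The function name and the fixed random seed are likewise illustrative.

```python
import random


def stratified_split(labels, train_fraction=0.8, seed=0):
    """Split sample indices into train/test sets, class by class."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return train, test


# Hypothetical class sizes (NOT from Table I) that sum to 640 images.
class_sizes = {"A": 125, "B": 125, "C": 120, "D": 120, "E": 150}
labels = [name for name, n in class_sizes.items() for _ in range(n)]
train_idx, test_idx = stratified_split(labels)
```

With these sizes the split yields 512 training and 128 test indices, and every class contributes 20 % of its images to the test set.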

B. Experiment 2: Road Quality Classification
To gain better recognition of potholes and cracks on the surface of roads, we changed the batch size, which is an input parameter of our model, to 16 and reduced the kernel size of the first convolution layer to 16. These changes resulted in a higher level of detection of road disorders.
In this study, a dataset of 277 images in total consisting of two classes was created. Eighty percent of the data in this dataset was used for training and 20 % for testing. For the recognition works carried out on the dataset, samples of the five road types were used, which included bad-road images. When we look at the results of the study for road quality detection in Fig. 7(a), it is seen that the performance of the model reached 87.50 % and that the training phase was completed at the 14th epoch. When Fig. 7(b) is examined, it is seen that our model worked better on the images processed with the VP method, with a performance of 91.07 % at the 25th epoch. In this experiment, it was determined that there are potholes and cracks in the road images in the dataset. Figure 8 shows some of the images found in this dataset.
The RTQ-CNN model that we have prepared was shown to achieve high performance on road images compared to CNN models in the literature and other classification methods. After the images passed through the convolution filters, the ReLU activation function with the correct parameters detected hidden patterns in the images. When the number of samples was increased by the high-density parameter at the fully connected layer with the detected parameters, high output results were obtained. Cracks and potholes in the road were recognized 91.07 % of the time using the developed method. Table III compares the performance ratios of all experimental results obtained in the study. When Table III is examined, it can be seen that the proposed model gives high results in the recognition of both the road type and the road quality. The accuracy rate showed a significant increase with VP-based pre-processing. According to these results, our proposed model performed better than the CNN models in the literature. The performance rates in the road type and quality classifications were quite high compared to all other models. This indicates that the RTQ-CNN model is quite successful in determining the road type and quality.
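As a simple consistency check, each reported accuracy can be mapped back to a whole number of correctly classified test images, given the test-set sizes stated in the text (128 images for road type, 56 for road quality). The helper below is a sketch written for this purpose; the resulting counts are inferred from the percentages and are not reported in the study.

```python
def correct_count(reported_percent, n_test):
    """Find the number of correct predictions whose accuracy, rounded to two
    decimals, equals the reported percentage; return None if there is none."""
    for correct in range(n_test + 1):
        if round(100.0 * correct / n_test, 2) == reported_percent:
            return correct
    return None


road_type_raw = correct_count(82.03, 128)  # without VP pre-processing
road_type_vp = correct_count(91.41, 128)   # with VP pre-processing
quality_raw = correct_count(87.50, 56)     # without VP pre-processing
quality_vp = correct_count(91.07, 56)      # with VP pre-processing
```

Under these test-set sizes, the four accuracies correspond to 105 and 117 of 128 road-type images and to 49 and 51 of 56 road-quality images, so the VP pre-processing accounts for 12 and 2 additional correct predictions, respectively.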

V. CONCLUSIONS
In this study, a deep learning-based recognition system was proposed to detect road type and quality for advanced driving assistance systems. A new CNN model named RTQ-CNN was developed for these purposes. Additionally, the VP method was used for the pre-processing stage. Performance evaluations for the developed method were performed using road images collected from Google Street View. The image dataset includes six different road types that were obtained from real driving environments. The experimental studies consisted of two different applications: road type and quality detection. In the first application, a total of 640 images in five classes were used for the classification of road type. The RTQ-CNN model achieved an 82.03 % success rate without using the VP method. When the VP pre-processing method was applied to the same data, the performance increased to 91.41 %. In the second experimental study, a performance evaluation was carried out for the determination of road quality. A total of 277 road images (221 for training and 56 for testing) were used for two classes: distorted and normal road images. The RTQ-CNN model achieved success rates of 87.50 % and 91.07 % without and with the VP method, respectively.
In light of the obtained results, it can be said that, after sufficient training and parameter tuning, the RTQ-CNN model can provide a basis for the development of early warning systems through fast, real-time processing of snapshots obtained for advanced driving support systems. The proposed system has features that can be used in important work areas such as autonomous vehicles and semi-driverless vehicles. The system can be developed for implementation in real vehicles for detecting road type and quality.

VI. FUTURE WORKS
Future work will test the performance of the model while the vehicle is in motion in an actual driving environment. With this model, it is predicted that the type and the quality of the road can be easily determined by image processing performed while the vehicle is in motion. In addition, video images may be used in future work. The aim is to increase the performance ratio in the future by improving the model. More important than all of these is the work to be done on development-oriented driving support systems that will control the vehicle dynamics according to road information obtained instantaneously.

Fig. 1. Steps of the proposed RTQ-CNN-based road recognition system. The raw road input images are pre-processed through the detection of VPs. After this pre-processing, these images are presented to the CNN models for the classification process.

Fig. 2. Road image examples in the dataset used, as collected from Google Street View.

Fig. 3. Vanishing point detection and road image cropping: (a) raw driving images, (b) detection of points after passing through the VP filter, (c) optimum VP selection and horizontal cutting, (d) obtaining pre-processed images.

Fig. 4. Designed new deep CNN model for road type and quality recognition system.

Fig. 5. The results of road type classification according to the RTQ-CNN model: (a) non-preprocessed images, (b) preprocessed images.

As shown in Fig. 5(a), the training phase of the RTQ-CNN model was completed at the 11th epoch, and the validation phase achieved its highest success at the 12th epoch, where the performance was measured at 82.03 %. In addition, as shown in Fig. 5(b), for the input images processed with the vanishing point method, the training phase was completed at the 29th epoch, a test success of 91.41 % was reached at the 24th epoch, and the performance remained fixed at 91.41 % after the 50th epoch. Each epoch of the RTQ-CNN model took about 9 seconds on a computer with an i7 CPU and about 1 second on a Tesla K80 GPU.

Figure 6
Figure 6 shows that 1 of the 25 images in C1 was incorrectly identified as the C3 class (Paved Street Road), while 23 were identified correctly. Only 1 of the 25 C2 (Express Road) images was incorrectly classified as the C1 class (Divide Roads), and the performance in this group was 96.0 %. In the Parquet Street Road class (C5), 28 of the 30 images were classified correctly, while two were incorrectly classified. The mean performance of the system in the recognition of the road type was determined to be 91.41 %.

For each class, images with dimensions of 128 × 128 × 3 (width × height × color) were used. During the training period, 221 images (111 images of pothole-defect roads and 110 normal road samples) were used, and during the testing phase, 56 images (2 classes × 28 samples) were used. Performance graphs of the learning processes of the RTQ-CNN model are shown in Fig. 7.

Fig. 7. The results of road disorder classification according to the RTQ-CNN model: (a) non-preprocessed images, (b) preprocessed images.

TABLE I. CLASSIFICATION DATASET.

TABLE III. COMPARISON OF PERFORMANCES OF KNOWN CNN MODELS AND THE RTQ-CNN MODEL.