A New Classification Approach with Deep Mask R-CNN for Synthetic Aperture Radar Image Segmentation

In this paper, a hybrid classification approach that combines a deeper Mask region-based convolutional neural network (Mask R-CNN) with a sparsity-driven despeckling algorithm is proposed for synthetic aperture radar (SAR) image segmentation as an alternative to classical segmentation methods. In satellite technology, SAR images are widely used in many areas, such as evaluating weather conditions, identifying agricultural fields, monitoring climatic changes, and detecting military targets. For a high-quality segmentation, each meaningful region of a SAR image must be separated. However, SAR images contain a large amount of speckle noise, which must also be reduced. Current studies show that deep learning techniques are widely used for image segmentation and can deliver high accuracy and fast results. Mask R-CNN can not only separate each meaningful field in the image, but also generate a high-accuracy prediction for each of these fields in SAR images. The study shows that smoothed SAR images can be classified into multiple regions with deep neural networks.


I. INTRODUCTION
Synthetic aperture radar (SAR) images are widely used in satellite technology. Currently, SAR images are used for military target detection, for mapping changing weather conditions, and for identifying agricultural terrain. With SAR images, unmanned aerial vehicles can hit desired military targets with high accuracy, and early warning systems based on SAR images can inform people of weather conditions in advance [1]. Moreover, agricultural terrain is periodically observed for crop performance by means of SAR image processing [2]. Segmentation methods play an important role in evaluating the meaningful parts of a SAR image. In recent years, the development of deep learning methods and powerful graphics processing unit (GPU) resources has made it possible to apply segmentation methods with high accuracy.
Manuscript received 22 March, 2020; accepted 3 September, 2020.
Because SAR images have high resolution and high complexity, separating them into meaningful parts is quite difficult [3]. This complexity is a major obstacle to high-quality segmentation.
Smoothed, less complex SAR images enable higher-quality and more accurate segmentation [4]. Recent studies show that researchers have focused on SAR image despeckling. A multi-scale convolutional neural network model is proposed for SAR image semantic segmentation in [5]. The model consists of noise removal, convolutional, feature concatenation, and classification stages. In a different study, a deep learning approach called "Image Despeckling Convolutional Neural Network" (ID-CNN) uses a set of convolutional layers to automatically remove speckle from noisy input images [6]. In our study, the sparsity-driven despeckling (SDD) method is used for the smoothing process. The SDD method smooths the speckle noise and edges of SAR images [7].
Region-based segmentation methods are useful for extracting the meaningful parts of SAR images [8]. Instead of testing the whole image data, the most effective approach is to analyse a sample part of the image [9]. The convolutional neural network (CNN) is one of the most widely used deep learning architectures for object detection and image segmentation in image processing.
As an initial step, region-based segmentation methods extract free-form regions from an image. Secondly, they describe these regions, and finally they perform segmentation using a recognition pipeline [10]. Region-based image segmentation relies on pixel similarity and homogeneity. Mask Region-based Convolutional Neural Network (Mask R-CNN) is a useful segmentation method introduced in recent years and inspired by the Faster R-CNN algorithm. Mask R-CNN detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance [11]. Mask R-CNN creates bounding boxes for predicted regions with a region proposal network (RPN) and performs classification and bounding box regression in a region of interest (RoI) branch; in addition, it predicts a segmentation mask for each predicted region. Through instance segmentation, Mask R-CNN can also separate identical objects or meaningful fields in an image by assigning them different masks. In our approach, we studied a deeper Mask R-CNN framework, based on the Matterport implementation, for the smoothed SAR images by using the trained weights of the CNN as initial weights.

A. Sparsity Driven Despeckling (SDD) Method
SAR images contain a large amount of speckle noise, which is an obstacle to image segmentation. Because of the speckle noise, spurious pixel similarity is a major problem in the segmentation process. For this reason, the speckle noise should be reduced for high-quality SAR image segmentation [12].
The speckle in SAR images is reduced by using the SDD minimization method. In our study, the Moving and Stationary Target Acquisition and Recognition (MSTAR) clutter dataset is used. It contains 100 SAR images of regions at different angles. The noise in all images of the dataset is reduced by using the SDD method, and the smoothed SAR images are used for high-quality segmentation in the MSTAR dataset preparation process. The new dataset is called the "MSTAR SDD dataset". A few samples of original and smoothed SAR images of the MSTAR dataset are shown in the corresponding figure.
Various despeckling techniques have been proposed, such as the Lee filter, the refined Lee filter, the Frost filter, and the Kuan filter, as well as newer methods such as Bayesian denoising and Markov random field (MRF) methods [13]. Moreover, wavelet-based algorithms of increased complexity have been proposed for reducing the speckle noise of SAR images.
The sparsity-driven despeckling (SDD) method was developed for reducing the edge and point noise of SAR images with $\ell_0$ and $\ell_1$ norms by using a single parameter and less execution time [14]. In the SDD method, the SAR image despeckling optimization problem is defined in (1). The cost function $J(F)$ is written in matrix-vector form in (2) [7], where $v_F$ and $v_G$ are the vector representations of $F$ and $G$, respectively, and $G$ is the observed speckled image. $C_x$ and $C_y$ are Toeplitz matrices, while $W_x$ and $W_y$ are diagonal matrices [14]. The matrix-vector form in (2) enables a special iterative optimization method in which a linear system is solved in each step via (3), where $I$ is an identity matrix and $n$ is the iteration number [14].
The result is used for the smoothing process with the weight matrix $A$ at iteration $n$ in the SDD method. This smoothed output image is used as the input matrix for the convolutional input layer in our study.
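As a minimal sketch of the iterative scheme described above, the following pure-Python example applies an iteratively reweighted least-squares smoother to a one-dimensional signal. It is a simplification under stated assumptions and not the SDD implementation of [7], [14]: it uses a single difference operator $C$ instead of $C_x$ and $C_y$, it approximates only an $\ell_1$ penalty, and it assumes the weights $W^{(n)} = \mathrm{diag}(1/(|C f^{(n)}| + \epsilon))$ and the update $(I + \lambda C^{T} W^{(n)} C) f^{(n+1)} = g$. The names `reweighted_smooth`, `lam`, and `eps`, as well as the toy signal, are hypothetical.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm; lower[i] multiplies x[i-1], upper[i] multiplies x[i+1]."""
    n = len(diag)
    c = [0.0] * n  # modified upper diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def reweighted_smooth(g, lam=2.0, eps=1e-3, iterations=20):
    """Assumed update: solve (I + lam * C^T W C) f = g, then refresh W from f."""
    n = len(g)
    f = list(g)
    for _ in range(iterations):
        # Diagonal weights on the n-1 forward differences of the current estimate.
        w = [1.0 / (abs(f[i + 1] - f[i]) + eps) for i in range(n - 1)]
        lower = [0.0] * n
        diag = [1.0] * n   # identity part
        upper = [0.0] * n
        for i in range(n - 1):
            # C^T W C is tridiagonal for a forward-difference operator C.
            diag[i] += lam * w[i]
            diag[i + 1] += lam * w[i]
            upper[i] = -lam * w[i]
            lower[i + 1] = -lam * w[i]
        f = solve_tridiagonal(lower, diag, upper, g)
    return f


def total_variation(signal):
    return sum(abs(b - a) for a, b in zip(signal, signal[1:]))


# Toy input: a step edge with deterministic multiplicative "speckle".
noise = [((37 * i) % 11 - 5) / 50.0 for i in range(100)]
clean = [1.0] * 50 + [3.0] * 50
speckled = [c * (1.0 + e) for c, e in zip(clean, noise)]
smoothed = reweighted_smooth(speckled)
```

In this toy setting, small differences receive large weights and are flattened, while the large difference at the step receives a small weight and is largely kept, which is the behaviour that motivates reweighting for edge-preserving smoothing.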

B. Mask R-CNN
Mask R-CNN is a deep neural network for instance segmentation that is based on the R-CNN, Fast R-CNN, and Faster R-CNN algorithms [15], [16].
R-CNN generates independent region proposals by using the selective search algorithm. Each region proposal is passed to a CNN, which acts as a feature extractor for that region (Fig. 2). After the CNN extracts a feature vector for each region proposal, a support vector machine (SVM) classifies the region from the extracted features [17]. Unlike R-CNN, Fast R-CNN uses a single deep CNN to extract features for the entire image only once. All region proposals are sent to this CNN architecture, which runs once for all of them [18]. Each region proposal is processed on the CNN feature map by using a technique called "Region of Interest (RoI) Pooling". Faster R-CNN replaces the selective search algorithm with a Region Proposal Network (RPN) that operates on the image features computed by the last CNN layers [19].
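The RoI pooling step mentioned above can be illustrated with a small sketch. It assumes a single-channel feature map stored as a list of rows and an RoI given as `(x1, y1, x2, y2)` in feature-map units with exclusive upper bounds; the function name `roi_max_pool` and the toy feature map are hypothetical and do not come from any specific implementation.

```python
import math


def roi_max_pool(feature, roi, out_size=2):
    """Max-pool one RoI of a 2D feature map into an out_size x out_size grid."""
    height, width = len(feature), len(feature[0])
    # Quantization: fractional RoI bounds are rounded to integer cell indices.
    x1, y1, x2, y2 = (int(round(v)) for v in roi)
    # Clamp to the feature map and keep at least one cell in each direction.
    x1, y1 = max(0, min(x1, width - 1)), max(0, min(y1, height - 1))
    x2, y2 = max(x1 + 1, min(x2, width)), max(y1 + 1, min(y2, height))
    roi_h, roi_w = y2 - y1, x2 - x1
    pooled = []
    for by in range(out_size):
        ys = y1 + math.floor(by * roi_h / out_size)
        ye = y1 + math.ceil((by + 1) * roi_h / out_size)
        row = []
        for bx in range(out_size):
            xs = x1 + math.floor(bx * roi_w / out_size)
            xe = x1 + math.ceil((bx + 1) * roi_w / out_size)
            row.append(max(feature[y][x]
                           for y in range(ys, ye)
                           for x in range(xs, xe)))
        pooled.append(row)
    return pooled


# Toy 4 x 4 feature map whose value at (y, x) is 4 * y + x.
feature_map = [[4 * y + x for x in range(4)] for y in range(4)]
pooled = roi_max_pool(feature_map, (0.4, 0.4, 3.6, 3.6))
```

Because the bounds are rounded first, the RoIs `(0.4, 0.4, 3.6, 3.6)` and `(0, 0, 4, 4)` produce the same output, so sub-cell position information is discarded.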
The most prominent difference between Fast R-CNN and Faster R-CNN is the RPN. By reusing the last convolutional layers, the RPN reduces the computational requirements of the overall inference process and decides where the meaningful fields are in the image. The RPN quickly and efficiently scans every location to assess whether a given region needs further processing. The proposal anchor boxes are the bounding boxes of different sizes that are predicted for each meaningful part. In this way, Faster R-CNN decreases the calculation time and gives faster results.
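To make the notion of anchor boxes concrete, the sketch below enumerates anchors over a feature-map grid. The stride, scales, and aspect ratios are illustrative assumptions and are not values reported in this study; `generate_anchors` is a hypothetical helper.

```python
import math


def generate_anchors(feature_h, feature_w, stride, scales, ratios):
    """Return (x1, y1, x2, y2) anchors centred on every feature-map cell."""
    anchors = []
    for row in range(feature_h):
        for col in range(feature_w):
            # Centre of the cell, expressed in input-image pixels.
            cx = (col + 0.5) * stride
            cy = (row + 0.5) * stride
            for scale in scales:
                for ratio in ratios:  # ratio = width / height
                    w = scale * math.sqrt(ratio)
                    h = scale / math.sqrt(ratio)
                    anchors.append((cx - w / 2, cy - h / 2,
                                    cx + w / 2, cy + h / 2))
    return anchors


anchors = generate_anchors(feature_h=2, feature_w=3, stride=16,
                           scales=(32, 64), ratios=(0.5, 1.0, 2.0))
```

Each cell receives one anchor per (scale, ratio) pair, and every anchor of a given scale covers the same area regardless of its aspect ratio. The RPN then scores these candidates instead of running a separate proposal algorithm.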

III. PROPOSED METHOD
In the proposed solution, the SDD method is first applied to the MSTAR database as the initial stage of the hybrid study. An investigation of the MSTAR SAR images shows that most of them are highly complex. For instance, forests and their shadows have pixel values very similar to those of roads in SAR images; therefore, all input images are smoothed with the SDD method. In this way, the images of the MSTAR SDD dataset, which are smoothed and of reduced complexity, are created one by one. This dataset is converted to NumPy arrays for the CNN input layers of the Mask R-CNN algorithm. The MSTAR SDD images are then given as the input matrix of the region-based convolutional input layer.
Secondly, five class identifiers are defined for the intelligent classification process. The SAR image regions are assigned to the five most frequently observed classes: forest, building, road, tree(s), and terrain. Any undefined region of a SAR image is treated as background.
Thirdly, the Mask R-CNN algorithm is used for detecting the regions of the SAR images. Instead of the RoI pooling method, the RoI align method is used in Mask R-CNN [11]. Whereas RoI pooling quantizes the feature map bounds to integer coordinates, which causes larger segmentation losses, RoI align uses non-quantized floating-point feature map bounds, which leads to smaller segmentation losses. Regular RoI pooling changes the topology of the features and causes a misalignment between the feature map outputs and the RoIs, which would negatively impact the ability to predict pixel-accurate masks. In RoI align, bilinear interpolation is used to estimate the exact values of the input features at four regularly sampled locations in each RoI bin, and the results are then aggregated per bin. The loss function of Mask R-CNN is expressed in (5).
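The difference from quantized pooling can be sketched as follows. The example assumes a single-channel feature map stored as a list of rows, treats `feature[y][x]` as the value at the point `(y, x)`, takes four samples per bin, and aggregates them by averaging; `bilinear` and `roi_align` are hypothetical names, and the code is an illustration rather than the Matterport implementation.

```python
import math


def bilinear(feature, y, x):
    """Sample a 2D feature map at a fractional (y, x) location."""
    h, w = len(feature), len(feature[0])
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = feature[y0][x0] * (1 - dx) + feature[y0][x1] * dx
    bottom = feature[y1][x0] * (1 - dx) + feature[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(feature, roi, out_size=2, samples=2):
    """Average samples x samples bilinear samples per bin; the RoI is never rounded."""
    x1, y1, x2, y2 = roi
    bin_h = (y2 - y1) / out_size
    bin_w = (x2 - x1) / out_size
    pooled = []
    for by in range(out_size):
        row = []
        for bx in range(out_size):
            total = 0.0
            for sy in range(samples):
                for sx in range(samples):
                    # Regularly spaced sample points inside the bin.
                    y = y1 + (by + (sy + 0.5) / samples) * bin_h
                    x = x1 + (bx + (sx + 0.5) / samples) * bin_w
                    total += bilinear(feature, y, x)
            row.append(total / (samples * samples))
        pooled.append(row)
    return pooled


# Toy 4 x 4 feature map whose value at (y, x) is 4 * y + x.
feature_map = [[4 * y + x for x in range(4)] for y in range(4)]
aligned = roi_align(feature_map, (0.5, 0.5, 2.5, 2.5))
```

On this linear toy map the aligned output equals the feature value at each bin centre, so shifting the RoI by a fraction of a cell shifts the output accordingly. A rounded RoI cannot represent such a shift.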

A. Dataset
In our study, the public MSTAR clutter dataset is used for SAR image segmentation. It contains 100 SAR images of different locations and angles [20]. The MSTAR dataset is well suited to region-based image segmentation thanks to its low resolution and size. The SAR images of the MSTAR clutter dataset are randomly divided so that 80 % are used for the training process and 20 % for the validation process. The SAR images are smoothed by using the SDD algorithm described in the related work. Thanks to the SDD algorithm, the mask edges can be detected and drawn more clearly. After the smoothing process, the Visual Geometry Group (VGG) Image Annotator is used for marking the regions and their polygonal edges [21]. It is a simple and useful annotation tool that can generate a single JavaScript Object Notation (JSON) file. The JSON file holds the coordinates of the polygonal regions and information about these regions. These defined regions are passed to the convolutional neural network as input.
An investigation of the MSTAR clutter images shows that forest, road, tree(s), terrain, and building regions are the most frequently observed regions for an intelligent classification. The five class identifiers are created on the basis of the Matterport implementation of Mask R-CNN in our study [22].
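The dataset preparation described in this subsection can be sketched as follows. The example assumes the annotation file follows the usual VGG Image Annotator JSON export layout (`regions`, `shape_attributes`, `all_points_x`, `all_points_y`, `region_attributes`). The region attribute key `class`, the numeric class IDs, the file names, and the random seed are assumptions made for illustration and are not taken from this study.

```python
import json
import random

# Assumed ID assignment; 0 is reserved for background.
CLASS_IDS = {"forest": 1, "building": 2, "road": 3, "tree": 4, "terrain": 5}


def split_dataset(names, train_fraction=0.8, seed=0):
    """Randomly split image names into training and validation lists."""
    shuffled = sorted(names)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def load_polygons(via_json_text, label_key="class"):
    """Map each file name to a list of (class_id, [(x, y), ...]) polygons."""
    annotations = json.loads(via_json_text)
    result = {}
    for entry in annotations.values():
        regions = entry.get("regions", [])
        if isinstance(regions, dict):  # older exports store regions as a dict
            regions = list(regions.values())
        polygons = []
        for region in regions:
            shape = region["shape_attributes"]
            label = region.get("region_attributes", {}).get(label_key)
            points = list(zip(shape["all_points_x"], shape["all_points_y"]))
            polygons.append((CLASS_IDS.get(label, 0), points))
        result[entry["filename"]] = polygons
    return result


# Hypothetical file names standing in for the 100 clutter images.
image_names = ["img_%03d.png" % i for i in range(100)]
train_names, val_names = split_dataset(image_names)

# Minimal hand-written annotation in the assumed export layout.
sample_json = json.dumps({
    "a.png123": {
        "filename": "a.png",
        "size": 123,
        "regions": [{
            "shape_attributes": {"name": "polygon",
                                 "all_points_x": [1, 5, 5],
                                 "all_points_y": [2, 2, 8]},
            "region_attributes": {"class": "road"},
        }],
    }
})
polygons = load_polygons(sample_json)
```

A fixed seed keeps the 80/20 split reproducible between runs, which matters when training is resumed from earlier weights.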

B. Classification
Current studies show that the Mask R-CNN algorithm provides useful region classification predictions, masks, and bounding box regression. Mask R-CNN can detect not only a single object or region in an input image, but also multiple instances of different or identical objects. Thanks to instance segmentation, regions of the same class in the input image are given different masks.
When the Matterport implementation of Mask R-CNN is executed, the pre-trained COCO weights are used as the initial weights. These pre-trained weights provide the starting values required by the region-based convolutional neural network. Moreover, the segmentation processes are observed with different pre-trained models, such as VGG16, ResNet50, ResNet101, and InceptionV3.

C. Deep Mask R-CNN
In our study, a SAR classification algorithm based on the Mask R-CNN implementation is created with Python. Initially, the COCO model weights are used for the input layer of the CNN.
The SAR classification algorithm consists of the following steps:
Initial step:
Step 1. Prepare the MSTAR SDD database.
Step 2. Create the dataset directory system.
Step 3. Mark the desired regions of the SAR images with the VGG Image Annotator.
Training step:
Step 4. Define the classification regions (forest, building, road, tree(s), terrain).
Step 5. If this is the first run, set the default backbone network weights for the training process (initial weights: COCO or other backbones). If the algorithm has already been executed once, load the last obtained Keras weights into the algorithm.
Step 6. Train the algorithm until the total number of epochs is reached.
Evaluation step:
Step 7. During the training, observe the losses.
Step 8. Generate the SAR classification weights for each iteration. If the algorithm loss, mask loss, and bounding box loss are not sufficiently reduced, go to Step 5. Otherwise, go to Step 9.
Step 9. Display the segmented and masked SAR image.
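The control flow of Steps 5 to 8 can be sketched as a loop that restarts training from the most recent weights until the losses are low enough. In this sketch, `train_round` is a hypothetical callable standing in for a complete Mask R-CNN training run, and `loss_target` and `max_rounds` are assumed stopping parameters, since the text does not quantify when the losses are sufficiently reduced. The toy round function only imitates decreasing losses.

```python
def train_until_converged(train_round, initial_weights, loss_target, max_rounds=5):
    """Repeat Steps 5 to 8, warm-starting each round from the previous weights."""
    weights = initial_weights
    history = []
    for _ in range(max_rounds):
        weights, losses = train_round(weights)   # Steps 5 and 6
        history.append(losses)                   # Step 7: observe the losses
        # Step 8: stop once every monitored loss is at or below the target.
        if all(value <= loss_target for value in losses.values()):
            break
    return weights, history


def toy_round(weights):
    """Stand-in for a real training run: every round halves each loss."""
    round_index = weights["round"] + 1
    scale = 0.5 ** round_index
    losses = {"class": 0.8 * scale, "mask": 1.0 * scale, "bbox": 0.6 * scale}
    return {"round": round_index}, losses


final_weights, history = train_until_converged(
    toy_round, {"round": 0}, loss_target=0.3)
```

Passing the weights returned by one round into the next corresponds to the second branch of Step 5, in which the last obtained weights replace the default backbone weights.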

V. EXPERIMENTS
We worked in a Python virtual environment based on the Matterport implementation of Mask R-CNN. In our experiment, the hardware configuration is as follows: CPU: Intel(R) Core(TM) i7-2600 with 4 cores; memory: 16 GB; GPU: NVIDIA GeForce GTX 1070 Ti with 8 GB GDDR5. The experiment is built with Python 3.6, TensorFlow 1.13, and Keras 2.1 in the virtual environment. All experiments are run for 60 epochs in total with 500 steps per epoch. The running time is approximately 4 hours and 30 minutes for each training process.
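The training budget above implies a rough cost per training step. The short calculation below assumes that each epoch consists of 500 training steps and that the reported running time covers all 60 epochs; the variable names are illustrative.

```python
epochs = 60
steps_per_epoch = 500                # assumed: 500 training steps per epoch
run_seconds = 4 * 3600 + 30 * 60     # approximately 4 hours and 30 minutes

total_steps = epochs * steps_per_epoch
seconds_per_step = run_seconds / total_steps

print(total_steps, round(seconds_per_step, 2))  # 30000 0.54
```

Under these assumptions, one training run performs 30,000 steps at roughly half a second per step on the hardware listed above.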
In addition, when the VGG16, InceptionV3, ResNet50, ResNet101, and COCO backbone networks combined with Deep Mask R-CNN are compared with each other, Deep Mask R-CNN is trained with high accuracy and low loss rates with the ResNet50 and COCO models. Moreover, the bounding boxes and predictions are also obtained with high accuracy. The Deep Mask R-CNN performance results are shown in Table I, and the segmentation results for the five different backbone models are shown in Fig. 4. When the trained weights obtained from Mask R-CNN 1 are applied as the initial weights for Mask R-CNN 2, high accuracy and low loss are obtained. The final accuracy (accuracy 2) and loss rates (loss 2) are also shown in Table I. According to the experimental results, when the Mask R-CNN framework is applied more deeply as in the proposed method, the class, mask, and bounding box losses are reduced by Deep Mask R-CNN.

VI. CONCLUSIONS
In this study, a new hybrid classification method for SAR image segmentation is proposed. Because SAR images are complicated by noise, smoothed SAR images make high-quality segmentation easier. While the SDD method reduces the speckle noise in the SAR image by using a mixed $\ell_0$ and $\ell_1$ norm with a single parameter for fast computing, Deep Mask R-CNN provides high-quality segmentation with multi-region predictions.
While most current studies focus on detecting a single region in SAR image segmentation, the proposed method is trained for multi-region detection. Multi-region segmentation can be applied to complex images thanks to the robust segmentation features of Mask R-CNN [23].
In the proposed method, SDD-smoothed images are supplied to the input layers of the CNN, and SAR images are classified with multi-region segmentation via Deep Mask R-CNN. The study shows that, whichever backbone network is trained for SAR image segmentation, the losses of Deep Mask R-CNN are reduced and the accuracy of the algorithm increases.

VII. FUTURE STUDIES
SAR images are of vital importance for satellite technology. They are used especially for military target detection and for weather condition maps in early warning systems. Our study focuses on five regions (forest, building, tree, terrain, and road) in the 100 SAR images of the MSTAR dataset as a sample study. These regions can be extended to further classes, such as lakes, oceans, seas, city centers, and mountains. Moreover, more successful segmentation and masking can be obtained by training on more varied SAR images that are smoothed with the SDD method. Multi-region segmentation can also be achieved with less running time and lower loss rates with Deep Mask R-CNN.

CONFLICTS OF INTEREST
The authors declare that they have no conflicts of interest.