A Novel Information Fusion Method for Vision Perception and Location of Intelligent Industrial Robots

Abstract—An improved SURF (Speeded-Up Robust Features) algorithm is proposed to address the long processing time and low positioning precision of industrial robots. The determinant of the Hessian matrix is used to extract feature points from the target image, and a multi-scale spatial pyramid is constructed. The location and scale of the feature points are determined by neighbourhood non-maximum suppression. The feature points are assigned a direction and described by directional binary robust independent elementary features (BRIEF) descriptors. Initial matching is based on the Hamming distance, and progressive sample consensus (PROSAC) is then used to carry out a second, precise matching that removes mismatched points. An affine transformation model is then established to describe the relationship between the template and target images, and the centroid coordinates of the target are obtained from this transformation. Comparative tests demonstrate that the proposed method effectively improves the recognition rate and positioning accuracy of industrial robots. The average processing time is less than 0.2 s, the matching accuracy is 96 %, and the positioning error of the robot is less than 1.5 mm. Therefore, the proposed method is of practical value.


I. INTRODUCTION
Traditional industrial robots operate directly from offline programming or teaching control and complete designated actions by following preset instructions, which reduces the labor force and increases productivity [1], [2]. However, traditional industrial robots are not flexible enough to adapt to changes in the shape, size, and position of the working object, so that their recognition rate and their flexibility in dealing with different tasks degrade severely under these conditions [3], [4]. Hence, industrial robots must be made intelligent and equipped with multiple sensors to meet the need of identifying and locating multiple operation targets.
With the development of visual sensors and computer vision theory, environment perception and target cognition for industrial robots have received much research attention [5], [6]. Amintoosi et al. [7] applied a shape feature vector matching algorithm to workpiece images to sort workpieces automatically. Vinividyadharan and Subusurendran [8] used a fuzzy algorithm for workpiece recognition, with the geometric parameters of the workpieces as the identification features. Lowe et al. [9] proposed a visual feature vector matching method to extract the features of workpieces; the established feature vector was used to identify the workpieces. Ojala et al. [10] proposed the SIFT feature extraction approach and used a template matching method to identify the workpiece features. Then, a morphological method was adopted to obtain the workpiece feature points in two-dimensional (2D) space to improve the workpiece identification performance. Uijlings et al. [11] combined Hough line detection, geometric transformation, and SIFT to eliminate the target rotation and extract useful features for target recognition. Ren et al. [12] used an artificial neural network (ANN) and SIFT to extract the image features and calculated the Euclidean distance between the target image feature and the extracted feature to realize object recognition. Fezenswalb et al. [13] proposed the SURF-BRISK-MSAC algorithm, which calculates the feature similarity to eliminate false feature matching points in workpiece identification. These machine vision technologies make it possible to give industrial robots intelligent vision in complex industrial scenes. However, illumination and complex background disturbances in industrial scenes, together with variations in the size and shape of the operational objectives, may degrade the vision perception and location performance of intelligent industrial robots. It is still a challenging task to accurately identify and locate the objectives using machine vision.
In this paper, a job target recognition and location method is proposed to improve the flexibility and robustness of robots in varying tasks. In this new method, the SURF detector and the BRIEF descriptor with rotation invariance are used to extract and describe the feature points of the job target. Then, the nearest-neighbour Hamming distance is used for the feature matching. Considering that noise in the feature matching process may cause false matches, the PROSAC algorithm is used to perform a second matching on the point pairs obtained from the initial Hamming-distance matching. The best matching feature points are then used to determine the target coordinates, which are combined with the calibration coordinates to realize the identification and positioning of the job target by the industrial robot.

II. SYSTEM COMPONENTS
An intelligent industrial robot system, similar to a four degree-of-freedom (DOF) SCARA robot, is constructed (Fig. 1). Planar joints are used to realize the positioning and orientation functions of this robot. The robot consists of one loading device with a maximum load of 3 kg, one servo motor system, one visual system with an AVT GE 1050 industrial camera (resolution of 1024×1024, frame rate of 59 FPS at this resolution, progressive scanning and global exposure modes suitable for the acquisition of moving targets, 25 mm optical lens, and LED light), and a 360-degree rotating table driven by a stepper motor plus a harmonic reducer. The camera is installed at the end of the robot arm. Signal processing software installed on a PC collects the camera images, and the open-source Arduino Mega 2560 R3 development board is used to control the robot arm.
The identification of job objectives is the prerequisite for an intelligent industrial robot to complete grasping and assembly tasks. The existing recognition methods mainly include template-based matching algorithms and feature-based matching algorithms. Template-based matching is susceptible to interference from the external environment and, thus, is not robust. Feature-based matching is robust to image scaling, rotation, affine transformation, and illumination changes, which reduces the influence of noise and disturbances. As a popular feature-based matching algorithm, SIFT [14] is well-recognized in feature matching, but its computational complexity is high. SURF [15] can address this problem. In this paper, a SURF-based method is proposed. The theory of the proposed method is described as follows.
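A. Feature Point Extraction
SURF transforms the convolution of the image with the Gaussian second-order differential template into additions and subtractions on the integral image. In image I, a point (x, y) is selected. The Hessian matrix of the point at scale σ is defined as
\[ H(x, y, \sigma) = \begin{bmatrix} L_{xx}(x, y, \sigma) & L_{xy}(x, y, \sigma) \\ L_{xy}(x, y, \sigma) & L_{yy}(x, y, \sigma) \end{bmatrix} \]
where $L_{xx}$, $L_{xy}$, and $L_{yy}$ are the convolutions of the corresponding Gaussian second-order derivatives at scale σ with image I at the point (x, y).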

B. The Establishment of Scale Space
In order to detect blobs at different scales, it is necessary to establish a scale-space pyramid of the image. Because it uses box filtering and integral images, SURF does not need to build the image pyramid directly, but uses an indirect method instead: the response image of the Hessian matrix is calculated using box filters of different sizes. Then, non-maximum suppression is applied in the neighbourhood of each point of the resulting 3D stack of response images [16], [17]. The box filter is constructed as shown in Fig. 2. The Hessian response at each point is compared with the other 26 values in its 3D scale-space neighbourhood. When the response is greater than or less than all of the other 26 values, the point is a candidate feature point. Then, by interpolation in scale space and image space, stable feature point positions and scale values are obtained. By this method, 125 feature points are extracted from the template image, as shown in Fig. 3.
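The 26-neighbour comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names `is_extremum` and `candidate_points` and the toy response stack are assumptions introduced here, and the sub-pixel interpolation step is omitted.

```python
def is_extremum(stack, s, r, c):
    """True if stack[s][r][c] is strictly above or strictly below all 26 neighbours."""
    centre = stack[s][r][c]
    neighbours = [
        stack[s + ds][r + dr][c + dc]
        for ds in (-1, 0, 1)
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (ds, dr, dc) != (0, 0, 0)
    ]
    return all(centre > v for v in neighbours) or all(centre < v for v in neighbours)


def candidate_points(stack):
    """Scan the interior of a [scale][row][col] response stack for extrema."""
    found = []
    for s in range(1, len(stack) - 1):
        for r in range(1, len(stack[s]) - 1):
            for c in range(1, len(stack[s][r]) - 1):
                if is_extremum(stack, s, r, c):
                    found.append((s, r, c))
    return found


# Toy response stack (hypothetical data): 3 scales of 3x3 responses, one peak.
stack = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
stack[1][1][1] = 5.0
print(candidate_points(stack))  # [(1, 1, 1)]
```

Only interior positions are tested, because a point on the border of the stack does not have a full set of 26 neighbours.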

C. Feature Point Description with BRIEF
In order to add direction information to the extracted feature points, the gray-scale centroid method is used: the direction of the offset vector from the feature point to the gray-level centroid of its neighbourhood is defined as the direction of the feature point [18]. The neighbourhood moments are defined by
\[ m_{pq} = \sum_{x, y} x^{p} y^{q} I(x, y) \]
where I(x, y) is the gray value of point (x, y), and p and q are the orders of the moments. The centroid is expressed by
\[ C = \left( \frac{m_{10}}{m_{00}}, \frac{m_{01}}{m_{00}} \right) \]
The feature points are then described by the BRIEF descriptor with rotation invariance [19]. The working principle of BRIEF is to randomly select several pixel pairs in the neighbourhood of the feature point and compare their gray values. In order to reduce noise interference, two 5×5 sub-windows are randomly selected in the 31×31 pixel neighbourhood centred on the feature point and compared [20]. Each bit of the binary descriptor is obtained by the criterion
\[ \tau(a, b) = \begin{cases} 1, & I(a) < I(b) \\ 0, & \text{otherwise} \end{cases} \]
where I(a) and I(b) are the gray values of the two sub-windows a and b. The centroid direction information of the feature point obtained by equation (3) is added to the descriptor. Each feature point in Fig. 3 is analyzed and encoded with a 256-bit binary code, as shown in Fig. 4.
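The gray-scale centroid direction and the binary comparison test can be illustrated as follows. This is a hedged sketch under simplifying assumptions: the helper names `gray_centroid` and `brief_bits`, the synthetic patch, and the two sampling pairs are hypothetical, single pixels are compared instead of 5×5 sub-windows, and only two bits are produced rather than 256.

```python
import math


def gray_centroid(patch):
    """Return the intensity centroid (cx, cy) relative to the patch centre and
    the angle of the centre-to-centroid vector. Assumes a non-black patch."""
    half = len(patch) // 2
    m00 = m10 = m01 = 0.0
    for r, row in enumerate(patch):
        for c, value in enumerate(row):
            x, y = c - half, r - half  # coordinates relative to the feature point
            m00 += value               # zeroth-order moment
            m10 += x * value           # first-order moment in x
            m01 += y * value           # first-order moment in y
    cx, cy = m10 / m00, m01 / m00
    return cx, cy, math.atan2(cy, cx)


def brief_bits(patch, pairs, theta=0.0):
    """One bit per pair: 1 if the first sample is darker than the second.
    The sampling pattern is rotated by theta to steer the descriptor."""
    half = len(patch) // 2
    cos_t, sin_t = math.cos(theta), math.sin(theta)

    def sample(x, y):
        xr = round(cos_t * x - sin_t * y)
        yr = round(sin_t * x + cos_t * y)
        return patch[half + yr][half + xr]

    return [1 if sample(*a) < sample(*b) else 0 for a, b in pairs]


# Synthetic 31x31 patch (hypothetical data) that gets brighter from left to right.
patch = [[c + 1 for c in range(31)] for _ in range(31)]
cx, cy, theta = gray_centroid(patch)
pairs = [((-3, 0), (3, 0)), ((3, 0), (-3, 0))]  # hypothetical sampling pairs
print(theta, brief_bits(patch, pairs, theta))
```

Rotating the sampling pattern by the centroid angle before the comparisons is what makes the resulting bits independent of the in-plane rotation of the patch.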

D. Feature Point Matching
Using the feature points extracted by SURF and described by BRIEF, the Hamming distance between two key points is calculated, and the nearest-neighbour distance is used to measure the similarity [21]. The descriptors of the two feature points are written as
\[ K_1 = x_0 x_1 \cdots x_{255}, \qquad K_2 = y_0 y_1 \cdots y_{255} \tag{8} \]
and the Hamming distance between the two feature descriptors can be expressed as
\[ D(K_1, K_2) = \sum_{i=0}^{255} x_i \oplus y_i \tag{9} \]
where $x_i$ and $y_i$ are binary values and $\oplus$ denotes the XOR logic operation.
A large value of $D(K_1, K_2)$ indicates a large number of differing bits between the two feature descriptors and, therefore, a small similarity between them. A wheel bracket is chosen as the template (Fig. 5). The image captured at the job site is shown in Fig. 6. The images in Figs. 5 and 6 are matched by the above-mentioned algorithm, and the wheel bracket is completely identified, as shown in Fig. 7. The similarity between feature points and matching points is estimated by the Hamming distance. The matching time is set at 100 ms in order to meet the real-time requirement of the system. The matching accuracy is expressed as the ratio of the number of correctly matched pairs to the total number of matched pairs; with 73 correct pairs out of 82 in total, it is equal to 89 %. Evidently, some mismatched points exist between the template image and the operation target image, which come from the interference of noise and disturbance components. Such false matches may result from the nearest-neighbour Hamming distance criterion and affect the final matching accuracy.
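The Hamming-distance matching and the accuracy ratio quoted above can be checked with a short Python sketch. The function names and the 4-bit toy descriptors are assumptions made for illustration (the descriptors in the text are 256 bits long); the pair counts 73 and 82 are taken from the text.

```python
def hamming(k1, k2):
    """Number of differing bits between two descriptors stored as integers."""
    return bin(k1 ^ k2).count("1")  # XOR marks the differing bits


def nearest_neighbour_matches(template_desc, target_desc):
    """For each template descriptor, return (template index, target index, distance)
    of the target descriptor with the smallest Hamming distance."""
    matches = []
    for i, d1 in enumerate(template_desc):
        j, dist = min(
            ((j, hamming(d1, d2)) for j, d2 in enumerate(target_desc)),
            key=lambda item: item[1],
        )
        matches.append((i, j, dist))
    return matches


# Toy 4-bit descriptors (hypothetical data).
template_desc = [0b1111, 0b0000]
target_desc = [0b0001, 0b1110]
matches = nearest_neighbour_matches(template_desc, target_desc)
print(matches)  # [(0, 1, 1), (1, 0, 1)]

# Matching accuracy from the pair counts reported in the text.
accuracy = 73 / 82
print(f"{accuracy:.0%}")  # 89%
```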

E. PROSAC Algorithm for Secondary Fine Matching
The mismatched points shown in Fig. 7 affect the matching accuracy. The Hamming distance is used as the similarity measure between feature points and matching points, and, due to noise and external interference, nearest-neighbour Hamming distance matching may produce false matching pairs. These points are called outliers, and their existence seriously affects the matching accuracy of the algorithm. Therefore, PROSAC is used to perform a secondary matching on the feature points initially matched by the Hamming distance. Whereas RANSAC draws its samples uniformly from all the data [22], PROSAC ranks the samples in descending order of quality and draws them from the higher-quality subsets of the data first to find the optimal estimate, thus eliminating the false matches. The result of the secondary matching is shown in Fig. 8. After the second precise matching, the feature points are further optimized. The matching time is reduced to 180 ms and the matching accuracy is 96 %.
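The idea of sampling from the best-ranked matches first can be sketched as follows. This is a simplified, PROSAC-like illustration and not the full algorithm: the linear growth of the sampling pool, the tolerance `tol`, the iteration count, and the use of a two-point similarity model are all assumptions made here, and the termination criteria of PROSAC are omitted. Points are stored as complex numbers x + yj so that scale and rotation become a single complex factor.

```python
import random


def fit_from_two(p1, q1, p2, q2):
    """Solve q = a * p + b from two pairs; a encodes scale and rotation, b translation."""
    a = (q2 - q1) / (p2 - p1)
    return a, q1 - a * p1


def prosac_like(matches, tol=2.0, iters=200, seed=0):
    """matches: list of (hamming_distance, template_point, target_point).
    Returns the largest set of matches consistent with one similarity model."""
    rng = random.Random(seed)
    ranked = sorted(matches, key=lambda m: m[0])  # smallest distance = best quality
    best_inliers = []
    for it in range(iters):
        # Start with the two best matches and grow the pool towards the full set.
        pool = min(len(ranked), 2 + it * len(ranked) // iters)
        (_, p1, q1), (_, p2, q2) = rng.sample(ranked[:pool], 2)
        if p1 == p2:
            continue  # degenerate sample, no model can be fitted
        a, b = fit_from_two(p1, q1, p2, q2)
        inliers = [m for m in ranked if abs(a * m[1] + b - m[2]) < tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return best_inliers


# Hypothetical data: six consistent matches (scale 2, rotation 90 degrees,
# translation (10, 5)) with small distances, plus two false matches.
true_a, true_b = 2j, 10 + 5j
good = [0j, 1 + 0j, 1j, 1 + 1j, 2 + 0j, 3 + 2j]
matches = [(10 + k, p, true_a * p + true_b) for k, p in enumerate(good)]
matches += [(90, 5 + 5j, 0j), (95, 4 + 0j, 100 + 100j)]
inliers = prosac_like(matches)
print(len(inliers), "of", len(matches), "matches kept")
```

Because the first samples come from the matches with the smallest Hamming distances, a model supported by the true matches tends to be found in very few iterations, which is the motivation for preferring this ordering over uniform sampling.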

III. JOB TARGET LOCATION ALGORITHM
Job target positioning determines the exact coordinates of the target once it has been identified. The centroid coordinates of the target are generally used to characterize its position. In the intelligent industrial robot system (Fig. 1), the target image differs from the template image by a translation, a rotation, and a scale change. Therefore, the affine transformation parameters are calculated according to the affine model. From the centroid of the target template and the affine transformation parameters, the centroid coordinates of the target in an image taken in a complex environment can be obtained.

A. The Centroid Coordinates of the Job Target Template
The target template image is preprocessed as shown in Fig. 9. In this process, the edge contour of the target is extracted by the Canny operator, and the background is filled by a morphological algorithm. The complete shape of the target, shown in Fig. 9(c), is then obtained by removing the isolated small targets and burrs outside the target area. To determine the exact target location, an 8-neighbourhood labelling method is used to label the target image, and the two-dimensional centroid coordinates of the labelled region are calculated as
\[ \bar{x} = \frac{1}{n} \sum_{(i, j) \in \Omega} i, \qquad \bar{y} = \frac{1}{n} \sum_{(i, j) \in \Omega} j \tag{10} \]
where i and j are the horizontal and vertical coordinates of the target image pixels, n is the total number of pixels, and Ω is the set of pixels belonging to the same target image. The centroid coordinates of the template image obtained from the centroid calculation are shown in Fig. 9(d).

B. Affine Transformation Model
The template image and the target image are related by a four-parameter affine transformation model, where s is the scale change parameter, $t_x$ is the amount of translation in the x direction, $t_y$ is the amount of translation in the y direction, and θ is the rotation angle.
By setting $m_{11} = -m_{22} = s\cos\theta$, $m_{12} = -m_{21} = s\sin\theta$, $m_{13} = t_x$, and $m_{23} = t_y$, the mapping between a point (x, y) in the template image and the corresponding point (x', y') in the target image can be written as
\[ \begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} m_{11} & m_{12} & m_{13} \\ m_{21} & m_{22} & m_{23} \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \]

C. Calculate the Centroid of the Target Image
Given the centroid coordinates of the template image obtained above, the centroid coordinates of the target in the target image are calculated by substituting them, together with the affine transformation model parameters, into (13), as shown in Fig. 10.
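The centroid calculation and its mapping into the target image can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: the function names, the binary mask, and the point pairs are hypothetical, the parameters are fitted by ordinary least squares, and the standard similarity convention x' = s(x cos θ − y sin θ) + t_x, y' = s(x sin θ + y cos θ) + t_y is used, which may differ in sign from the parameterization in the text. Points are stored as complex numbers x + yj.

```python
import cmath


def region_centroid(mask):
    """Mean horizontal (i) and vertical (j) coordinate of the non-zero pixels."""
    pixels = [(i, j) for j, row in enumerate(mask) for i, v in enumerate(row) if v]
    n = len(pixels)
    return sum(i for i, _ in pixels) / n, sum(j for _, j in pixels) / n


def fit_similarity(src, dst):
    """Least-squares fit of dst = a * src + b, where the complex factor a holds
    the scale and rotation and the complex offset b holds the translation."""
    n = len(src)
    src_mean, dst_mean = sum(src) / n, sum(dst) / n
    num = sum((q - dst_mean) * (p - src_mean).conjugate() for p, q in zip(src, dst))
    den = sum(abs(p - src_mean) ** 2 for p in src)
    a = num / den
    return a, dst_mean - a * src_mean


# Hypothetical template mask: a 2x2 target region inside a 4x4 image.
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
cx, cy = region_centroid(mask)  # (1.5, 1.5)

# Hypothetical matched points: scale 2, rotation 90 degrees, translation (10, 5).
template_pts = [0 + 0j, 4 + 0j, 0 + 3j, 4 + 3j]
target_pts = [2j * p + (10 + 5j) for p in template_pts]

a, b = fit_similarity(template_pts, target_pts)
scale, angle = abs(a), cmath.phase(a)
target_centroid = a * complex(cx, cy) + b
print(scale, angle, target_centroid)
```

Only the matched pairs that survive the secondary matching would be passed to the fit in practice, since a least-squares estimate is sensitive to false matches.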

IV. EXPERIMENT ANALYSIS
The intelligent industrial robot system shown in Fig. 1 is used to carry out the experimental analysis of the proposed recognition localization algorithm. The template image and the job target image are extracted, matched, and identified, and the acquired target coordinates are combined with the calibration parameters of the visual system. The results demonstrate the assembly ability of the robot.
In the test, a metal structural part with a thickness of 2.0 mm is the target part to be grabbed by a suction-type grabbing mechanism. The two-dimensional position of the target is calculated in the process of locating the target. Fig. 11 shows the logic of identifying and positioning targets. First, feature points are detected by the SURF algorithm, and their main directions are extracted using the gray centroid method. Directional BRIEF descriptors are then constructed to describe the feature points. By combining the Hamming distance and the PROSAC algorithm, mismatched points are eliminated in the second precise matching procedure.
Fig. 11. Flow chart of identifying and positioning targets.

In the experiment, the classical SIFT, SURF, and the proposed algorithm are compared in the same scenario. The experimental results are shown in Fig. 12. The computational time of the proposed algorithm is less than 0.2 s, which is much shorter than that of SURF and SIFT. This high computing efficiency makes the proposed method suitable for real-time industrial production. In addition, the recognition results when the target is occluded or overturned are shown in Fig. 13. The figure shows that the algorithm retains scale and rotation invariance under various disturbances while meeting the requirement of low processing time.

B. Positioning of Job Objectives
Complex workpieces with regular and irregular shapes are identified and positioned to verify the proposed algorithm. The target is enclosed by a green box in Fig. 14, which also shows the experimental positioning results. Combined with the calibration parameters of the visual system, the centroid of the job objective can be obtained (Table II). The positioning error in the table is less than 1.5 mm and the largest error is 4.2 %. As a result, the positioning accuracy of the proposed method is satisfactory.

V. CONCLUSIONS
In this paper, an intelligent industrial robot system with visual perception was constructed from a four degree-of-freedom SCARA robot and an AVT GE 1050 industrial camera. A new matching and positioning method based on the combination of SURF, BRIEF, Hamming distance, and affine transformation was proposed to reduce the influence of the complex industrial environment on the recognition and positioning of the target image. The experimental validation results demonstrated that (1) with a metal structural part with a thickness of 2.0 mm chosen as the target, its image was identified exactly under rotation and occlusion, which verifies the effectiveness of the presented algorithm, (2) the feature points can be extracted and described by SURF and BRIEF, and the Hamming-distance matching refined by PROSAC, which removes false matches, improves the matching speed and accuracy of the feature points, and (3) through the transformation between the template image and the target image, a 4-parameter affine transformation model was established, which yielded a positioning error of less than 1.5 mm. The results showed that the proposed method is able to provide the position information of the target for robot grasping and assembly in practical applications.
In future work, the proposed algorithm will be improved by increasing the number of feature points, expanding its application scope, and reducing its complexity to obtain better workpiece recognition accuracy.