A New Machine Vision Method for Target Detection and Localization of Malleable Iron Pipes: An Experimental Case

1 Abstract —Malleable iron pipes are


I. INTRODUCTION
Pipe fittings made of malleable iron are widely used as two-way and three-way connectors in construction, manufacturing, aerospace, and other fields. After casting, malleable iron pipes must be further processed to obtain the required shape and size. At present, this processing is done manually, which involves high labor intensity, low efficiency, and safety risks. In recent years, advances in vision technology and artificial intelligence have steadily improved industrial robots, which are gradually replacing manual labor, especially in dangerous environments. This progress improves production efficiency and plays an important role in the transformation and upgrading of enterprises [1].
Three-dimensional information about the operating target is the prerequisite for positioning and grasping. Researchers mainly recover the three-dimensional pose of the target in two ways: stereo matching algorithms and point cloud matching. Stereo matching acquires left and right images of the same target with a binocular camera, computes the disparity map, and recovers the 3D information of the target from the camera calibration [2]. Scharstein and Szeliski [3] established a global function according to global optimization theory and solved for the optimal disparity by minimizing it. This kind of algorithm is accurate but computationally expensive, so it performs poorly in real-time operation. Hermann, Morales, and Klette [4] fused global and local features and improved efficiency while maintaining accuracy. Zhang, Li, Cheng, Cai, Chao, and Rui [5] proposed a global stereo matching model based on disparity map interpolation, which optimizes stereo matching by establishing two Markov Random Fields (MRFs) and piecewise characteristic modeling. Yan, Lixin, and Feng [6] obtained the relative pose by binocular vision calibration, completed the matching with the SIFT + Harris algorithm, and obtained the three-dimensional pose of the workpiece by sub-pixel positioning of the Harris corners. He, Sun, and Tang [7] used guided filtering to calculate the matching cost over local regions of the left and right images and obtained good matching results. Point cloud matching obtains the pose of the target under different working conditions from the transformation between the template point cloud and the target point cloud. It consists of rough matching and precise matching.
In rough matching, feature matching algorithms extract feature points of the target point cloud that relate the positions of the template and target point clouds. These features mainly include local geometric characteristics of the object surface [8], [9], spherical harmonic functions, and integral invariants. Feature descriptor algorithms include the Globally Aligned Spatial Distribution (GASD) [10] and the k-dimensional (k-d) tree [11], among others. Mellado, Aiger, and Mitra [12] proposed a four-point random sampling matching algorithm and an improved variant, which has been applied to chip structure design. The Iterative Closest Point (ICP) algorithm is a commonly used precise matching algorithm: it searches for the nearest point in Euclidean distance and adjusts the parameters of the objective function to obtain the best matching parameters.
Many scholars have studied grasping robots with vision systems in depth. Wang et al. [13] designed a tomato picking robot that identifies ripe tomatoes by binocular vision and positions itself to pick them. Vahrenkamp, Asfour, and Dillmann [14] designed an Agricultural Robots Performance Assessment Robot (ARPAR)-III with a binocular stereo vision detection system installed in the head, which can perform target recognition, positioning, and grasping. Xiong, Zhong, Liu, and Tu [15] developed a robot grasping system with binocular vision and performed three-dimensional reconstruction of the target under actual working conditions through binocular stereo vision to achieve target sorting. Bakker, van Asselt, Bontsema, Müller, and van Straten [16] designed a vision-based field robot that completes various grasping tasks in the field by autonomous navigation. Zeng et al. [17] identified, positioned, and grasped pipes by deep learning combined with multi-view RGB-D images, calculating the pose of the pipe fitting by template matching combined with the ICP algorithm before grasping it with a robot. Zhou, Zhang, Zhou, Xi, and Chen [18] developed a tomato picking robot in which a binocular camera acquires the images and feature points in the images are matched to calculate the three-dimensional coordinates of the tomatoes, which guide the robot to pick them. Jain and Kemp [19] designed a robot for disabled people with a binocular stereo vision sensor installed on its head. It detects objects in the working scene through the sensor and grasps the target. Koolen et al. [20] designed a robot with a visual sensing device in its head that recognizes and detects obstacles and guides the robot to lift the target according to stereo vision.
In summary, stereo matching and point cloud matching can obtain the 3D pose of the target under different working conditions by collecting and transforming the point cloud of the target, thereby realizing recognition and localization. A vision-based grasping robot then uses this position information to complete the grasp. The objects considered in the above literature are mostly scattered and in simple states, and target recognition and grasping in complex stacked states has rarely been studied. This paper proposes a target recognition and localization method for malleable iron pipes in a stacked state and uses it to guide a robot to complete the grasp. The work has three main parts. First, the point cloud of the malleable iron pipes is obtained by 3D reconstruction from the disparity map. Second, the ICP point cloud matching algorithm is used to match the target point cloud and obtain the 3-D pose of the target. Finally, an experimental platform for a malleable iron pipe grasping robot with visual perception is built to recognize and grasp stacked pipes, so as to verify the reliability of the recognition and localization method.

II. PROPOSED METHOD
The malleable iron pipe grasping robot system consists of two parts: the binocular vision system and the actuator (the robot). The binocular vision system captures images of the malleable iron pipes, and the computer processes them with stereo matching and 3D reconstruction algorithms to obtain point cloud data. Point cloud matching is then used to obtain the pose of the pipes so that they can be recognized and grasped. The actual grasping position of a malleable iron pipe fitting is obtained through the hand-eye calibration pose transformation matrix and is used to guide the robot to grasp it. Figure 1 depicts the robot system with a binocular camera. As shown in Fig. 2(a), the malleable iron three-way pipe is a cross-shaped arrangement of round tubes whose surface is mostly weakly textured. Figure 2(b) shows the pipes in the stacked state. The pipes share the same basic dimensions, and each weighs about 120 g. The robot is a HIWIN 6-degree-of-freedom (6-DOF) robot, model RA605-GC-710 (the parameters are shown in Table I). As shown in Fig. 3, the robot is fixed-mounted. The overall layout area of the system is 1.5 square meters, and the diameter of the robot's maximum movement range is 1420 mm.

A. Parameters Calibration
The BB2-08S2C binocular camera (the parameters are shown in Table II) is selected according to the resolution required by the size of the malleable iron pipe. The camera provides lens distortion correction, depth measurement data conversion for the full field of view, and other functions. HALCON software is used for parameter setting and image generation. The circular marks on the calibration plate are spaced 30 mm apart and are 15 mm in diameter, and the plate measures 260×260 mm. The HALCON calibration plate images are shown in Fig. 4. The binocular_calibration operator is used to perform the initial calibration on the calibration plate images, and the resulting internal and external parameters of the binocular camera are shown in Table III. Figure 5 shows the calibration results of the binocular camera. The external parameters describe the pose of the right camera relative to the left camera. Because distortion is introduced during image acquisition, the image geometry must be corrected. The gen_binocular_rectification_map operator is used to rectify the calibrated image pair, and the results are shown in Table IV.

B. Calibration of Robot Coordinate System
Hand-eye calibration obtains the transformation between the camera and the robot base coordinate system, so that target coordinates can be converted into the robot base coordinate system [21] and the robot can grasp the malleable iron pipes.
The coordinate system of the malleable iron pipe grasping robot is shown in Fig. 6, where O1 is the robot base coordinate system, O2 is the camera coordinate system, and O3 is the calibration plate coordinate system. T13 is the transformation between the robot and the calibration plate coordinate systems, T23 is the transformation between the camera and the calibration plate coordinate systems, and T12 is the transformation between the robot and the camera coordinate systems. The origin of the calibration plate coordinate system and the coordinates of points on its X, Y, and Z axes were determined with the end of the robot. The origin coordinate is O1rob = (9.311, 661.101, - The position can be obtained by the transformation matrices.
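The chain of transformations between the robot, camera, and calibration plate frames can be illustrated with a minimal sketch. It assumes a convention the section does not state: T13 maps calibration plate coordinates into the robot base frame and T23 maps them into the camera frame, so that T12 = T13 · inverse(T23) maps camera coordinates into the robot base frame. The helper names and all numeric poses are hypothetical and are not the calibration results of this work.

```python
import math

def mat_mul(a, b):
    """Multiply two 4x4 homogeneous matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def invert_rigid(t):
    """Invert a rigid transform [R | p] as [R^T | -R^T p]."""
    r_t = [[t[j][i] for j in range(3)] for i in range(3)]
    p = [t[i][3] for i in range(3)]
    new_p = [-sum(r_t[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [r_t[i] + [new_p[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def pose_rot_z(theta_deg, tx, ty, tz):
    """Hypothetical helper: rotation about Z followed by a translation."""
    c = math.cos(math.radians(theta_deg))
    s = math.sin(math.radians(theta_deg))
    return [[c, -s, 0.0, tx],
            [s, c, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

def transform_point(t, p):
    """Apply a 4x4 homogeneous transform to a 3-D point."""
    x, y, z = p
    return tuple(t[i][0] * x + t[i][1] * y + t[i][2] * z + t[i][3]
                 for i in range(3))

# Made-up poses for illustration only (assumed convention, see lead-in).
T13 = pose_rot_z(90.0, 100.0, 600.0, 50.0)    # plate -> robot base
T23 = pose_rot_z(30.0, -50.0, 20.0, 800.0)    # plate -> camera
T12 = mat_mul(T13, invert_rigid(T23))         # camera -> robot base

p_plate = (30.0, 15.0, 0.0)                   # a point on the plate
p_cam = transform_point(T23, p_plate)         # what the camera would measure
p_rob = transform_point(T12, p_cam)           # the same point for the robot
print(p_rob)
```

Mapping the same plate point directly with T13 gives the same robot coordinates, which is a convenient consistency check on a hand-eye calibration.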

A. Image Processing of Malleable Iron Pipes
Image enhancement both highlights the features of the target in the malleable iron pipe image and suppresses information unrelated to the target. When the images are acquired, the contrast between target and background is low, which hinders the extraction of information about the malleable iron pipes. The images are therefore processed by gray transformation, a common image enhancement technique, as shown in Fig. 7.
The image of malleable iron pipes is transformed by gray stretching. With different parameters, the stretching transformation is applied to over-bright and over-dark images of malleable iron pipes, as shown in Fig. 8.
Factors such as the light source and the data transmission medium introduce noise into the malleable iron pipe image during acquisition, so the image should be filtered. In this paper, salt-and-pepper noise at a level of 0.05 was added to the malleable iron pipe image, and median filtering was applied. The results are shown in Fig. 9.
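As an illustration of gray stretching, the sketch below applies a linear stretch that maps a chosen input range onto the full 8-bit range. The function name `stretch_gray` and the `low` and `high` parameters are hypothetical; the section does not give the stretching parameters actually used, so the values here are made up.

```python
def stretch_gray(image, low, high):
    """Linearly map gray levels in [low, high] to [0, 255], clipping outside."""
    if high <= low:
        raise ValueError("high must be greater than low")
    scale = 255.0 / (high - low)
    return [[min(255, max(0, round((v - low) * scale))) for v in row]
            for row in image]

# An over-dark toy image: all values sit in the narrow range 10..40.
dark = [[10, 20],
        [30, 40]]
stretched = stretch_gray(dark, low=10, high=40)
print(stretched)
```

An over-bright image is handled the same way by choosing `low` and `high` around its occupied gray range.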
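The noise experiment described above can be sketched in a few lines. This is a minimal illustration, assuming that the value 0.05 is the fraction of corrupted pixels and that a 3×3 window with replicated borders is used; neither detail is stated in the text, and the function names are hypothetical.

```python
import random

def add_salt_pepper(image, density, seed=0):
    """Set roughly `density` of the pixels to 0 or 255 at random."""
    rng = random.Random(seed)
    noisy = [row[:] for row in image]
    for row in noisy:
        for x in range(len(row)):
            if rng.random() < density:
                row[x] = 255 if rng.random() < 0.5 else 0
    return noisy

def median_filter_3x3(image):
    """Replace each pixel by the median of its 3x3 neighbourhood."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(h):
        for x in range(w):
            window = [image[min(h - 1, max(0, y + dy))][min(w - 1, max(0, x + dx))]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = sorted(window)[4]   # middle of the 9 sorted values
    return out

flat = [[100] * 8 for _ in range(8)]        # toy image with a uniform gray level
noisy = add_salt_pepper(flat, density=0.05, seed=1)
restored = median_filter_3x3(noisy)
```

Because an isolated impulse is always an extreme value in its window, the median discards it while leaving edges largely intact, which is why median filtering suits this kind of noise.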

B. Stereo Matching Optimization
The adaptive-weight stereo matching algorithm, optimized with HSI weight allocation and a left-right consistency check, is applied to the left and right images of malleable iron pipe fittings under three different stacking conditions (working conditions 1-3). The settings are as follows: window size N = 30, cut-off value T = 35, and feature difference parameters C = 3 and P = 14.5. The results are shown in Figs. 10 and 11.
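The left-right consistency check mentioned above can be sketched as follows. This is a generic version of the technique, not the exact implementation of this work: it assumes integer disparities, marks a left-image pixel invalid when the right-image disparity at the matched column disagrees by more than `tol`, and fills invalid pixels with the smaller of the nearest valid disparities on the same scan line. The fill rule, the `tol` parameter, and the function names are assumptions, and the HSI weighting is not reproduced.

```python
INVALID = -1

def left_right_check(disp_left, disp_right, tol=1):
    """Keep a left disparity only if the right disparity map agrees with it."""
    width = len(disp_left[0])
    checked = []
    for row_l, row_r in zip(disp_left, disp_right):
        row = []
        for x, d in enumerate(row_l):
            xr = x - d                       # matching column in the right image
            if 0 <= xr < width and abs(d - row_r[xr]) <= tol:
                row.append(d)
            else:
                row.append(INVALID)          # occluded or mismatched pixel
        checked.append(row)
    return checked

def fill_invalid(disp):
    """Fill invalid pixels with the smaller nearest valid disparity in the row."""
    filled = []
    for row in disp:
        new_row = row[:]
        for x, d in enumerate(row):
            if d != INVALID:
                continue
            left = next((row[i] for i in range(x - 1, -1, -1) if row[i] != INVALID), None)
            right = next((row[i] for i in range(x + 1, len(row)) if row[i] != INVALID), None)
            candidates = [c for c in (left, right) if c is not None]
            new_row[x] = min(candidates) if candidates else 0
        filled.append(new_row)
    return filled

disp_left = [[2, 2, 2, 5, 2, 2]]             # toy scan line with one outlier
disp_right = [[2, 2, 2, 2, 2, 2]]
checked = left_right_check(disp_left, disp_right)
repaired = fill_invalid(checked)
```

Choosing the smaller neighbouring disparity favours the background, which is the usual reasoning for occluded regions: a pixel hidden in one view normally belongs to the surface farther from the camera.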
As can be seen from Fig. 11, because the left and right cameras of the binocular camera are at different positions and view the scene from different angles, the disparity map produced by the adaptive-weight stereo matching algorithm contains mismatched areas caused by occlusion. The black pixels are the mismatched points, which reduce the accuracy of stereo matching. When the HSI weight allocation and the left-right consistency check are introduced, most mismatched pixels are corrected and the accuracy of the disparity map improves. The matching results of the original and the optimized algorithm are quantified with the mismatched pixel ratio (percentage of bad matching pixels, PBM), as shown in Table V. In the table, Noc is the mismatching rate in the non-occluded region, All is the mismatching rate over the whole region, Disc is the mismatching rate in the depth-discontinuity region, and Avg is the average over the three conditions.
As can be seen in Table V, the adaptive-weight stereo matching algorithm optimized with HSI weight allocation and the left-right consistency check is more accurate than the original algorithm, so it is used in this study.
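The mismatched pixel ratio can be computed as sketched below, assuming that a reference disparity map is available and that a pixel counts as mismatched when its disparity differs from the reference by more than a threshold. The threshold of 1 pixel, the optional region mask (for example the non-occluded or depth-discontinuity region), and the function name are assumptions, since the section does not give these details.

```python
def bad_pixel_rate(disp, truth, mask=None, threshold=1.0):
    """Percentage of evaluated pixels whose disparity error exceeds threshold."""
    total = bad = 0
    for y, (row_d, row_t) in enumerate(zip(disp, truth)):
        for x, (d, t) in enumerate(zip(row_d, row_t)):
            if mask is not None and not mask[y][x]:
                continue                      # pixel lies outside the evaluated region
            total += 1
            if abs(d - t) > threshold:
                bad += 1
    if total == 0:
        raise ValueError("no pixels selected for evaluation")
    return 100.0 * bad / total

# Toy data, not measurements from this work.
disp = [[10, 12], [20, 25]]
truth = [[10, 10], [20, 20]]
region = [[1, 1], [1, 0]]                     # hypothetical region mask
rate_all = bad_pixel_rate(disp, truth)
rate_region = bad_pixel_rate(disp, truth, mask=region)
```

Evaluating the same disparity map with different masks yields region-specific rates of the kind reported per column in an evaluation table.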

C. Recognition and Localization of Malleable Iron Pipe
The disparity map of malleable iron pipes is a 2-D image. Three-dimensional reconstruction in binocular vision converts the coordinate information from 2-D to 3-D using the calibration parameters of the binocular camera and the disparity between the left and right images of the malleable iron pipes. The disparity map of condition 1 is reconstructed in this paper, and the resulting depth map and 3-D point cloud of the malleable iron pipes are shown in Fig. 12. It can be seen from Fig. 13 that rough matching only coarsely aligns the target and template point clouds and has low accuracy. To further reduce the matching error between the target and the template point cloud, precise matching is performed with the ICP algorithm. The ICP algorithm matches the point clouds using their geometry and computes the rotation matrix and translation vector between the two point clouds.
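The ICP idea can be illustrated with a short point-to-point sketch. To stay self-contained it is a planar (2-D) simplification with brute-force nearest-neighbour search and a closed-form rigid alignment, whereas the method above works on 3-D point clouds. The function names and the toy T-shaped point set are hypothetical, and like any ICP variant the sketch relies on a reasonable initial alignment such as the one rough matching provides.

```python
import math

def best_rigid_2d(src, dst):
    """Least-squares rotation angle and translation mapping paired src onto dst."""
    n = len(src)
    sx, sy = sum(p[0] for p in src) / n, sum(p[1] for p in src) / n
    dx, dy = sum(q[0] for q in dst) / n, sum(q[1] for q in dst) / n
    s_cross = s_dot = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay, bx, by = px - sx, py - sy, qx - dx, qy - dy
        s_cross += ax * by - ay * bx
        s_dot += ax * bx + ay * by
    theta = math.atan2(s_cross, s_dot)
    c, s = math.cos(theta), math.sin(theta)
    return theta, dx - (c * sx - s * sy), dy - (s * sx + c * sy)

def transform(points, theta, tx, ty):
    """Rotate points by theta about the origin, then translate by (tx, ty)."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]

def icp_2d(template, target, max_iter=50, tol=1e-9):
    """Align template to target; returns (theta, tx, ty, mean residual)."""
    current = list(template)
    prev_err = err = math.inf
    for _ in range(max_iter):
        # 1. Pair every template point with its nearest target point.
        matches = [min(target, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)
                   for p in current]
        # 2. Solve for the rigid motion that best aligns the pairs and apply it.
        current = transform(current, *best_rigid_2d(current, matches))
        # 3. Stop when the mean residual no longer changes.
        err = sum(math.hypot(a[0] - b[0], a[1] - b[1])
                  for a, b in zip(current, matches)) / len(current)
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    # The accumulated motion is the rigid transform from template to current.
    theta, tx, ty = best_rigid_2d(template, current)
    return theta, tx, ty, err

# Toy T-shaped template and a slightly rotated and shifted copy as the target.
template = [(-2.0, 0.0), (-1.0, 0.0), (0.0, 0.0), (1.0, 0.0), (2.0, 0.0),
            (0.0, 1.0), (0.0, 2.0)]
target = transform(template, math.radians(5.0), 0.1, -0.05)
theta, tx, ty, err = icp_2d(template, target)
print(round(math.degrees(theta), 3), round(tx, 3), round(ty, 3))
```

In 3-D the alignment step is usually solved with a singular value decomposition of the cross-covariance matrix, and the nearest-neighbour search is accelerated with a k-d tree.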
ICP precise matching is performed on the basis of rough matching, and the target point cloud basically coincides with

IV. GRASPING EXPERIMENTAL RESULTS
The malleable iron pipe used in this experiment is the commonly used three-way pipe. Figure 15 shows the 30 mm and 50 mm specifications.
Eight malleable iron pipes of the 50 mm specification are selected for identification and localization experiments in the stacked state, as shown in Fig. 16(a). The point cloud matching results obtained by the proposed method are shown in Fig. 16.
The matching poses of each target point cloud in Fig. 16 were recorded, and the positions of the 8 malleable iron pipes were obtained by calculation, as shown in Table VI.
Five malleable iron pipes of the 50 mm specification and five of the 30 mm specification are selected for identification and localization experiments in the stacked state, as shown in Fig. 17(a). The point cloud matching results obtained by the proposed method are shown in Fig. 17. The matching poses of each target point cloud in Fig. 17 were recorded, and the poses of the 10 malleable iron pipes were obtained by calculation, as shown in Table VII.
The experimental results show that the grasping robot system with binocular vision can recognize and locate malleable iron pipes placed at random and that the robot can be controlled to grasp according to the position of the pipe fitting. The grasping success rate for stacks of pipes of the same specification is higher than for stacks of mixed specifications, because the gaps between 50 mm targets are larger, which raises the recognition rate of the binocular vision system and in turn the grasping success rate.

V. CONCLUSIONS
A robot grasping system for malleable iron pipes based on binocular vision was proposed so that an industrial robot using machine vision can replace the manual operation in the flat process of malleable iron pipes. The image of malleable iron pipes was obtained by a binocular vision system, and the parallax image and the point cloud data were obtained by stereo matching. The point cloud data matching algorithm was used to determine the position of malleable iron pipes and guide the robot to complete the grasping work. The main conclusions are as follows:
1. The parallax image and point cloud data were obtained by stereo matching, and the point cloud data matching algorithm was used to determine the pose of malleable iron pipes.
2. The ICP algorithm was used to match the 3-D position of the target point cloud, and the matching pose of the target pipe fitting was obtained.
3. The grasping experiment shows that the accuracy of the proposed method is more than 85 %.

CONFLICTS OF INTEREST
The authors declare that they have no conflicts of interest.