A New K Nearest Neighbours Algorithm Using Cell Grids for 3 D Scattered Point Cloud

In this paper, we propose one novel and efficient K nearest neighbours search algorithm based on 3D uniform cell grids. First, a simple min-max box is used to store all the points and a twice division strategy is adopted to determine the edge length of basic grids. Then we limit the search space for each grid with certain query points to control the amount of distance calculations under a suitable level. And the computational cost of sorting operations is reduced effectively through avoiding the unnecessary calculations, with the help of properly determined subspaces and sphere spaces for points. Compared with existing related algorithms, our method can search for the K nearest neighbours accurately and quickly, and it has many possible applications in the fields using 3D scattered point clouds. DOI: http://dx.doi.org/10.5755/j01.eee.20.1.3926


I. INTRODUCTION
Point primitives have been used popularly as one kind of surface representation in the fields of virtual reality, scientific visualization, geometric modelling, reverse engineering, and so on [1]- [4].With the advanced scanning technologies, it is possible for lots of real objects to be scanned into 3D point clouds [5].Note that the point cloud is nothing more than 3D coordinates of a collection of scanned points, and contains no topological information [6].But other additional geometric information on the surface, such as surface normal or local curvature [7] which have to be calculated from the positional data, are required necessarily for the subsequent processing [8].What is common to obtain local geometric information from the point cloud is to calculate K nearest neighbours for each point [9].Getting the neighbours is meaningful.First, the calculation can be as small as possible to allow efficient processing of mass data sets.Second, neighbours guarantee that their local geometric information can be approximated provably well to a certain extent.
Aiming at increasing search speed, many research works have been performed consequently, and they are developed based on the basic idea: all the scattered points are firstly divided into small "regions" and stored into the related spatial data structure, then some strategies are used to find K nearest neighbours for the query point.There are different kinds of shapes already been adopted for the small regions, including Voronoi diagram [10], [11], hyper rectangular bucket [12]- [14] (e.g.K-d tree, Quadtree and Octree), bounding rectangle [15], [16] (e.g.R-tree and SR-tree), bounding sphere [17] (e.g.SS-tree), pyramid [18] and uniform cell grids [19], [20].
All the methods intended to set up reasonable and efficient data storage structures to search for the K nearest neighbours of query point with higher speed.However, the calculation of Voronoi diagram is still unbearable especially for the large scattered point sets.Comparatively, other data structures are more effective and acceptable.Bentley's method of K-d tree [21] has been widely used.It partitions the data space into hyper-rectangular buckets, then K nearest neighbours of any query point are obtained with a binary search for the target bucket and a local search for desired points in target bucket and its neighbouring buckets.A fast K nearest neighbours algorithm proposed by Sankaranarayanan [22] identifies a region in space that contains all of the K nearest neighbours for a collection of points.Each query point searches only the locality for the correct set of K nearest neighbours once the best possible locality is built.What's more, any hierarchical spatial data structure can be used in this method including some structures that are based on object hierarchies such as R-tree, Quadtree and Octree.
Different with the above methods, the algorithm presented in our paper divides the point cloud data into the basic 3D uniform cell grids using a twice division strategy.Based on the adjacent relationship between arbitrary 3D cell grid and the grids around it, related subspaces and sphere spaces are defined with the outwards expansion of cell grid, and they are used to limit the search space for each grid.Through reducing the expense of sorting operations, our approach can obtain the K nearest neighbours with less time.
The rest of our paper is organized as: the preliminary work to establish the 3D cell grids are presented in Section II, the proposed novel search algorithm is described in Section III, experimental results are given and analysed in Section IV, and finally the conclusion is drawn in Section V.

II. PRELIMINARY
Before searching for K nearest neighbours in a set of point cloud, the 3D cell grids need to be established to divide all scattered points into related units including two types: empty grids and non-empty grids, while each non-empty cell grid contains at least one point. The For the cuboid with volume V abc  , in ideal situation the scattered points are distributed in its 3D space uniformly.Suppose the space is well divided so that each grid contains one point, the edge length of each grid can be preliminarily set . In this case, there are 0 0 0 l m n   divided cell grids, where: In fact, the scattered point cloud is seldom well distributed as above.Therefore, suppose cube N is the total number of 3D cell grids that actually contain at least one point, the average density of points in cuboid is computed as Since our algorithm is designed to search for K neighbours of every point, we can make a necessary adjustment based on the average density for the edge length of each grid as where  is the adjusting parameter, and its value may vary for different kinds of scattered point clouds.
Then the points are divided for the second time and are re-distributed into l m n   grids, where: Now, any cell grid in the cuboid can be denoted with the index number ( , , ) i j k , which is equivalent to the following descriptions: where ( , , ) x y z is the grid vertex with minimum coordinates.

III. OUR SEARCH ALGORITHM
A. Related Definitions Definition (1), Subspace: be a subspace of the grid 0 0 0 ( , , ) i j k and denoted as ( ) S r , which is obtained by the extension outwards with r times of the edge length L from the cell grid 0 0 0 ( , , ) i j k .If the minimum and maximum grid vertices of the subspace belong to grid 1 1 1 ( , , ) i j k and grid 2 2 2 ( , , ) i j k respectively, subspace ( ) S r can be also denoted as 1 1 1 2 2 2 ( , , , , , ) i j k i j k .Definition (2), Ultimate space: After extension outwards for s times with edge length L from the grid 0 0 0 ( , , ) i j k , if subspace ( ) S s contains the K nearest neighbours of an arbitrary point in grid 0 0 0 ( , , ) i j k for the first time, subspace ( ) S s is an ultimate space and denoted as ultimate S .The ultimate space is used to limit the search range for any query point in grid 0 0 0 ( , , ) i j k .Definition (3), Sphere space: Given one query point O in grid 0 0 0 ( , , ) i j k , the sphere with radius R centred with O is the sphere space of point O , and is denoted as sphere S .
For any query point in a non-empty cell grid, two kinds of its sphere S are introduced: internal sphere space insphere S , and external sphere space outsphere S .The radius of insphere S and the radius of outsphere S are denoted as in R and out R respectively, where in R is less than out R .
To Assume query point O is located at an arbitrary position in grid 0 0 0 ( , , ) i j k , we can extend the grid outwards gradually until subspace ( ) S p (shown in Fig. 1), which is extended for p times of edge length L and contains no less than K points for the first time.Also we denote , , dx dy dz as the larger distances from query point O to the two surfaces of its grid 0 0 0 ( , , ) i j k along x-axis, two surfaces along y-axis, and two surfaces along z-axis respectively.If the coordinates of query x y z , there are: ), ), ).
(1) Determining of insphere S Suppose short d is the minimum value of shorter distances between query point O and the two surfaces of its grid along x-axis, y-axis and z-axis, it can be described as min( , , ).
Thus the radius in R of insphere S can be calculated as .
Obviously, if the number of points in the internal sphere space insphere S of the query point O is less than or equal to K, they must all belong to the K neighbours of point O without taking any sorting operations.
(2) Determining of outsphere S Since subspace ( ) S p has expanded p times of the edge length L from grid 0 0 0 ( , , ) i j k , the radius out R of outsphere S can be described as From ( 14)-( 16), it is easy to know that values of , , dx dy dz are less than edge length L .Thus the distance d from the query point O to any other point in ( ) S p must be satisfied with ( 1) 3 .
Since there are at least K points in subspace ( ) S p , the number of points in outsphere S must be more than or equal to K, i.e. all the K nearest neighbours of query point O are located in space outsphere S .
(3) Determining of ultimate S Similar with subspace ( ) S p , subspace ( ( 1) 3 1) is obtained after expanding the 3D cell grid 0 0 0 ( , , ) i j k for Since ( ) S p contains at least K nearest neighbour points for query point O , for any arbitrary point in the same grid 0 0 0 ( , , ) i j k , its K nearest neighbours can be limited in space , which is the ultimate S of all points in the 3D cell grid 0 0 0 ( , , ) i j k .The mechanics of our algorithm are shown in Fig. 1.For example, if 2 p  , ultimate S of the related grid 0 0 0 ( , , ) 6). S For query point O in cell grid 0 0 0 ( , , ) i j k , all the K nearest neighbours can be obtained from its corresponding spaces insphere S and outsphere S .

C. Implementation of Searching Algorithm
For 3D scattered points, there are usually more than one point in most of non-empty cell grids, as shown in Fig. 2. Of course, all the points in the same cell grid share the common ( ) S p and ultimate S .
When searching for K nearest neighbours for the first point in cell grid, we record ultimate S and p .Then the other points in the same grid can directly use the value of p to compute both in R of insphere S and out R of outsphere S without extending subspace from the original grid gradually and repeatedly.In this way, we can avoid a large number of unnecessary calculations, and thus improve the search efficiency.
Fig. 2. Points P and Q in the same 3D cell grid.
The specific steps of our algorithm are as follows: Step 1. Calculate the edge length L of 3D cell grid and divide all points into the grids; Step 2. Traverse all the scattered points in point cloud, for any query point P, if the cell grid with point P has already been visited, go to Step 3; otherwise calculate the subspace ultimate S and record the intermediate value p , then set status of the cell grid as visited; Step 3. Calculate radius in R of insphere S and radius out R of outsphere S for the query point P; Step 4. Count the number of points (denoted as  ) in subspace insphere S for query point P: -If K   , all the points in subspace S belong to K neighbours of point P, and the remaining ( K   ) neighbours are selected through sorting operations from the points in space outsphere insphere S S  ; -If K   , all the points in subspace insphere S are just the K neighbours of point P; -If K   , all the K neighbours of point P must be located in subspace insphere S , and they can be selected through sorting operations from the points in insphere S .
Step 5. Go to Step 2, until all the scattered points in point cloud have been visited and processed.

IV. EXPERIMENTAL RESULTS
Experiments are conducted to evaluate the performance of our algorithm, and the experimental environment is Intel(R) Core(TM) 2 Duo CPU E7400 processor 2.80 GHz, 2.0 GB RAM, XP Operation System, and Microsoft Visual Stdio.Net 2008 IDE.The employed 3D scattered point clouds for experiments are often used in computer graphics or other relevant and the number of points in these 3D scattered point clouds range from 10 k to 120 k.
According to [22], two parameters are usually applied to compare the effects of different kinds of K nearest neighbours searching algorithms: the executed time, and the Euclidean distance sensitivity which is defined and calculated with the following ( _ ) , log where S  and ( _ ) N dist calc represent distance sensitivity and total number of point distance calculations (the most time consuming operations for sorting points) respectively, while log N N is the cost of sorting N points.

A. Verification of Searching Accuracy
To verify the accuracy of our new searching algorithm for K nearest neighbours, the Stanford Bunny model containing 35,947 points is taken as test data set, as shown in Fig. 3.We consider all possible cases of query points, which are divided into four categories: (a) the query point is one of the smooth points sampled from the back of bunny; (b) the query point is one of the ridge points sampled from the ear of bunny and the curvature of these points changes significantly; (c) the query point is one man-made point, which is far from the back of bunny; (d) the query point is one man-made point, which is located between two ears of bunny.
Figure 3 illustrates the above four categories of query points selected from the bunny model and marked by "+".K is set as 20, and the corresponding 20 nearest neighbours are searched for every query point.To show them clearly, the searched K neighbour points are circled by curves.Parameter  in ( 7) is the key parameter of our algorithm, since it affects the edge length of 3D cell grids and thus influences the following step of subspace expansion.So we study the effects of different  on the performance of the proposed K nearest neighbours' algorithm with the 3D point cloud of Stanford Bunny model.For a given integer value of K, i.e. 8, 16, 32, 64, the corresponding value of  varies from 0.10 to 0.50.Performance of our algorithm is evaluated through measuring the executed time (including pre-process time, searched time, and the sum of them, i.e. total time) and the value of Euclidean distance sensitivity.The pre-process time is the period used to read point cloud from model file and then divide the points into corresponding spatial data storage structure, i.e. the 3D cell grids; the searched time is the period used to search for the K nearest neighbours based on the cell grids; while the total time is the whole period including the pre-process time and the searched time.
Figure 4(a) shows the effects of  on the pre-process time for different K values.When  is a larger value, the pre-process time is less.Since larger  produces bigger 3D each grid, and thus the division time for points is less.Figure 4(b) shows the effects of  on searched time.If  is too small, there will be a huge number of small 3D cell grids to be processed; if  is too big, the cost for each big grid will be increased obviously.When  is between 0.25 and 0.3, the curves display higher searching efficiencies.Figure 4(c) shows the effects of  on total time, where the curve shapes are nearly the same as those of Fig. 4(b), due to the fact that pre-process time is only a small proportion in the total time of our algorithm.Figure 4(d) shows the effects of  on distance sensitivity, and it can be found that the distance sensitivity is relatively low when  is very small or varies from 0.22 to 0.26.
Then we give further analysis on the relationships between parameter  and the average numbers of points in ultimate S , insphere S , outsphere S , and average number of nearest neighbours obtained directly from the points in insphere S without taking any sorting operations, as illustrated in Fig. 5.  shows that when  varies between 0.20 and 0.26, a higher number of average nearest neighbours can be directly obtained from the space insphere S without any sorting operations, which is very helpful to avoid the unneeded computations and thus improve the searching efficiency of our algorithm.
From the experiments it can be found that parameter  indirectly affects the speed to search for K nearest neighbours through the edge length of 3D cell grids.If  is very large, although ultimate S can be determined quickly, both insphere S and outsphere S contain excessive numbers of points, and thus will decrease the search efficiency.On the contrary, if  is very small, although insphere S and outsphere S contain less numbers of points, the number of cell grids will be huge due to the small edge length of basic cell grids, accordingly expansion speed of the subspaces will also be slow.Therefore, to guarantee the higher search efficiency, there should be a suitable value of  .Based on aforementioned analysis of the experimental results, in our work parameter  is set between 0.22 and 0.30.

C. Comparison with Related Algorithms
Our algorithm is further evaluated through comparing both executed time and distance sensitivity with the other existing related methods, i.e.Bentley's method [21] (represented with Bent's) and Sankaranarayanan's method [22] (represented by Sank's), which are popularly used.The point clouds of five 3D models are employed for experiments, and they are: foot with 10016 scanned points, bust with 25126 scanned points, golf with 39479 scanned points, horse with 48485 scanned points, teeth with 117871 scanned points, as shown in Fig. 6.During experiments, different values of K, i.e. 16, 32, 64, are adopted to obtain the comprehensive comparisons among the three searching methods.
The experimental results are shown in Fig. 7, where each row displays the results from one of the five 3D models.The executed time includes pre-processed time, searched time and total time.They are illustrated by blue, red and green bars respectively, while the height of green bar is the sum of the heights of blue bar and red bar.
It can be found from Fig. 7 that the pre-processed times for three methods have only slight differences, but our method has the least searched time and total time, i.e. having the best search efficiency.As for Euclidean distance sensitivity, our method is superior to Bent's method and inferior to Sank's method.The reason is that we to perform many calculations to obtain ultimate S , insphere S , outsphere S .But these computations are necessary since they are very helpful to reduce the searched time in the following steps, and achieve the final results with less total time.

V. CONCLUSIONS
In this paper, a novel search algorithm is proposed for K nearest neighbours based on 3D cell grids.The contributions of our approach include: (1) we use a simple min-max box composed of 3D uniform cell grids to store all the scattered points, and adopt the twice division strategy to determine the edge length of the cell grids in order to control the step of extension outwards from the original query cell grid; (2) to control the distance calculations under a suitable level, we introduce the "ultimate space" to limit the search space for each cell grid with certain query points; (3) to reduce the scale of sorting operations for scattered points, we define the sphere spaces including "internal sphere space" and "external sphere space", from them the K nearest neighbours can be obtained while unneeded sorting operations are avoided; (4) for the situation that there are more than one point in the same cell grid, we only need to calculate the "ultimate space" of the cell grid for one time, and the corresponding values can be recorded to avoid repeated calculations when all the points in the same grid are considered as query points.
Experimental results have proved the accuracy of our new algorithm.Based on statistical analysis, setting of parameter is discussed.Through comparisons with the existing related methods, our algorithm has shown the advantages of owning much better search efficiency while controlling the distance sensitivity under a suitable level.
Manuscript received March 28, 2013; accepted October 24, 2013.This research was funded by National Basic Research Program of China (973 Program, No. 2011CB707904), National Natural Science Foundation of China (No. 60603079, 61272276), Science and Technology Bureau of Suzhou of China (No. SH201115).

3 .
Different types of query points and their K nearest neighbours.

Fig. 4 .
Fig. 4. The effects from parameter  on (a) pre-processed time, (b) searched time, (c) total time, and (d) Euclidean distance sensitivity for different values of K.

Fig. 5 .
Fig. 5.The effects from parameter  on (a) average points in ultimate S

Figure 5 (Fig. 5 (
Figure 5(a) shows that when  varies round 0.26, the average points in ultimate S can be controlled under a relatively stable and low number; Fig. 5(b) and Fig. 5(c) show the similar varying tendency as Fig. 5(a), since  affects the spaces ultimate S , insphere S and outsphere S in a similar way;Fig.5(d)showsthat when  varies between 0.20 and 0.26, a higher number of average nearest neighbours can be directly obtained from the space insphere S without any sorting