A New Heuristic Approach for Treating Missing Value: ABCimp
Missing values in datasets pose a significant problem for both traditional and modern statistical methods. Many statistical methods have been developed to analyze complete datasets, yet most real-world datasets contain missing values. In recent years, therefore, many methods have been developed to overcome the missing value problem. Heuristic methods have become popular in this field because of their strong performance on many other optimization problems. This paper introduces a new approach, based on the Artificial Bee Colony algorithm, for missing value imputation in four real-world discrete datasets. In the proposed Artificial Bee Colony Imputation (ABCimp) method, Bayesian Optimization is integrated into the Artificial Bee Colony algorithm. The performance of the proposed technique is compared with six other well-known methods: Mean, Median, k Nearest Neighbor (k-NN), Multivariate Imputation by Chained Equations (MICE), Singular Value Decomposition (SVD), and MissForest (MF). Classification error and root mean square error are used as the evaluation criteria for the imputation methods, with the Naive Bayes algorithm as the classifier. The empirical results show that ABCimp performs better than these popular imputation methods at missing rates ranging from 3% to 15%.
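The evaluation protocol described above (hide a fraction of the cells, impute them, then score the result by root mean square error) can be sketched roughly as follows. This is only an illustration of the Mean and Median baselines on a small synthetic discrete table; it is not the ABCimp method, and the function names (`mask_values`, `impute_columns`, `rmse`), the toy data, and the random seed are assumptions made for the example, not taken from the paper. The classification-error criterion with Naive Bayes is omitted here for brevity.

```python
import math
import random
import statistics


def mask_values(rows, rate, seed=0):
    """Return (masked copy, hidden cell set) with about `rate` of cells set to None."""
    rng = random.Random(seed)
    cells = [(i, j) for i in range(len(rows)) for j in range(len(rows[0]))]
    hidden = set(rng.sample(cells, round(rate * len(cells))))
    masked = [[None if (i, j) in hidden else v for j, v in enumerate(row)]
              for i, row in enumerate(rows)]
    return masked, hidden


def impute_columns(masked, statistic):
    """Fill each None with `statistic` of the observed values in its column.

    Assumes every column keeps at least one observed value.
    """
    fills = []
    for j in range(len(masked[0])):
        observed = [row[j] for row in masked if row[j] is not None]
        fills.append(statistic(observed))
    return [[fills[j] if v is None else v for j, v in enumerate(row)]
            for row in masked]


def rmse(original, imputed, hidden):
    """Root mean square error computed over the hidden cells only."""
    squared = [(original[i][j] - imputed[i][j]) ** 2 for i, j in hidden]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical toy data: 20 rows, 3 discrete-valued columns.
rows = [[(i * j + i) % 5 for j in (1, 2, 3)] for i in range(20)]

for rate in (0.03, 0.15):  # the two ends of the range of missing rates studied
    masked, hidden = mask_values(rows, rate)
    for name, stat in (("mean", statistics.mean), ("median", statistics.median)):
        score = rmse(rows, impute_columns(masked, stat), hidden)
        print(f"rate={rate:.2f} {name:6s} RMSE={score:.3f}")
```

Scoring only the hidden cells keeps the error from being diluted by the observed values, which every imputation method leaves unchanged.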
Authors retain copyright and grant the journal the right of first publication, with the paper simultaneously licensed under the Creative Commons Attribution 4.0 (CC BY 4.0) licence.