A Novel Approach for Detection of Fake News on Social Media Using Metaheuristic Optimization Algorithms

Abstract: Deceptive content, such as fake news created by social media users, is becoming increasingly dangerous. Individuals and society have been affected negatively by the spread of low-quality news on social media. Fake and real news need to be distinguished to eliminate the disadvantages of social media. This paper proposes a novel approach for the fake news detection (FND) problem on social media. In this approach, the FND problem is considered as an optimization problem for the first time, and two metaheuristic algorithms, Grey Wolf Optimization (GWO) and Salp Swarm Optimization (SSO), are adapted to the FND problem for the first time as well. The proposed FND approach consists of three stages. The first stage is data preprocessing. The second stage is adapting GWO and SSO to construct a novel FND model. The last stage consists of using the proposed FND model for testing. The proposed approach has been evaluated using three different real-world datasets. The results have been compared with seven supervised artificial intelligence algorithms. The results show that the GWO algorithm has the best performance in comparison with the SSO algorithm and the other artificial intelligence algorithms. GWO appears suitable for efficiently solving different types of social media problems.


I. INTRODUCTION
Recently, the development of online social media has changed the way people access information. People can get news about social, economic, political, and scientific events in the world through social media. Because news on social media includes videos and pictures, traditional news platforms, such as television and newspapers, are losing importance. In addition, online social media provides advantages, such as easy access to information, low cost, and rapid spread of information. Although social media has many advantages, much of the news on social media may unfortunately be altered by malicious people and, therefore, may become unreliable. Such news spreads fast through social media and has a negative impact on social media readers and users. Therefore, fake news detectors are needed to reduce the negative impact caused by fake news.
Although the FND problem is very recent, it has already attracted considerable attention. Different approaches have been proposed to detect fake news in various types of data.

Manuscript received 11 November, 2018; accepted 17 April, 2019.
The vector space model has been used to cluster news by discourse feature similarity in [1], achieving an accuracy of 63 %. Given the data source used in that study, the method aims to identify subtle differences between different stories and is not suitable for news. Support Vector Machine (SVM) is adopted for the classification of news within the BBC and 20 Newsgroup datasets [2]. The precision values obtained are 94.93 % and 97.84 % for the 20 Newsgroup and BBC datasets, respectively. As a powerful classifier in many fields, SVM, which is based on the structural risk minimization principle, has also obtained promising results in news classification. In another study, SVM is used for news identification. The results are promising, with a classification precision of 90 % in differentiating satire from real news [3].
Conflicting viewpoints on social media are identified in the news verification process in [4]. The accuracy of the method is 84 %. Logistic regression and the harmonic Boolean label crowdsourcing algorithm are used for automated FND in [5]. Accuracy above 99 % is obtained from the harmonic Boolean label crowdsourcing algorithm. For detecting opinion spam and fake news, the authors of [6] use machine learning classification techniques and n-gram analysis. The success rate of their methods is 90 %. Classification algorithms are evaluated for the FND problem in [7]. The best performance is achieved with Stochastic Gradient Descent, with a value of 77.2 %. A hybrid algorithm called CSI is adapted for FND in [8]. In that work, recurrent/recursive neural networks are adopted to represent sequential posts and user engagements for FND. The accuracy of the CSI method is 89.2 % on real-world data. A hybrid algorithm based on an attention-based long short-term memory network is used for solving the FND problem in [9]. The performance of the method is tested on benchmark FND datasets. This method outperforms Yang's hybrid convolutional neural network model by 14.5 % in accuracy. Machine learning algorithms are used to automatically identify fake news posted on Facebook [10]. An end-to-end framework called Event Adversarial Neural Network is proposed to detect fake news in [11]. The model is evaluated on two custom datasets. However, this model is compared with ad-hoc baselines that are not designed for false news detection. A content-based FND method is proposed in [12]. The accuracy rates of the method are 72 %, 61.3 %, and 70 % for three different datasets.
Owing to their many advantages, metaheuristic optimization algorithms have been efficiently used in solving many complex real-world problems. They are population-based global search methods, which do not start the search from a single candidate solution. They do not need information about the characteristics of the search space and they are not problem dependent. Metaheuristic optimization algorithms are general-purpose solution search methods, which find global or near-global optimum solution(s) within a reasonable computation time. As FND is also one of the new complex real-world problems, more efficient methods need to be proposed for better performance with respect to different metrics. Metaheuristic algorithms may be efficiently used for solving these types of problems in order to obtain better performance than that of the existing approaches. The unstructured textual social media data can be considered as a search space and a metaheuristic algorithm can be adapted as a search method for the FND problem.
In this study, a novel model is proposed by adapting the GWO and SSO algorithms to the FND problem. The FND problem is considered as an optimization problem, and GWO and SSO are modeled as search methods for the first time by specifying a proper representation scheme (encoding type) and objective function.
The main contributions of this paper are summarized as follows:
- To the best of our knowledge, population-based metaheuristic optimization algorithms are proposed as a solution search method for FND within social media for the first time in this paper;
- A new proper representation scheme (encoding type) for the FND problem with unstructured textual social media data is proposed;
- A flexible fitness function for efficiently and simultaneously handling many objectives for the FND problem is proposed for the first time. Different objectives can easily be integrated into the fitness function;
- New application areas of metaheuristic optimization algorithms are introduced in this paper. The algorithms are shown to be efficiently usable in special social media analysis problems for the first time.

A. Grey Wolf Optimization Algorithm
The grey wolf is a member of the Canidae family. Grey wolves usually live in a pack of 5-12 animals. The GWO algorithm is a stochastic metaheuristic optimization method proposed in [13]. The main idea of this algorithm is inspired by the hunting principles and social relations of grey wolves.
Similar to other population-based metaheuristic optimization algorithms, the optimization process begins with creating candidate solutions for the relevant problem.
The candidate solutions are divided into 4 groups: alpha (α), beta (β), delta (δ), and omega (ω). In the first step of each iteration, the objective value of each solution is calculated. The three best wolves are selected and labeled as α, β, and δ. The rest of the grey wolves (ω) are required to follow α, β, and δ in order to obtain better solutions. These stages are repeated until the termination criteria are met. When the optimization process is finished, the alpha, which has the best fitness value, is returned as the final global solution.
The pseudo code of the GWO algorithm is illustrated in Fig. 1.
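As a rough illustration of the loop described above, the following minimal sketch follows the commonly cited continuous formulation of GWO from [13] (coefficient a decreasing linearly from 2 to 0, coefficient vectors A and C, and averaging of the positions suggested by α, β, and δ). It is not the adapted FND version proposed in this paper: the function name `gwo_maximize`, its parameters, the box bounds, and the sphere-style test function are all assumptions made for the example.

```python
import random


def gwo_maximize(fitness, dim, lower, upper, n_wolves=20, n_iters=100, seed=0):
    """Maximize `fitness` over [lower, upper]^dim with a basic GWO loop (sketch)."""
    rng = random.Random(seed)
    wolves = [[rng.uniform(lower, upper) for _ in range(dim)]
              for _ in range(n_wolves)]
    best_pos, best_fit = None, float("-inf")

    for t in range(n_iters):
        # Rank the pack; the three best wolves act as alpha, beta, and delta.
        ranked = sorted(wolves, key=fitness, reverse=True)
        leaders = [list(w) for w in ranked[:3]]
        top_fit = fitness(leaders[0])
        if top_fit > best_fit:
            best_pos, best_fit = list(leaders[0]), top_fit

        a = 2.0 * (1.0 - t / n_iters)  # decreases linearly from 2 toward 0
        for wolf in wolves:
            for j in range(dim):
                guided = 0.0
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a
                    C = 2.0 * rng.random()
                    D = abs(C * leader[j] - wolf[j])
                    guided += leader[j] - A * D
                # Average of the three leader-guided positions, kept in bounds.
                wolf[j] = min(upper, max(lower, guided / 3.0))

    # Account for the positions produced by the last update.
    for wolf in wolves:
        f = fitness(wolf)
        if f > best_fit:
            best_pos, best_fit = list(wolf), f
    return best_pos, best_fit


# Hypothetical usage: maximize the negated sphere function (optimum at the origin).
pos, fit = gwo_maximize(lambda x: -sum(v * v for v in x), dim=3,
                        lower=-5.0, upper=5.0)
```

The sketch keeps the best solution seen so far, so the returned fitness never falls below that of the best initial wolf. In the FND setting the dimension would equal the number of extracted features, and the fitness would be the weighted metric combination rather than this toy function.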

B. Salp Swarm Optimization Algorithm
Salps, which are members of the Salpidae family, have a transparent and barrel-shaped body. The salp community is a population that lives in a colony called a salp chain. The SSO algorithm, which is a population-based metaheuristic algorithm, imitates the swarm movement and food search of salps [14]. The SSO algorithm starts the optimization process with a set of n-dimensional random solutions. The fitness value of each candidate solution is calculated and, then, the salp with the best fitness is selected. The best salp's position is assigned to the F variable, which represents the source of food to be tracked by the salp chain. The positions of the leader and follower salps are updated for each dimension. During the optimization process, the available food source changes whenever a salp with a better fitness finds a new source of food.
The pseudo code of the SSO method is illustrated in Fig. 2.
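The chain update described above can be sketched as follows. This follows the commonly cited continuous salp swarm formulation from [14]: the leader moves around the food source F with a step scaled by c1 = 2·exp(-(4l/L)²), and each follower moves to the midpoint between itself and the salp ahead of it. It is an illustrative sketch, not the authors' adapted FND version. The function name `sso_maximize`, the single-leader choice, the 0.5 threshold on the sign of the step, and the test function are assumptions.

```python
import math
import random


def sso_maximize(fitness, dim, lower, upper, n_salps=20, n_iters=100, seed=0):
    """Maximize `fitness` over [lower, upper]^dim with a basic salp chain (sketch)."""
    rng = random.Random(seed)
    salps = [[rng.uniform(lower, upper) for _ in range(dim)]
             for _ in range(n_salps)]
    # F: the food source, i.e. the best position found so far.
    food = list(max(salps, key=fitness))
    food_fit = fitness(food)

    for it in range(1, n_iters + 1):
        # c1 shrinks over time, moving from exploration to exploitation.
        c1 = 2.0 * math.exp(-((4.0 * it / n_iters) ** 2))
        for i, salp in enumerate(salps):
            for j in range(dim):
                if i == 0:
                    # Leader: step around the food source in a random direction.
                    step = c1 * ((upper - lower) * rng.random() + lower)
                    if rng.random() >= 0.5:
                        salp[j] = food[j] + step
                    else:
                        salp[j] = food[j] - step
                else:
                    # Follower: midpoint between itself and the salp ahead.
                    salp[j] = 0.5 * (salp[j] + salps[i - 1][j])
                salp[j] = min(upper, max(lower, salp[j]))

        # The food source changes whenever a salp finds a better position.
        for salp in salps:
            f = fitness(salp)
            if f > food_fit:
                food, food_fit = list(salp), f
    return food, food_fit


# Hypothetical usage on the negated sphere function.
food_pos, food_score = sso_maximize(lambda x: -sum(v * v for v in x), dim=3,
                                    lower=-5.0, upper=5.0)
```

Some published implementations treat the first half of the chain as leaders; a single leader is used here because the description above speaks of one leader followed by the rest of the chain.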

A. Data Preprocessing
In text mining applications, the representation of textual data greatly influences the accuracy of the results. In the proposed model, textual social media data are converted into numerical data (Document Vector) through several preprocessing steps. In the Document Vector, each row represents a news document and the columns represent the features extracted from all documents. The basic steps of the preprocessing are given in Fig. 3.

The fitness function is defined to construct an FND model with the best fitness value for fake and real data on the training data of each dataset. Determination of the fitness function is the most important part of metaheuristic algorithms. In this study, a new fitness function is proposed for simultaneously handling different objectives, such as accuracy, precision, recall, f-score, etc., in the FND system. The fitness values of the candidate solutions are calculated according to (1):

\[ \text{Fitness} = a_1 \cdot \text{Accuracy} + a_2 \cdot \text{Precision} + a_3 \cdot \text{Recall} + a_4 \cdot \text{F-score}, \quad (1) \]

where a_1, a_2, a_3, and a_4 are random weights such that their sum is equal to 1. This function is flexible and does not consider only one objective. Different objectives can easily be integrated.

In each iteration of the metaheuristic optimization algorithm, each candidate solution in the population is compared with each training item to find the most appropriate model that represents fake and real data. In this study, a similarity criterion is calculated between the fake/real model and the training data by using the Jaccard similarity metric. In this way, a fake or real label is estimated for every item in the data. Accuracy, precision, recall, and f-score values are computed by comparing the estimated class label of each item with its actual class label. In order to construct the FND model with the best fitness value, the obtained accuracy, precision, recall, and f-score values are used in each iteration when calculating the fitness function.
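The weighted combination of metrics described above can be sketched as follows. The helper names `random_weights` and `weighted_fitness`, the choice of "fake" as the positive class, and the convention of returning 0 for an undefined metric are assumptions made for this example. The paper specifies only that the four weights are random and sum to 1.

```python
import random


def random_weights(n, rng):
    """Draw n random non-negative weights and normalize them to sum to 1."""
    raw = [rng.random() for _ in range(n)]
    total = sum(raw)
    return [r / total for r in raw]


def weighted_fitness(y_true, y_pred, weights, positive="fake"):
    """Weighted sum of accuracy, precision, recall, and f-score."""
    if len(weights) != 4 or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("expected four weights that sum to 1")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Undefined ratios (zero denominators) are treated as 0 in this sketch.
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall else 0.0)

    w_acc, w_prec, w_rec, w_f = weights
    return w_acc * accuracy + w_prec * precision + w_rec * recall + w_f * f_score


# Hypothetical usage: one of two fake items is missed by the model.
truth = ["fake", "fake", "real", "real"]
guess = ["fake", "real", "real", "real"]
score = weighted_fitness(truth, guess, (0.4, 0.2, 0.2, 0.2))
weights = random_weights(4, random.Random(0))
```

Because the weights sum to 1 and every metric lies in [0, 1], the fitness also lies in [0, 1], which makes candidate solutions directly comparable across iterations.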

C. Testing the Data with FND Model
In this stage of the study, the FND model is used to classify the test data. Each test item is compared with both the real and fake models obtained from the optimization algorithms using Jaccard similarity. The label of the model (real or fake) that is more similar to the test item is assigned to that item. Finally, accuracy, precision, recall, and f-score values are calculated on the test data. The constructed model is tested on three different real-world fake news datasets separately. The obtained results are given in the experimental evaluations.
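A minimal sketch of this labeling step is given below. It assumes that each test item and each model can be represented as a set of features (for example, the words or feature indices that are present). This set-based encoding, the function names `jaccard` and `label_item`, and the rule that ties go to "real" are assumptions for illustration, since the exact encoding is not restated here.

```python
def jaccard(a, b):
    """Jaccard similarity |A ∩ B| / |A ∪ B| of two collections of features."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def label_item(item, fake_model, real_model):
    """Assign the label of the model that is more similar to the item."""
    fake_sim = jaccard(item, fake_model)
    real_sim = jaccard(item, real_model)
    # Ties are resolved as "real" in this sketch (an arbitrary choice).
    return "fake" if fake_sim > real_sim else "real"


# Hypothetical models made of mined words.
fake_model = {"shocking", "secret", "hoax"}
real_model = {"senate", "report", "budget"}

label_a = label_item({"shocking", "hoax", "video"}, fake_model, real_model)
label_b = label_item({"senate", "budget", "vote"}, fake_model, real_model)
```

The predicted labels can then be compared with the actual labels to obtain accuracy, precision, recall, and f-score on the test data.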

IV. EXPERIMENTAL EVALUATIONS
In this section, three different real-world fake news datasets are used to evaluate the proposed FND model. For every dataset, 70 % of the data is used for training and the remaining 30 % is used for testing the algorithms. The PC used in the present work has a Core i5-3230M CPU at 2.60 GHz, 8 GB of RAM, and a 500 GB HDD.
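A 70/30 split of this kind can be sketched as below. Whether the original split was random, stratified, or sequential is not stated, so the shuffling, the fixed seed, and the function name `split_dataset` are assumptions made only for illustration.

```python
import random


def split_dataset(items, train_fraction=0.7, seed=0):
    """Shuffle a copy of `items` and split it into training and test parts."""
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical usage with ten placeholder document identifiers.
train_part, test_part = split_dataset(range(10))
```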

A. Results for BuzzFeed Political News
The BuzzFeed Political News dataset is collected from the fake election news article [15] in BuzzFeed 2016 [16].
To obtain these data, BuzzFeed reviewed real and fake stories over the nine-month period before the 2016 US Presidential Election. The number of features extracted from the BuzzFeed Political News dataset is 38. Accordingly, the dimension parameters of the SSO and GWO algorithms are set to 38.
Table II shows the performance comparison for SSO, GWO, and the supervised artificial intelligence algorithms on the BuzzFeed Political News dataset. A graphical representation of the algorithm performances with respect to the accuracy, precision, recall, and f-score metrics is shown in Fig. 4. From Table II, it can be seen that the GWO algorithm achieves the best accuracy, 0.875, for the dataset. In this dataset, the worst accuracy, 0.562, is obtained using Ridor. In terms of precision, the SSO algorithm has the highest value among the nine algorithms while, again, Ridor has the lowest. On the recall metric, the SSO and GWO algorithms have the highest performance, with a value of 1.000. In terms of f-score, the highest value is achieved by the SSO algorithm (0.839) while the lowest value is achieved by Ridor (0.579).

B. Results for Random Political News
There is only political news in the first dataset. Therefore, Horne and Adali created their Random Political News dataset. They collected fake news from a list of fake news websites and real news from Business Insider's "Most Trusted" list [17]. In the Random Political News dataset, the number of extracted features is 39. Accordingly, the dimension parameters of the SSO and GWO algorithms are set to 39.
Table III gives the results obtained from SSO, GWO, and the supervised artificial intelligence algorithms on the Random Political News dataset. Figure 5 shows a graphical comparison of the results obtained by these algorithms.

C. Results for Liar Benchmark
The Liar Benchmark dataset is a publicly available dataset presented for FND [18]. This dataset contains 12,836 real-world short statements from POLITIFACT.COM. The number of features extracted from the Liar Benchmark dataset is 42. Accordingly, the dimension parameters of the SSO and GWO algorithms are set to 42.
The results obtained from SSO, GWO, and the supervised artificial intelligence algorithms for the Liar Benchmark dataset are given in Table IV. The performances of these algorithms for the Liar Benchmark dataset are shown in Fig. 6.
The results show that, in this dataset, the GWO algorithm provides the best results in terms of all evaluation metrics except precision, owing to the proposed flexible fitness function that simultaneously and efficiently handles the different objectives. The highest precision value is obtained from the proposed SSO. The lowest accuracy and f-score are achieved by Naïve Bayes. In terms of precision, the lowest value is achieved by Ridor (0.822).
Because metaheuristic algorithms have a global search capability that uses many candidate solutions rather than a single point, as local search methods do, they outperform the supervised artificial intelligence algorithms in solving the FND problem. Furthermore, the metaheuristic optimization algorithms are adapted as fake news detectors in this work with a flexible objective function that satisfies different objectives. They optimize all of the metrics simultaneously, which is why they outperform the other methods in terms of many of the metrics integrated into the flexible objective function.

V. CONCLUSIONS
Fake news in online social media needs to be detected to eliminate the disadvantages of social media. In this paper, two novel optimization-based approaches for the FND problem on social networks are proposed. For this purpose, two of the newest metaheuristic optimization algorithms, namely GWO and SSO, have been adapted to the FND problem for the first time. The proposed approaches are evaluated on three different real-world datasets and the results are compared with seven supervised artificial intelligence algorithms. The best accuracy is obtained from GWO on all datasets. GWO has also given the best precision and f-score values in two out of three datasets. SSO outperforms all of the algorithms in terms of precision in two out of three datasets. Owing to the representation scheme and the flexible fitness function that simultaneously and efficiently handles many different objectives, the results obtained from the two proposed algorithms are very promising.
Another advantage of the metaheuristic algorithms proposed for FND in this work is that they construct an explainable model that consists of mined specific words for fake or real news. In contrast, most of the supervised artificial intelligence algorithms used for FND in this work are black-box methods. The proposed approach has a flexible fitness function, so different objectives may easily be integrated into the model as well. This work can be regarded as a reference work in social media analysis since it adapts optimization algorithms to the FND problem for the first time. Given the promising results obtained from GWO, researchers in text mining, social network analysis, and optimization can use and enhance the methods proposed in this study to solve different types of social media problems more efficiently.
Different similarity metrics may be used for model construction and testing in order to improve performance. Binary versions of metaheuristic optimization methods may be used with the converted document vector as well. The GWO and SSO algorithms may be implemented with fine-tuned parameters in order to improve the performance. Parallel, distributed, or other models may be proposed for the metaheuristic algorithms in order to solve these types of problems efficiently. Adaptive and hybrid versions of the algorithms may also be proposed for improving the results.
The training data covered by the obtained model are extracted from the training dataset. The same steps are repeated until there is no uncovered data among the remaining training data in the training dataset.

Fig. 4. Performances of the algorithms in BuzzFeed Political News dataset.

Fig. 5. Performances of the algorithms in Random Political News dataset.

Fig. 6. Performances of the algorithms in Liar Benchmark dataset.

TABLE I. THE PARAMETERS OF GWO AND SSO ALGORITHM.

TABLE II. THE PERFORMANCE OF THE ALGORITHMS FOR THE BUZZFEED POLITICAL NEWS DATASET.

TABLE III. THE PERFORMANCE OF THE ALGORITHMS FOR THE RANDOM POLITICAL NEWS DATASET.

Similar to the results obtained from the BuzzFeed Political News dataset, the GWO algorithm provides the best results in terms of all evaluation metrics except recall in this dataset, owing to the proposed efficient fitness function that does not consider only one objective. The lowest values for accuracy, precision, and f-score have been achieved by Decision Tree.

TABLE IV. THE PERFORMANCE OF THE ALGORITHMS FOR THE LIAR BENCHMARK DATASET.