IWD Based Feature Selection Algorithm for Sentiment Analysis

Feature selection methods aim to improve classification performance by eliminating non-valuable features. In this paper, we apply a recent optimization technique, the Intelligent Water Drops (IWD) algorithm, to select the best features for sentiment analysis. We evaluate the classification performance of the proposed IWD based feature selection method by comparing it with a well-known feature selection method, ReliefF, using a Maximum Entropy classifier. Experimental results show that the IWD based feature selection method outperforms the ReliefF method for sentiment analysis. DOI: 10.5755/j01.eie.25.1.22736


I. INTRODUCTION
Today, the internet and social media have become an important source of information. The increasing amount of information on the internet has opened new research areas. Many people share their opinions through social media platforms such as blogs, forums, and Twitter. With the increasing importance of sentiment information, there is a need for fast and effective analysis techniques that are not only subject-oriented but also sentiment-focused [1]. However, the rapidly increasing amount of online documentation has made it difficult and time-consuming to find and analyze the desired information. Sentiment analysis has therefore become an important research area for the automatic analysis of review documents. It aims to classify the sentiment of these review documents, most commonly into two classes: positive or negative. In order to increase the performance of the classification process, feature selection methods are applied to determine the most valuable features. Feature selection is important in two respects: first, the efficiency of the training process increases significantly when the number of features is reduced; second, the accuracy of classification increases when the most valuable features are chosen.
In this study, our aim is to investigate the effects of our proposed IWD based feature selection algorithm on classification performance for sentiment analysis. For this purpose, we compared our IWD based feature selection method with a well-known feature selection method called ReliefF.

II. RELATED WORK
Sentiment analysis can be defined as a text classification process in which review documents are classified as generally positive or negative. Machine learning methods [2] and lexicon based methods [3] have been widely used for sentiment classification. However, machine learning methods have been preferred by many researchers because of their ease of use and computational performance [2]-[4]. Naive Bayes (NB), Support Vector Machines (SVM), and Maximum Entropy (ME) classifiers have been applied with success to sentiment classification problems.
Feature selection methods rank features so that non-informative features can be eliminated in order to improve classification accuracy and efficiency. Researchers have used various feature selection methods to identify the features that carry more information [5]-[7]. Agarwal and Mittal [7] employ the information gain (IG) and minimum redundancy maximum relevancy (mRMR) feature selection methods. They show that the mRMR method obtains better classification performance with a boolean multinomial NB classifier. Abbasi, Chen, and Salem [5] compare the information gain method with a heuristic method, the Entropy Weighted Genetic Algorithm (EWGA), and find that EWGA achieves better accuracy with an SVM classifier. Stylios, Katsis, and Christodoulakis [8] compare two heuristic methods, Ant Colony Optimization (ACO) and Particle Swarm Optimization, for feature selection and improve the accuracy from 83.7 % to 90.6 %. Kaur, Sehra, and Sehra [9] investigate the classification performance of a hybrid method that combines SVM and ACO and improve the accuracy from 75.5 % to 86.6 %.
The IWD algorithm has also been applied to a range of problems. Some researchers propose a modified IWD algorithm with alternative selection methods, namely linear ranking and exponential ranking [10]. Other researchers propose an IWD based optimization method to design and configure a supply chain and logistics network while taking multiple objectives into account simultaneously. In another study [12], researchers apply an IWD based algorithm to select nodes for service-oriented wireless networks. Moreover, Alexandros and Georgios [13] underline the importance of nature-inspired optimization algorithms and present recent developments and potential uses of these algorithms in different problem areas, in terms of both theory and application.

III. METHODS AND DATASET
In this study, we implemented an IWD based feature selection method and compared its effect on classification performance for sentiment analysis with that of the ReliefF feature selection method. While ReliefF is a statistical method, the IWD based method is a heuristic method.

A. ReliefF
The Relief algorithm was proposed by Kira and Rendell [14] as a simple, fast, and effective approach to feature weighting. Kononenko [15] later extended it to the ReliefF algorithm.

Algorithm ReliefF
Input: for each training instance, a vector of feature values and the class value
1. initialize vector W
2. for i = 1 to m do
3.   randomly select a target instance Ri;
4.   find the nearest hit H and the nearest miss M.
5.   for A = 1 to a do
6.

When performing weight updates, the difference in the value of attribute A between two instances I1 and I2, where I1 = Ri and I2 is either H or M, is calculated by the diff function in Fig. 1 [16]. For discrete features, the diff function is defined as

$$\mathrm{diff}(A, I_1, I_2) = \begin{cases} 0, & \text{if } \mathrm{value}(A, I_1) = \mathrm{value}(A, I_2) \\ 1, & \text{otherwise} \end{cases} \tag{1}$$

For continuous features, the diff function is defined as

$$\mathrm{diff}(A, I_1, I_2) = \frac{\left|\mathrm{value}(A, I_1) - \mathrm{value}(A, I_2)\right|}{\max(A) - \min(A)} \tag{2}$$

The max(A) and min(A) values are determined over the whole set of instances. With this normalization, the output of the diff function falls between 0 and 1 for all types of features. When updating W[A], the output of the diff function is divided by m so that the final weights are normalized between -1 and 1.
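The weight update described above can be illustrated with a minimal Python sketch. It implements only the basic two-class Relief update with a single nearest hit and nearest miss, not the full multi-class ReliefF of Fig. 1 and not the Weka implementation used in the experiments. The function names `diff` and `relief`, the `discrete` flag list, and the use of the summed diff values as the distance between instances are assumptions made for illustration.

```python
import random

def diff(a, x1, x2, lo, hi, discrete):
    """Difference of attribute a between two instances, always in [0, 1]."""
    if discrete[a]:
        return 0.0 if x1[a] == x2[a] else 1.0
    span = hi[a] - lo[a]
    return abs(x1[a] - x2[a]) / span if span else 0.0

def relief(X, y, m, discrete, seed=0):
    """Basic two-class Relief: returns one weight per attribute in [-1, 1].

    Assumes both classes are present and each class has at least two instances.
    """
    rng = random.Random(seed)
    n_attr = len(X[0])
    # max(A) and min(A) are taken over the whole set of instances.
    lo = [None if discrete[j] else min(row[j] for row in X) for j in range(n_attr)]
    hi = [None if discrete[j] else max(row[j] for row in X) for j in range(n_attr)]

    def dist(i, j):
        # Assumed distance: sum of per-attribute diff values.
        return sum(diff(k, X[i], X[j], lo, hi, discrete) for k in range(n_attr))

    W = [0.0] * n_attr
    for _ in range(m):
        i = rng.randrange(len(X))
        others = [j for j in range(len(X)) if j != i]
        hit = min((j for j in others if y[j] == y[i]), key=lambda j: dist(i, j))
        miss = min((j for j in others if y[j] != y[i]), key=lambda j: dist(i, j))
        for k in range(n_attr):
            # Reward attributes that differ on the miss, penalize those that differ on the hit.
            W[k] += (diff(k, X[i], X[miss], lo, hi, discrete)
                     - diff(k, X[i], X[hit], lo, hi, discrete)) / m
    return W
```

Dividing each update by m keeps every final weight between -1 and 1, because each diff value lies between 0 and 1.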

B. Intelligent Water Drops
The IWD based feature selection algorithm [7] is implemented to select the best features for accurate and fast classification. The IWD algorithm constructs an optimal solution through cooperation among a group of agents called water drops. It imitates a swarm of water drops flowing with soil along a river bed. Procedurally, each water drop incrementally constructs a solution through a series of iterative transitions from one node to the next until a complete solution is obtained. Water drops communicate with each other through an attribute called soil, which is associated with the path between any two points. The soil value determines the direction of movement from the current node to the next: a path with a lower amount of soil is more likely to be followed.

Steps 4-10 of the pseudo code of the IWD based feature selection algorithm (Fig. 2) are:
4. choose n features by using a roulette wheel that is formed according to the probabilities of the features, which are computed with respect to the soil values of the features.
5. classify documents by using the selected n features and compute the F-score.
6. by using the computed F-score value, update the soil and velocity values of the IWD, and the soil value of all features in the feature space.
7. end
8. choose the best feature set found so far.
9. until the termination condition is met.
10. return the best feature set.

First, we construct a graph from our features in order to use the IWD algorithm, and then we set the initial soil of the features (initSoil), the initial velocity of the water drops (initVel), the number of water drops (NIWD), the soil updating parameters (as, bs, cs, ρ), and the velocity updating parameters (av, bv, cv). After initializing the parameters, we choose n distinct features (f) for each IWD. At the beginning, water drops are spread randomly over the nodes of the construction graph, and the visited nodes list (F) is updated to include the start node. We compute the probability P(i) according to (3):

$$P(i) = \frac{f(\mathrm{soil}(i))}{\sum_{k \notin F} f(\mathrm{soil}(k))} \tag{3}$$

where P(i) represents the probability of selecting node i and f(soil(i)) is the fitness function. Here soil(i) refers to the amount of soil within node (feature) i, and soil(k) refers to the amount of soil within a node k that has not been selected before. The fitness function of candidate node i is inversely proportional to the absolute soil value (4), where εs is a small positive number used to prevent division by zero in f(soil(i)). The min() function returns the minimum value of its arguments. The newly selected node i is then added to the list F.

After n features have been selected according to (3), the training dataset is classified using the selected features, and the F-score of the classification, a real number in the range [0, 1], is taken as a measure of the "quality" of the selected features. A high F-score means that the selected features are valuable. The F-score is then used to update the velocity and soil values. For each IWD, the velocity vel^IWD is updated using the static parameters av, bv, cv, which represent the nonlinear relationship between the velocity of the water drop and the inverse of the amount of soil in the local node, soil(i). The amount of soil that the IWD loads from the selected features is computed from Fscore(F), where F refers to the selected subset of features and Fscore(F) refers to the F-score value of the selected features. The soil of the IWD, soil^IWD, is increased by the soil removed from the selected features F, and the soil values of all features are updated accordingly.

The above computations are repeated for each IWD and the best feature set is recorded. These steps continue until the termination condition is met. We chose 250 iterations as the termination condition, since previous studies [18], [19] observed no improvement after 250 iterations for swarm based algorithms.
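The overall loop described in this section can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the velocity and soil update formulas follow the commonly published form of the IWD algorithm, the scaling of the removed soil by the F-score is an assumption, only the soil of the selected features is updated, and the soil carried by each drop is not tracked. The names `selection_probs`, `iwd_select`, and `score_fn` are hypothetical; `score_fn` stands in for training the classifier on a feature subset and returning its F-score.

```python
import random

def selection_probs(soil, candidates, eps=1e-4):
    """Roulette-wheel probabilities over unvisited features: less soil, higher probability."""
    low = min(soil[k] for k in candidates)
    shift = -low if low < 0 else 0.0  # assumption: shift so that soil values are non-negative
    fitness = {k: 1.0 / (eps + soil[k] + shift) for k in candidates}
    total = sum(fitness.values())
    return {k: f / total for k, f in fitness.items()}

def iwd_select(num_features, n, score_fn, n_iwd=30, iters=250,
               init_soil=1000.0, init_vel=100.0,
               a_v=1000.0, b_v=0.01, c_v=1.0,
               a_s=1000.0, b_s=0.01, c_s=1.0, rho=0.01, seed=0):
    """Return (best feature subset, its score) found by the water drops."""
    rng = random.Random(seed)
    soil = [init_soil] * num_features
    best, best_score = None, -1.0
    for _ in range(iters):
        for _ in range(n_iwd):
            vel, chosen = init_vel, []
            while len(chosen) < n:
                candidates = [k for k in range(num_features) if k not in chosen]
                probs = selection_probs(soil, candidates)
                i = rng.choices(candidates, weights=[probs[k] for k in candidates])[0]
                chosen.append(i)
                # Assumed velocity update: less soil in the node gives a larger gain.
                vel += a_v / (b_v + c_v * soil[i] ** 2)
            score = score_fn(chosen)  # stand-in for the F-score of the classifier
            # Assumed soil removal: faster drops and better subsets remove more soil.
            delta = score * a_s / (b_s + c_s * (1.0 / vel) ** 2)
            for i in chosen:
                soil[i] = (1.0 - rho) * soil[i] - rho * delta
            if score > best_score:
                best, best_score = sorted(chosen), score
    return best, best_score
```

The default parameter values mirror those reported in the experiments. Because features that appear in high-scoring subsets lose soil faster, they become more likely to be chosen by later water drops.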

C. Sentiment Classification
Many researchers widely use machine learning methods because they can be easily adapted.

D. Dataset
We conduct experiments on a Turkish Twitter dataset [22] belonging to a private telecommunication company. The dataset contains 3000 tweets that belong to three different classes (positive, negative, neutral). We use only the positive and negative tweets. We tokenize the tweets, keep tokens of alphabetic characters as features, and calculate their weights using term frequency. Five-fold cross validation is applied to evaluate the classification performance.
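As an illustration of this feature extraction step, the following minimal Python sketch keeps only runs of alphabetic characters and counts them. The function name `term_frequencies` and the regular expression are assumptions, not the authors' code. No lowercasing is applied here, since the text does not mention it and Turkish case mapping needs special handling.

```python
import re
from collections import Counter

# Runs of Unicode letters only; digits, punctuation, and emoticons are dropped.
TOKEN = re.compile(r"[^\W\d_]+")

def term_frequencies(tweet):
    """Map each alphabetic token of a tweet to its term frequency."""
    return Counter(TOKEN.findall(tweet))
```

For example, `term_frequencies("çok güzel, çok iyi :) 123")` counts "çok" twice and "güzel" and "iyi" once each.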

E. Evaluation Metrics
The classification performance of the experiments is evaluated using the F-score, which is based on precision and recall. Precision (P) is the percentage of correctly classified documents among all documents assigned to a class. Recall (R) is the percentage of correctly classified documents among all documents that actually belong to that class. The F-score is defined as the harmonic mean of recall and precision [23]:

$$F = \frac{2PR}{P + R}$$
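A short Python sketch of these metrics for a single class is given below. The function name `precision_recall_f` and the convention of returning 0 when a denominator is zero are assumptions made for illustration.

```python
def precision_recall_f(y_true, y_pred, positive):
    """Precision, recall, and F-score of one class from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score
```

With true labels `["pos", "pos", "neg", "neg"]` and predictions `["pos", "neg", "pos", "neg"]`, the positive class has one true positive, one false positive, and one false negative, so precision, recall, and F-score are all 0.5.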

IV. EXPERIMENTS AND RESULTS
In this study, Turkish Twitter data is used, and we investigate the effect of our proposed IWD based feature selection method on sentiment analysis by comparing it with a traditional feature selection method. In a previous study on cyberbullying, which is a special category of sentiment analysis, using emoticons as features did not improve classification performance [24]. Therefore, we extract only alphabetic characters as features and calculate their weights using term frequency. We do not apply stemming because Turkish is an agglutinative language and the suffixes carry the polarity of a word [25]. We developed the software for our experiments using the Python Natural Language Toolkit (NLTK) [26]. We run the Maximum Entropy classifier using five-fold cross validation. In the first stage of our study, we establish the baseline for examining the impact of the feature selection methods by using all features obtained with the bag-of-words method. We use the ReliefF algorithm from the Weka data mining software package. In the second stage, the features are ranked by each feature selection method, with the aim of selecting the most valuable features. We chose the top-ranked 10, 50, 100, 250, and 500 features for measuring classification performance.
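Selecting the top-ranked features from a ranking can be sketched in Python as follows. The function name `top_k_features` and the representation of the ranking as a mapping from feature to weight are assumptions for illustration.

```python
def top_k_features(weights, k):
    """Return the k features with the highest weights, best first."""
    ranked = sorted(weights, key=weights.get, reverse=True)
    return ranked[:k]

# Feature sizes evaluated in the experiments.
FEATURE_SIZES = [10, 50, 100, 250, 500]
```

Calling `top_k_features` once per entry in `FEATURE_SIZES` yields the reduced feature sets on which the classifier is then trained and evaluated.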
IWD parameters are initialized according to [27]. The selection and update process is repeated until the best features are selected.
After the best features have been selected, the test dataset is classified using the selected feature set. Using all features in the training set (without feature selection), we obtain a baseline F-score of 0.691, as shown in Table I. The feature set contains many non-informative features, which negatively affect classification performance. Selecting feature subsets eliminates non-informative features and improves classification performance. The results in Table I show that the F-score increases when we use the features selected by our proposed IWD based feature selection method. For example, the F-score for the Twitter dataset increases from 0.691 to 0.721 with the IWD based feature selection method. With the Maximum Entropy classifier, our proposed method performs better than the ReliefF filter based method at all feature sizes. Table II shows the time required to classify the test dataset with the ME classifier for different feature sizes. As can be seen from Fig. 3, using feature selection methods sharply reduces the time required to classify new (unseen) data without reducing classification accuracy.

V. CONCLUSIONS
In this study, we have developed an IWD based feature selection system for sentiment classification. We ran our experiments on the Turkish Twitter dataset with the ME classifier. Experimental evaluation shows that our IWD based feature selection method selects better features than the well-known ReliefF filter based feature selection method. Classification performance increased over the baseline result shown in Table I: the F-score on the Twitter dataset rose from 0.691 to 0.721 with the IWD based feature selection method. The proposed method is effective in reducing the number of features, so it is suitable for the classification of high dimensional data. By reducing the feature space, our system also sharply reduces the time required to classify the test dataset without loss of classification accuracy.
ReliefF extends the Relief algorithm to multi-class problems [15]. It estimates the quality W[A] of attribute A according to the equation in lines 8-9 of Fig. 1 [16]. In Fig. 1, n indicates the number of training instances, a indicates the number of features, and m indicates the number of random training instances out of n used to update W. We use the ReliefF algorithm from the Weka data mining software package [17].

Fig. 2. Pseudo code of IWD based feature selection algorithm.
The IWD parameters are initialized according to [27] as initSoil = 1000; initVel = 100; av = 1000; bv = 0.01; cv = 1; as = 1000; bs = 0.01; cs = 1; ρ = 0.01. The number of iterations and the number of IWDs are set according to previous studies [18], [19] as NIWD = 30 and 250 iterations. Our IWD based feature selection algorithm selects a predefined number of features for each water drop. We then evaluate the selection made by each water drop using the ME classifier. According to the performance of the selected features, we update the soil and velocity values and the selection probabilities of the features.

Fig. 3. Results of the feature selection methods with reduced feature sizes.
The IWD based feature selection algorithm is described in the pseudo code given in Fig. 2.
Researchers widely use machine learning methods because they can be easily adapted. Sentiment classification is a text classification task that sorts documents into negative and positive classes [20]. Maximum Entropy is also known as logistic regression in statistics [21]. It is more robust to correlated features: when there are many correlated features, ME assigns more accurate probabilities. We use the Logistic Regression classifier from the Weka data mining software package [17].

TABLE I. RESULTS OF THE FEATURE SELECTION METHODS WITH REDUCED FEATURE SIZES.

TABLE II. EFFECTS OF FEATURE SELECTION ON TIME NEEDED FOR CLASSIFICATION.