Hybrid Technique for Detecting Extremism in Arabic Social Media Texts
Keywords: Accuracy approximation, Corpus, Extremism, Lexicon, Lower approximation, Rough set theory
Today, social media sites like Twitter provide effective platforms for sharing opinions and thoughts publicly with millions of other users. Opinions shared on such sites influence a large number of people, who may easily retweet them and accelerate their spread. Unfortunately, some of these opinions are expressed by extremists who promote hateful content. Since Arabic is one of the most widely spoken languages, it is crucial to automate the monitoring of Arabic content published on social sites. Therefore, this study proposes a hybrid technique to detect extremism in Arabic social media texts and articles in order to monitor published extremist content. The proposed technique combines the lexicon-based approach with the rough set theory approach. Rough set theory is employed with two approximation strategies: lower approximation and accuracy approximation. The hybrid technique uses rough set theory as a classifier and the lexicon-based approach as a vector. Furthermore, this study built three types of corpora (V1, V2, and V3) collected from Twitter. The experimental findings show that, among the proposed hybrid methods, the accuracy approximation was superior to the lower approximation with seed vector. The findings also show that the hybrid methods outperformed machine learning techniques in terms of efficiency. Moreover, the study recommends using the accuracy approximation method with seed vector to identify the polarity of a text.
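For readers unfamiliar with the two approximation strategies named above, the following is a minimal sketch of the standard rough set definitions (lower approximation, upper approximation, and accuracy of approximation), not the classifier used in this study. The toy tweets, the two binary lexicon attributes, and all function names are hypothetical and chosen only for illustration; the paper's own "accuracy approximation" method may be defined differently.

```python
from collections import defaultdict

def equivalence_classes(objects):
    """Group object ids whose attribute tuples are identical (indiscernible)."""
    classes = defaultdict(set)
    for obj_id, attrs in objects.items():
        classes[attrs].add(obj_id)
    return list(classes.values())

def lower_approximation(classes, target):
    """Union of the classes that lie entirely inside the target set."""
    return set().union(*[c for c in classes if c <= target])

def upper_approximation(classes, target):
    """Union of the classes that share at least one object with the target set."""
    return set().union(*[c for c in classes if c & target])

def accuracy(classes, target):
    """Standard accuracy of approximation: |lower| / |upper| (0.0 if upper is empty)."""
    upper = upper_approximation(classes, target)
    if not upper:
        return 0.0
    return len(lower_approximation(classes, target)) / len(upper)

# Hypothetical toy data: each tweet is described by two binary lexicon features,
# (contains_extremist_term, contains_neutral_term). These are assumptions.
tweets = {
    "t1": (1, 0),
    "t2": (1, 0),
    "t3": (0, 1),
    "t4": (0, 0),
    "t5": (1, 1),
}
extremist = {"t1", "t3", "t5"}  # hypothetical labels

classes = equivalence_classes(tweets)
lower = lower_approximation(classes, extremist)
upper = upper_approximation(classes, extremist)
acc = accuracy(classes, extremist)
```

In this toy example, t1 and t2 have the same features but different labels, so their class falls in the upper approximation only. Tweets in the lower approximation are certainly members of the target class given the available features, while the accuracy ratio indicates how crisply those features describe the class.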
The copyright for the paper in this journal is retained by the author(s), with the first publication right granted to the journal. The authors agree to the Creative Commons Attribution 4.0 (CC BY 4.0) agreement under which the paper in the journal is licensed.
By virtue of their appearance in this open access journal, papers are free to use with proper attribution in educational and other non-commercial settings with an acknowledgement of the initial publication in the journal.