SPAM DETECTION ON SOCIAL MEDIA USING HYBRID ALGORITHM
Keywords:
Spam Detection, Logistic Regression, Random Forest, AdaBoost Classifier
Abstract
This work presents a study on detecting spam comments on YouTube. The study investigates how the choice of machine learning algorithm affects the accuracy of spam detection. It uses a dataset of YouTube comments collected from Kaggle, which was pre-processed into a format suitable for analysis. Three machine learning algorithms were used to build models that classify YouTube comments as spam or not spam: Logistic Regression, Random Forest, and AdaBoost Classifier. Additionally, a hybrid model was developed by combining these three base models using a voting classifier.
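As an illustration only, the hybrid voting setup described above might be sketched with scikit-learn as follows. The abstract does not state the feature representation, the voting mode, or any hyperparameters, so the TF-IDF features, hard (majority) voting, the default settings, and the toy comments below are all assumptions rather than the study's configuration.

```python
from sklearn.ensemble import (
    AdaBoostClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline


def build_hybrid_model(voting="hard", seed=0):
    """Combine the three base models named in the abstract with a voting classifier.

    Assumptions (not stated in the source): TF-IDF text features,
    hard voting by default, and library-default hyperparameters.
    """
    base_models = [
        ("lr", LogisticRegression(max_iter=1000)),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=seed)),
        ("ada", AdaBoostClassifier(random_state=seed)),
    ]
    return make_pipeline(
        TfidfVectorizer(lowercase=True),
        VotingClassifier(estimators=base_models, voting=voting),
    )


# Hypothetical toy data, only to show the fit/predict flow (1 = spam, 0 = not spam).
comments = [
    "Check out my channel for free gift cards",
    "Subscribe to my channel and win a free phone",
    "Click this link to earn money fast",
    "Free followers, visit my page now",
    "This song brings back so many memories",
    "Great video, the editing is really good",
    "I love the guitar solo at the end",
    "Who else is listening to this in the morning",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

model = build_hybrid_model()
model.fit(comments, labels)
predictions = model.predict(["Visit my channel for free gift cards"])
```

With hard voting, each base model casts one vote per comment and the majority label wins. Passing `voting="soft"` would instead average the predicted class probabilities of the three models.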
The models were evaluated using three metrics: accuracy, precision, and recall. The hybrid model outperformed the individual models, achieving an accuracy of 97.8%, a precision of 99.9%, and a recall of 95.9%. The study also compares these results with existing work and finds that the proposed hybrid model achieves higher precision, accuracy, and recall. It concludes that a hybrid model is a suitable approach for detecting YouTube spam comments.
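For reference, the three metrics above can be computed from the counts of true and false positives and negatives. The following is a minimal sketch with spam treated as the positive class; the function name and the example labels are hypothetical and are not the study's data.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, and recall for a binary spam classifier."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # spam caught
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # ham flagged as spam
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # spam missed
    tn = sum(1 for t, p in pairs if t != positive and p != positive)  # ham passed

    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical labels: 4 spam and 4 non-spam comments.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
# tp=3, fp=1, fn=1, tn=3, so all three metrics equal 0.75 here.
```

Precision higher than recall, as reported for the hybrid model, means that comments flagged as spam are almost always spam, while a somewhat larger share of actual spam goes undetected.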
The study has limitations, including the limited size of the dataset and the limitations of the machine learning algorithms used. Future work includes exploring alternative machine learning algorithms and increasing the size of the dataset to further improve the accuracy of spam detection. Overall, the results have implications for the development of more robust and effective spam detection systems for YouTube comments.
References
Sharmin, Sadia, and Zakia Zaman. "Spam detection in social media employing machine learning tool for text mining." 2017 13th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS). IEEE, 2017.
Alberto, Túlio C., Johannes V. Lochter, and Tiago A. Almeida. "Tubespam: Comment spam filtering on youtube." 2015 IEEE 14th international conference on machine learning and applications (ICMLA). IEEE, 2015.
Alias, Nabilah, et al. "Video spam comment features selection using machine learning techniques." Indones. J. Electr. Eng. Comput. Sci 15.2 (2019): 1046-1053.
Liu, Chen, and Genying Wang. "Analysis and detection of spam accounts in social networks." 2016 2nd IEEE International Conference on Computer and Communications (ICCC). IEEE, 2016.
Trivedi, Shrawan Kumar. "A study of machine learning classifiers for spam detection." 2016 4th international symposium on computational and business intelligence (ISCBI). IEEE, 2016.
Shirani-Mehr, Houshmand. "SMS spam detection using machine learning approach." Unpublished, http://cs229.stanford.edu/proj2013/ShiraniMehr-SMSSpamDetectionUsingMachineLearningApproach.pdf (2013).
Sun, Nan, et al. "Near real-time twitter spam detection with machine learning techniques." International Journal of Computers and Applications 44.4 (2022): 338-348.
Ahmed, Naeem, et al. "Machine learning techniques for spam detection in email and IoT platforms: analysis and research challenges." Security and Communication Networks 2022 (2022).
GuangJun, Luo, et al. "Spam detection approach for secure mobile message communication using machine learning algorithms." Security and Communication Networks 2020 (2020).
Kumar, Nikhil, and Sanket Sonowal. "Email spam detection using machine learning algorithms." 2020 Second International Conference on Inventive Research in Computing Applications (ICIRCA). IEEE, 2020.
Wu, Tingmin, et al. "Twitter spam detection based on deep learning." Proceedings of the australasian computer science week multiconference. 2017.
Kontsewaya, Yuliya, Evgeniy Antonov, and Alexey Artamonov. "Evaluating the effectiveness of machine learning methods for spam detection." Procedia Computer Science 190 (2021): 479-486.
License
This work is licensed under a Creative Commons Attribution 4.0 International License.
Re-users must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use. This license allows for redistribution, commercial and non-commercial, as long as the original work is properly credited.