Enhancing Sentiment and Emotion Classification with LSTM-Based Semi-Supervised Learning
Submitted: 2025-05-04, Published: 2025-06-13.
Abstract
Sentiment analysis increasingly relies on semi-supervised learning (SSL) models because they can exploit large amounts of unlabeled data. This study used four Indonesian datasets: Sentiment Ridife (sentiment classification), Emotion IndoNLU (emotion classification), Sentiment IndoNLU (sentiment classification), and Hate Speech (offensive content detection). An LSTM model was trained on the labeled data and then used to generate pseudo-labels for the unlabeled data over three iterations. The pseudo-labels were evaluated using Random Forest, Logistic Regression, and Support Vector Machine (SVM) classifiers. The effectiveness of the LSTM model varied across datasets. On the Sentiment Ridife dataset, LSTM achieved an accuracy of 70.23%, slightly lower than Random Forest but higher than Logistic Regression and SVM. On the Sentiment IndoNLU dataset, LSTM reached 86.12%, a strong result but slightly below Random Forest and Logistic Regression. On the Emotion IndoNLU dataset all models performed similarly, while on the Hate Speech dataset LSTM performed well with an accuracy of 86.49%. The results indicate that LSTM-based SSL can generate useful pseudo-labels and enhance model performance, but that its effectiveness depends on the dataset and task. This study underscores the need for further research into optimizing pseudo-labeling techniques and exploring advanced NLP models to improve sentiment and emotion analysis in diverse languages.
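The iterative pseudo-labeling procedure summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: to stay dependency-free, a toy nearest-centroid classifier stands in for the LSTM, the two-dimensional feature vectors stand in for text representations, and the confidence threshold of 0.8 is an assumption, since the abstract does not state how pseudo-labels were selected. Only the three-iteration loop comes from the abstract. All function names are hypothetical.

```python
from math import dist


def fit_centroids(X, y):
    """Toy stand-in for the LSTM: compute one mean vector per class."""
    sums, counts = {}, {}
    for xi, yi in zip(X, y):
        acc = sums.setdefault(yi, [0.0] * len(xi))
        for j, v in enumerate(xi):
            acc[j] += v
        counts[yi] = counts.get(yi, 0) + 1
    return {c: [v / counts[c] for v in acc] for c, acc in sums.items()}


def predict_with_confidence(centroids, x):
    """Return (label, confidence) for x; needs at least two classes.

    Confidence is d2 / (d1 + d2), where d1 and d2 are the distances to the
    nearest and second-nearest centroid, so it lies between 0.5 and 1.0.
    """
    scored = sorted((dist(x, c), label) for label, c in centroids.items())
    (d1, label), (d2, _) = scored[0], scored[1]
    conf = d2 / (d1 + d2) if (d1 + d2) > 0 else 0.5
    return label, conf


def pseudo_label(X_lab, y_lab, X_unlab, iterations=3, threshold=0.8):
    """Self-training loop: retrain, then absorb confident predictions.

    iterations=3 follows the abstract; threshold=0.8 is an assumed value.
    Returns the grown training set and the samples left unlabeled.
    """
    X_train, y_train = list(X_lab), list(y_lab)
    pool = list(X_unlab)
    for _ in range(iterations):
        if not pool:
            break
        model = fit_centroids(X_train, y_train)
        remaining = []
        for x in pool:
            label, conf = predict_with_confidence(model, x)
            if conf >= threshold:
                X_train.append(x)      # accept the pseudo-label
                y_train.append(label)
            else:
                remaining.append(x)    # retry in the next iteration
        pool = remaining
    return X_train, y_train, pool


# Toy data: made-up 2-D features, not drawn from any of the four datasets.
X_lab = [(0.0, 0.0), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1)]
y_lab = ["negative", "negative", "positive", "positive"]
X_unlab = [(0.1, 0.2), (1.1, 0.9), (0.5, 0.5), (0.3, 0.0), (0.8, 0.8)]

X_train, y_train, pool = pseudo_label(X_lab, y_lab, X_unlab)
print(len(X_train), "training samples;", len(pool), "left unlabeled")
```

In a setup like the one described, the grown training set would then be passed to downstream classifiers such as Random Forest, Logistic Regression, or SVM to assess the pseudo-labels. Samples that never reach the threshold stay in the pool rather than being forced into a class.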
This work is licensed under a Creative Commons Attribution 4.0 International License.