Multi-Label Opinion Mining Based on Random Forest with SMOTE and ADASYN

Ricy Ardiansyah, Herman Yuliansyah, Anton Yudhana

Submitted: 2025-07-15, Published: 2025-11-30.

Abstract

Multi-label classification is essential for categorizing data into multiple labels simultaneously. However, data imbalance poses a challenge: some labels are far less represented than others, which reduces model performance. This study proposes a candidate-based sentiment analysis model for reviews of the 2024 Presidential and Jakarta Gubernatorial Elections. The SMOTE and ADASYN oversampling methods are applied to handle class imbalance, and the two are compared using Random Forest as the classifier. The experimental results show that, in the classification of Presidential candidates, Random Forest achieves an accuracy of 0.947 with SMOTE and 0.948 with ADASYN. For sentiment labels, the accuracy of Random Forest remains high at 0.989 for both SMOTE and ADASYN. In the classification of Jakarta Gubernatorial candidates, Random Forest with SMOTE produces an accuracy of 0.975, while with ADASYN it decreases slightly to 0.973. For sentiment labels, both SMOTE and ADASYN reach the highest accuracy of 0.993. Applying SMOTE and ADASYN improves the distribution of the minority class without decreasing overall accuracy, and it improves the stability with which the various multi-label classes are recognized in a balanced manner.
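To illustrate how the two oversampling methods differ, the following is a minimal sketch in plain Python, not the implementation used in this study. SMOTE creates a synthetic minority sample by interpolating between a minority sample and one of its nearest minority neighbours, while ADASYN first decides how many synthetic samples each minority sample should receive, giving more to samples surrounded by majority-class neighbours. The function names (`nearest_indices`, `smote_like`, `adasyn_allocation`), the two-dimensional toy points, and the values of `k` are assumptions made for illustration; the study's own features and parameters are not stated in this abstract.

```python
import math
import random


def nearest_indices(i, points, k):
    """Indices of the k points closest to points[i] (Euclidean), excluding i."""
    others = [j for j in range(len(points)) if j != i]
    return sorted(others, key=lambda j: math.dist(points[i], points[j]))[:k]


def smote_like(minority, n_new, k=2, seed=0):
    """SMOTE idea: interpolate between a minority sample and one of its
    k nearest minority neighbours to create each synthetic point."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        j = rng.choice(nearest_indices(i, minority, k))
        gap = rng.random()  # position along the segment from base to neighbour
        base, neighbour = minority[i], minority[j]
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


def adasyn_allocation(minority, majority, n_new, k=3):
    """ADASYN idea: share n_new across minority samples in proportion to the
    fraction of majority points among each sample's k nearest neighbours."""
    points = minority + majority  # the first len(minority) entries are minority
    ratios = []
    for i in range(len(minority)):
        neighbours = nearest_indices(i, points, k)
        ratios.append(sum(j >= len(minority) for j in neighbours) / k)
    total = sum(ratios)
    if total == 0:
        # No minority sample borders the majority class; this sketch simply
        # falls back to an even split (an assumption, not part of ADASYN).
        return [n_new // len(minority)] * len(minority)
    # Rounding means the counts may not sum exactly to n_new.
    return [round(n_new * r / total) for r in ratios]


# Hypothetical toy data: three minority points far from the majority class
# and one minority point sitting inside it.
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3), (3.0, 3.0)]
majority = [(3.1, 3.2), (2.9, 2.8), (3.3, 3.0), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]

new_points = smote_like(minority, n_new=6)
per_sample = adasyn_allocation(minority, majority, n_new=6)
```

In this toy setting, `smote_like` draws base samples uniformly, whereas `adasyn_allocation` assigns the largest share to the minority point that lies among majority points. In practice, the `SMOTE` and `ADASYN` classes in the imbalanced-learn library (`imblearn.over_sampling`) would normally be applied to the training split before fitting the classifier.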

Keywords

Aspect-Based Sentiment Analysis; Imbalanced Data; Random Forest; Candidate Selection; Random Oversampling

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.
