Enhancing Aspect-Based Sentiment Analysis in Imbalanced Multilabel Datasets using Resampling and Classifiers for Digital Signature Applications

Efriza Cahya Narendra, Amalia Anjani Arifiyanti, Tri Luhur Indayanti Sugata

Submitted: 2025-05-25, Published: 2025-06-23.

Abstract

Amid the growing demand for digital identity solutions, applications such as Privy, VIDA, and Xignature offer integrated digital signature and e-stamp services and generate extensive user feedback on platforms such as the Google Play Store and the App Store. Extracting meaningful insights from thousands of reviews is challenging, which makes effective sentiment analysis necessary. Aspect-Based Sentiment Analysis (ABSA) enables fine-grained evaluation by linking user feedback to specific aspects and their sentiments. However, ABSA struggles with imbalanced datasets, in which label distributions are uneven. This study applies three resampling techniques (MLROS, MLSMOTE, and REMEDIAL) to address this issue in multilabel classification and systematically evaluates them in combination with three multilabel classifiers: Binary Relevance, Label Powerset, and Classifier Chains. The results show that resampling significantly improves outcomes: MLROS with Classifier Chains under a 70:30 split performs best, reducing Hamming Loss to 0.0401 (about 95% accuracy). This marks a 34.2% improvement over baseline models that use neither resampling nor these classifiers. Validation results indicate that the model generalizes well to unseen data with minimal overfitting. These findings underscore the importance of imbalanced data resampling and multilabel classification techniques in advancing ABSA and offer practical guidance for improving sentiment analysis in real-world applications.
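To make the evaluation terms above concrete, the following is a minimal sketch of Hamming Loss and of the per-label imbalance measures (IRLbl and MeanIR) that random oversampling methods such as MLROS commonly use to identify minority labels. The toy label matrix and the function names are assumptions for illustration only; they are not the study's data or code.

```python
from typing import List

Matrix = List[List[int]]  # rows = reviews, columns = binary labels


def hamming_loss(y_true: Matrix, y_pred: Matrix) -> float:
    """Fraction of individual label assignments that are predicted wrongly."""
    total = sum(len(row) for row in y_true)
    wrong = sum(
        t != p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return wrong / total


def irlbl(y: Matrix) -> List[float]:
    """Imbalance ratio per label: count of the most frequent label
    divided by the count of each label (1.0 for the majority label)."""
    counts = [sum(col) for col in zip(*y)]
    top = max(counts)
    return [top / c if c else float("inf") for c in counts]


def mean_ir(y: Matrix) -> float:
    """Average of the per-label imbalance ratios."""
    ratios = irlbl(y)
    return sum(ratios) / len(ratios)


# Hypothetical toy data: 4 reviews, 3 aspect-sentiment labels.
y_true = [[1, 0, 0], [1, 1, 0], [1, 0, 0], [1, 0, 1]]
y_pred = [[1, 0, 0], [1, 0, 0], [1, 0, 0], [1, 0, 1]]

loss = hamming_loss(y_true, y_pred)   # 1 wrong assignment out of 12
ratios = irlbl(y_true)                # label counts are [4, 1, 1]
threshold = mean_ir(y_true)

# Labels whose imbalance ratio exceeds MeanIR are treated as minority
# labels, i.e. candidates for oversampling.
minority = [j for j, r in enumerate(ratios) if r > threshold]
print(loss, ratios, minority)
```

A lower Hamming Loss means fewer wrong label assignments, so it can be compared directly across resampling and classifier combinations regardless of how many labels each review carries.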

Keywords

Aspect-based sentiment analysis; multilabel classification; imbalanced data resampling; digital signature; e-stamp

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.
