Analysis of Deep Learning Approach Based on Convolution Neural Network (CNN) for Classification of Web Page Title and Description Text

Aris Wahyu Murdiyanto, Muhammad Habibi

Submitted: 2022-08-20. Published: 2022-12-31.

Abstract

The volume of digital documents available online is growing exponentially as internet use increases. Categorizing information obtained online makes it easier for recipients to identify and filter the information they need. Web pages can be classified by their titles and descriptions; because these are text data, the task can be addressed with deep learning methods for text classification. This study conducts training and analysis experiments to determine the accuracy of a proposed deep learning architecture in classifying web page titles and descriptions. We propose a convolutional neural network (CNN) architecture with a small number of parameters. Training and evaluation were conducted on the web page dataset provided by DMOZ. The proposed CNN architecture achieves its best validation accuracy, 79.51%, when the number N of stacked (Dropout + 1D Convolution + ReLU activation) blocks equals 1; this configuration has only 825,061 parameters. In accuracy, the proposed CNN architecture outperformed five other state-of-the-art methods.
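The abstract describes the architecture only as a block of Dropout, 1D convolution and ReLU activation repeated N times. The pure-Python sketch below illustrates one such block and how the trainable parameter count grows with N. It assumes an embedding input layer, "valid" convolutions with stride 1, global max pooling and a dense output layer, none of which the abstract specifies, and every size (VOCAB_SIZE, EMBED_DIM, FILTERS, KERNEL_SIZE, NUM_CLASSES) is hypothetical. It is not the authors' implementation and does not reproduce the reported 825,061 parameters.

```python
import random
from typing import List

Matrix = List[List[float]]

# Hypothetical hyperparameters, chosen only for illustration. The abstract
# does not report the vocabulary size, embedding size, filter count, kernel
# size, or number of classes used in the paper.
VOCAB_SIZE = 20_000
EMBED_DIM = 32
FILTERS = 64
KERNEL_SIZE = 3
NUM_CLASSES = 5


def count_parameters(n_blocks: int, vocab_size: int = VOCAB_SIZE,
                     embed_dim: int = EMBED_DIM, filters: int = FILTERS,
                     kernel_size: int = KERNEL_SIZE,
                     num_classes: int = NUM_CLASSES) -> int:
    """Trainable weights of Embedding -> n_blocks x (Dropout + Conv1D + ReLU)
    -> global max pooling -> Dense. Dropout, ReLU and pooling add none."""
    total = vocab_size * embed_dim  # embedding lookup table
    in_channels = embed_dim
    for _ in range(n_blocks):
        total += kernel_size * in_channels * filters + filters  # kernel + bias
        in_channels = filters
    total += in_channels * num_classes + num_classes  # output weights + bias
    return total


def dropout(x: Matrix, rate: float, training: bool,
            rng: random.Random) -> Matrix:
    """Inverted dropout: zero each value with probability `rate` during
    training and rescale survivors so the expected value is unchanged."""
    if not training or rate == 0.0:
        return x  # identity at inference time
    keep = 1.0 - rate
    return [[v / keep if rng.random() < keep else 0.0 for v in row]
            for row in x]


def conv1d_relu(x: Matrix, kernel: List[Matrix], bias: List[float]) -> Matrix:
    """'Valid' 1D convolution (stride 1) followed by ReLU.
    x: seq_len x in_ch, kernel: k x in_ch x out_ch, bias: out_ch."""
    k = len(kernel)
    out = []
    for t in range(len(x) - k + 1):
        row = []
        for o, b in enumerate(bias):
            s = b
            for j in range(k):
                for c, v in enumerate(x[t + j]):
                    s += v * kernel[j][c][o]
            row.append(max(0.0, s))  # ReLU
        out.append(row)
    return out


def global_max_pool(x: Matrix) -> List[float]:
    """Take the maximum of each channel over all sequence positions."""
    return [max(col) for col in zip(*x)]


if __name__ == "__main__":
    # Toy sequence of 3 tokens with 1-dimensional "embeddings".
    x = [[1.0], [2.0], [3.0]]
    h = dropout(x, rate=0.5, training=False, rng=random.Random(0))
    h = conv1d_relu(h, kernel=[[[1.0]], [[1.0]]], bias=[-4.0])
    print(h, global_max_pool(h))                     # [[0.0], [1.0]] [1.0]
    print(count_parameters(1), count_parameters(2))  # 646533 658885
```

With these assumed sizes the embedding table accounts for most of the weights, and each additional block after the first adds KERNEL_SIZE × FILTERS × FILTERS + FILTERS weights, which is why a configuration with N = 1 keeps the parameter count low.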

Keywords

Deep learning, Convolutional Neural Networks, Web page classification, Text classification, DMOZ dataset

This work is licensed under a Creative Commons Attribution 4.0 International License.