FEATURE DIMENSION REDUCTION USING THE ALOFT ALGORITHM FOR DOCUMENT CLUSTERING
DOI:
https://doi.org/10.12962/j24068535.v14i2.a573
Abstract
Document clustering remains challenging because larger documents produce more features. The resulting high dimensionality can degrade the performance of clustering algorithms. One way to address this problem is dimension reduction. Dimension reduction methods such as feature selection with filter methods have been used for document clustering. However, filter methods depend heavily on user input to choose the top n features from the whole document collection. The ALOFT (At Least One FeaTure) algorithm can produce a feature set automatically, without any input parameter from the user. Because ALOFT was previously applied to document classification, the filter methods used with it require class labels, so those filter methods cannot be used for document clustering. This study proposes a feature dimension reduction method that uses a variety of filter methods within the ALOFT algorithm for document clustering. Before dimension reduction, the documents are preprocessed and their tf-idf weights are computed. Dimension reduction is then performed with filter methods such as Document Frequency (DF), Term Contribution (TC), Term Variance Quality (TVQ), Term Variance (TV), Mean Absolute Difference (MAD), Mean Median (MM), and Arithmetic Mean Geometric Mean (AMGM). Next, the final feature set is selected with the ALOFT algorithm. The last stage is document clustering with two different clustering methods, k-means and Hierarchical Agglomerative Clustering (HAC). The experimental results show that the cluster quality produced by the proposed method with the k-means algorithm improves on the results of the VR method.
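The tf-idf weighting, filter scoring, and ALOFT selection steps named in the abstract can be illustrated with a minimal Python sketch. The abstract gives no formulas, so everything specific here is an assumption: the tf-idf variant (raw term count times log(N/df)), the use of population variance as the Term Variance score, and the function names `tfidf`, `term_variance`, and `aloft`, which are hypothetical. The selection rule follows the general ALOFT idea that every document contributes its best-ranked feature, so no top-n cutoff is needed; it is not presented as the authors' exact implementation.

```python
import math
from collections import Counter


def tfidf(docs):
    """Return (vocab, rows): rows[i][j] is the weight of vocab[j] in document i.

    Assumed variant: raw term count * log(N / document frequency).
    """
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    rows = []
    for doc in docs:
        tf = Counter(doc)
        rows.append([tf[t] * math.log(n_docs / df[t]) for t in vocab])
    return vocab, rows


def term_variance(rows):
    """Unsupervised filter score (assumed form): population variance
    of each term's weight across all documents. No class labels needed."""
    n_docs = len(rows)
    scores = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        mean = sum(column) / n_docs
        scores.append(sum((w - mean) ** 2 for w in column) / n_docs)
    return scores


def aloft(rows, scores):
    """For each document, keep the highest-scoring term that occurs in it.

    A term counts as present when its weight is non-zero, so a term that
    appears in every document (idf = 0) is never selected in this sketch.
    """
    selected = set()
    for row in rows:
        present = [j for j, w in enumerate(row) if w != 0]
        if present:
            selected.add(max(present, key=lambda j: scores[j]))
    return sorted(selected)


# Toy corpus of already preprocessed (tokenized) documents.
docs = [
    ["cat", "dog", "cat"],
    ["dog", "fish", "fish"],
    ["bird", "fish", "bird"],
]
vocab, rows = tfidf(docs)
kept = [vocab[j] for j in aloft(rows, term_variance(rows))]
print(kept)  # ['bird', 'cat', 'fish']
```

The size of the selected set falls out of the data: it can never exceed the number of documents, and each document with at least one non-zero weight is represented by at least one feature. Swapping `term_variance` for another unsupervised score such as document frequency changes only the ranking, not the selection loop. The reduced matrix would then be passed to a clustering algorithm such as k-means or HAC.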
License
JUTI open access articles are distributed under a Creative Commons Attribution-ShareAlike 4.0 International License. Under this license, readers who use the material must give appropriate credit, provide a link to the license, and indicate if changes were made; if they remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.