TRENDING ISSUE EXTRACTION USING A WORD DISTRIBUTION APPROACH TO TERM WEIGHTING FOR MULTI-DOCUMENT NEWS SUMMARIZATION
DOI:
https://doi.org/10.12962/j24068535.v14i2.a570
Abstract
Using trending issues from the social media platform Twitter as important sentences is effective in document summarization because trending issues share keywords closely related to an ongoing news event. Term weighting with TF-IDF, which is based only on the document itself, is not sufficient to determine the index of a document. Accurate indexing also depends on how informative a term is with respect to a class or cluster. A term that appears frequently in many classes or clusters should not be treated as important even if its TF-IDF value is high. This study aims to summarize multiple news documents using trending issue extraction with the term distribution on centroid based (TDCB) approach to feature weighting, and integrates it with query expansion to supply keywords for document summarization. The TDCB method takes into account the sub-topics that emerge from the clusters produced by grouping tweets, which can serve as additional informative value when weighting important sentences for summary construction. The stages for producing a multi-document news summary are trending issue extraction, query expansion, auto labelling, news selection, news feature extraction, important sentence weighting, and summary construction. Experimental results show that the summarization method that adds the informative value of trending issue sub-topics, NeFTIS-TDCB, achieves the highest average max-ROUGE-1 score, 0.8615 for n=30, across all news topic variants.
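The argument that a term spread over many clusters should count for less, even when its TF-IDF value is high, can be illustrated with a minimal Python sketch. This is not the TDCB formula used in the study: the `cluster_factor` penalty below (a smoothed inverse cluster frequency), the function names, and the toy tweets are all assumptions introduced only for illustration.

```python
import math
from collections import Counter


def tfidf(docs):
    """Plain TF-IDF per document; docs is a list of token lists."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    return [{t: tf * math.log(n / df[t]) for t, tf in Counter(d).items()}
            for d in docs]


def cluster_factor(docs, labels):
    """Hypothetical penalty (not TDCB): log(1 + K / clusters containing the term).

    K is the number of clusters. The factor shrinks as a term
    appears in more clusters.
    """
    k = len(set(labels))
    clusters_of = {}
    for d, c in zip(docs, labels):
        for t in set(d):
            clusters_of.setdefault(t, set()).add(c)
    return {t: math.log(1 + k / len(cs)) for t, cs in clusters_of.items()}


def cluster_aware_weights(docs, labels):
    """Scale each TF-IDF weight by the term's cluster factor."""
    factor = cluster_factor(docs, labels)
    return [{t: w * factor[t] for t, w in doc.items()} for doc in tfidf(docs)]


# Toy tweets (assumed data), already tokenized and assigned to two clusters.
tweets = [
    ["flood", "jakarta", "rain"],
    ["flood", "jakarta", "traffic"],
    ["quake", "jakarta", "lombok"],
    ["quake", "lombok", "victims"],
]
labels = [0, 0, 1, 1]

factor = cluster_factor(tweets, labels)
weights = cluster_aware_weights(tweets, labels)
```

In this toy data, "jakarta" occurs in both clusters and so receives a smaller factor than "flood", which occurs in only one. The actual TDCB weighting relies on term distribution statistics that this sketch does not reproduce.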
License
All papers should be submitted electronically. All submitted manuscripts must be original work that is not under submission at another journal or under consideration for publication in another form, such as a monograph or chapter of a book. Authors of submitted papers are obligated not to submit their paper for publication elsewhere until an editorial decision is rendered on their submission. Further, authors of accepted papers are prohibited from publishing the results in other publications that appear before the paper is published in JUTI unless they receive approval for doing so from the Editor-in-Chief.
JUTI open access articles are distributed under a Creative Commons Attribution-ShareAlike 4.0 International License. This license lets others share and adapt the material, provided they give appropriate credit, provide a link to the license, and indicate if changes were made. If they remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.