MULTI-DOCUMENT SUMMARIZATION USING A COMBINATION OF FEATURES BASED ON CENTROID AND KEYWORD
Abstract
Summarizing multiple documents requires selecting important sentences, which is more complex than summarizing a single document because the documents carry differing information, leading to contradictions and redundancy. Important sentences can be selected by scoring each sentence against the main information of the document set; a combination of features drives this scoring so that the highest-scoring sentences become summary candidates. The centroid approach is effective at capturing key information, but it is limited to information close to the center point. Adding a position feature provides further evidence of a sentence's importance, yet it only emphasizes prominent positions. Therefore, as the contribution of this study, a keyword feature is added to supply information about important words, in the form of N-grams, in a document. The centroid, position, and keyword features are combined in a scoring process that improves performance on multi-document news and review data. The test results show that adding the keyword feature yields the highest scores on the DUC2004 news data: ROUGE-1 of 35.44, ROUGE-2 of 7.64, ROUGE-L of 37.02, and BERTScore of 84.22. On the Amazon review data, it obtains ROUGE-1 of 32.24, ROUGE-2 of 6.14, ROUGE-L of 34.77, and BERTScore of 85.75. These ROUGE and BERTScore values outperform the other unsupervised models.
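The scoring scheme described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the weights, the bigram-based keyword extraction, and the linear position score are all assumptions chosen for clarity, and the centroid is a plain term-frequency average rather than a sentence-embedding centroid.

```python
import math
import re
from collections import Counter


def tokens(text):
    """Lowercase word tokens of a sentence."""
    return re.findall(r"[a-z']+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse Counter vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def score_sentences(sentences, w_centroid=0.5, w_position=0.2, w_keyword=0.3):
    """Score sentences by a weighted sum of three features:
    centroid similarity, position, and keyword (bigram) coverage.
    The weights are illustrative assumptions, not the paper's values."""
    vectors = [Counter(tokens(s)) for s in sentences]

    # Centroid feature: sum of all sentence TF vectors.
    centroid = Counter()
    for v in vectors:
        centroid.update(v)

    # Keyword feature: the most frequent bigrams across the document set
    # stand in for extracted N-gram keywords.
    all_bigrams = Counter()
    for s in sentences:
        toks = tokens(s)
        all_bigrams.update(zip(toks, toks[1:]))
    top_bigrams = {bg for bg, _ in all_bigrams.most_common(5)}

    n = len(sentences)
    scores = []
    for i, (s, v) in enumerate(zip(sentences, vectors)):
        c = cosine(v, centroid)          # closeness to the main information
        p = 1.0 - i / n                  # earlier sentences score higher
        toks = tokens(s)
        bgs = list(zip(toks, toks[1:]))
        k = sum(bg in top_bigrams for bg in bgs) / len(bgs) if bgs else 0.0
        scores.append(w_centroid * c + w_position * p + w_keyword * k)
    return scores
```

Sentences would then be ranked by these scores and the top-ranked ones selected, with a redundancy check, to form the extractive summary.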
DOI: http://dx.doi.org/10.12962/j24068535.v21i2.a1195