KEYWORD IDENTIFICATION IN SCIENTIFIC JOURNAL PUBLICATION CONTENT FOR CASE STUDY ITS ONLINE PUBLICATION (POMITS) SEARCHING

Abdul Munif, Nurul Fajrin Ariyani, Khairunnisa’ Rahma Mardiyani

Abstract


ITS Online Publication (POMITS) is a journal that publishes the work of ITS undergraduate students. It holds many articles that other students often need as reference material for their own research. Search currently relies on the title, abstract, author name, and keywords, all of which the author enters manually. Manual entry can lead to poorly chosen keywords, so a way is needed to make keyword selection more precise and more representative of the article.

The purpose of this research is to identify keywords in articles automatically. The keywords are divided into three types: software used, methods, and other representative keywords. With this identification, article searches can return more precise results. The problem can be addressed with Named Entity Recognition (NER). However, spaCy does not yet provide an NER model for Indonesian, so one must be developed.

This study annotates the keywords in POMITS content as metadata by using the NER model to detect named entities of three types: software, methods, and representative keywords. The NER annotations are stored as triples in an Apache Jena Fuseki triple store, which can then answer searches about software, methods, and keywords. In testing, the system successfully detected the entities and saved the annotations as triples in Apache Jena Fuseki. Keyword identification achieved an average precision of 84.76% and an average recall of 63.59%.
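As a rough illustration of the pipeline described above, the sketch below turns entity annotations into subject-predicate-object triples and scores them against a hand-labelled reference set using precision and recall. It is a minimal sketch under stated assumptions, not the authors' implementation: the article identifier, the predicate names (`usesSoftware`, `usesMethod`, `hasKeyword`), and the sample annotations are all hypothetical, and matching is done on exact lower-cased strings.

```python
from typing import NamedTuple


class Annotation(NamedTuple):
    article: str  # hypothetical article identifier
    label: str    # "software", "method", or "keyword"
    text: str     # the annotated entity text


# Hypothetical predicate names; the source does not specify its vocabulary.
PREDICATES = {
    "software": "usesSoftware",
    "method": "usesMethod",
    "keyword": "hasKeyword",
}


def to_triples(annotations):
    """Convert annotations into a set of (subject, predicate, object) triples."""
    return {
        (a.article, PREDICATES[a.label], a.text.lower())
        for a in annotations
    }


def precision_recall(predicted, gold):
    """Precision = TP / predicted, recall = TP / gold, on exact triple matches."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Illustrative data only: what a model might predict for one article ...
predicted = to_triples([
    Annotation("article-1", "software", "spaCy"),
    Annotation("article-1", "method", "Named Entity Recognition"),
    Annotation("article-1", "keyword", "search"),
])

# ... and what a human annotator marked as correct for the same article.
gold = to_triples([
    Annotation("article-1", "software", "spaCy"),
    Annotation("article-1", "software", "Apache Jena Fuseki"),
    Annotation("article-1", "method", "Named Entity Recognition"),
    Annotation("article-1", "keyword", "triple store"),
])

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2%} recall={recall:.2%}")
```

In this toy case two of the three predicted triples are correct and two of the four reference triples are found, which mirrors the pattern in the reported results: precision higher than recall means the detected keywords are mostly right, but some relevant ones are missed.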

 







DOI: http://dx.doi.org/10.12962/j24068535.v21i1.a1187
