TERM WEIGHTING BASED ON INDEX OF GENRE FOR WEB PAGE GENRE CLASSIFICATION

Sugiyanto Sugiyanto, Nanang Fakhrur Rozi, Tesa Eranti Putri, Agus Zainal Arifin

Abstract


Automatic identification of web page genre has become an important area of web page classification, as it can improve the quality of web search results and reduce search time. The weighting most commonly used to index terms for classification is the document-based TF-IDF. However, this method does not take genre into account, even though web page documents have a type of categorization called genre. Given that genres exist, a term that appears often within one genre should be more significant for document indexing than a term that appears frequently across many genres, despite the latter's high TF-IDF value. We propose a new weighting method for web page document indexing called inverse genre frequency (IGF). This method is based on genre, a manual semantic categorization taken from previous research. Experimental results show that term weighting based on index of genre (TF-IGF) performed better than term weighting based on index of document (TF-IDF). When genre-specific keywords were excluded, the highest accuracy, precision, recall, and F-measure were 78%, 80.2%, 78%, and 77.4% respectively; when genre-specific keywords were included, they were 78.9%, 78.7%, 78.9%, and 78.1% respectively.
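
The abstract does not state the IGF formula, so the following is only a minimal sketch of the idea under an assumption: IGF is taken here by analogy with IDF as log(G / gf(t)), where G is the number of genres and gf(t) is the number of genres in which term t occurs. The function names, the toy corpus, and the genre labels are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def inverse_genre_frequency(docs_by_genre):
    """Map each term to log(G / gf), an ASSUMED IGF by analogy with IDF.

    docs_by_genre: dict of genre label -> list of tokenized documents.
    """
    n_genres = len(docs_by_genre)
    genre_freq = Counter()
    for docs in docs_by_genre.values():
        # Count each term at most once per genre.
        terms_in_genre = set()
        for tokens in docs:
            terms_in_genre.update(tokens)
        genre_freq.update(terms_in_genre)
    return {t: math.log(n_genres / gf) for t, gf in genre_freq.items()}


def tf_igf(tokens, igf):
    """Weight each term of one document by raw term frequency times IGF."""
    tf = Counter(tokens)
    # Terms never seen in the training genres get weight 0.0.
    return {t: count * igf.get(t, 0.0) for t, count in tf.items()}


# Hypothetical toy corpus with three genres.
corpus = {
    "shop": [["buy", "cart", "price"], ["price", "sale", "home"]],
    "blog": [["post", "comment", "home"]],
    "faq": [["question", "answer", "home"]],
}
igf = inverse_genre_frequency(corpus)
weights = tf_igf(["price", "price", "home"], igf)
```

In this toy corpus, "home" occurs in every genre and so receives zero weight, while "price" occurs in a single genre and keeps a positive weight. That is the behaviour the abstract argues for, although the authors' exact formulation may differ.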



DOI: http://dx.doi.org/10.12962/j24068535.v12i1.a43
