USING A FEATURE WEIGHTING TECHNIQUE TO CLEAN NOISE FROM THE PAGES OF INDONESIAN-LANGUAGE NEWS SITES
Abstract
A web page usually presents its information in several blocks. On a news website, many of the displayed blocks are irrelevant or unrelated to the main content, for example navigation panels, copyright notices, user guides, links, news summaries, and various advertisements. Information blocks that are irrelevant to the main content are known as web page noise. This research applies a feature weighting technique to improve classification results by detecting noise in the pages of a website. With this technique, the web is first modelled with a Document Object Model (DOM) tree and a Compressed Structure Tree (CST) to obtain the general structure and to compare the information blocks in a website. The information obtained is used to measure and evaluate the importance level of each node created by the CST. Based on the tree and the importance level of each node, the method assigns a weight to each individual word (feature) in each content block. The weights are then used in the web mining process.
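The weighting idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the importance formula, so everything below is an assumption made for illustration: pages are taken to be already split into blocks keyed by a DOM path, the importance of a block position is approximated by the normalized entropy of its contents across pages, and the names `block_importance` and `weight_features` are hypothetical. This is not the paper's CST implementation.

```python
import math
from collections import Counter, defaultdict


def block_importance(pages):
    """Score each DOM path by how much its content varies across pages.

    pages: list of dicts mapping a DOM path (str) to that block's text.
    Assumption: content that repeats on every page is treated as noise
    (score 0); content that differs on every page is treated as main
    content (score close to 1).
    """
    contents = defaultdict(Counter)
    for page in pages:
        for path, text in page.items():
            contents[path][text] += 1

    m = len(pages)
    importance = {}
    for path, counts in contents.items():
        if m < 2:
            # With a single page there is nothing to compare against.
            importance[path] = 1.0
            continue
        total = sum(counts.values())
        # Entropy of the content distribution, using log base m so that
        # m distinct contents on m pages yields a score of 1.
        entropy = -sum((c / total) * math.log(c / total, m)
                       for c in counts.values())
        importance[path] = min(1.0, max(0.0, entropy))
    return importance


def weight_features(page, importance):
    """Weight each word by its term frequency times its block's importance."""
    weights = defaultdict(float)
    for path, text in page.items():
        for word, tf in Counter(text.lower().split()).items():
            weights[word] += tf * importance.get(path, 0.0)
    return dict(weights)


# Hypothetical toy site: the navigation block repeats, the article varies.
pages = [
    {"body/div.nav": "home sports politics",
     "body/div.article": "flood hits jakarta"},
    {"body/div.nav": "home sports politics",
     "body/div.article": "rupiah gains today"},
    {"body/div.nav": "home sports politics",
     "body/div.article": "team wins final"},
]

importance = block_importance(pages)
print(importance)
print(weight_features(pages[0], importance))
```

In this toy input the navigation block is identical on all three pages and receives importance 0, so its words get weight 0, while the article block differs on every page and receives importance close to 1. A real system would first have to build the DOM tree and merge pages into a shared structure before scoring the nodes.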
License
All papers should be submitted electronically. All submitted manuscripts must be original work that is not under submission at another journal or under consideration for publication in another form, such as a monograph or chapter of a book. Authors of submitted papers are obligated not to submit their paper for publication elsewhere until an editorial decision is rendered on their submission. Further, authors of accepted papers are prohibited from publishing the results in other publications that appear before the paper is published in JUTI unless they receive approval for doing so from the Editor-in-Chief.
JUTI open access articles are distributed under a Creative Commons Attribution-ShareAlike 4.0 International License. This license lets others share and adapt the material, provided that they give appropriate credit, provide a link to the license, and indicate if changes were made. If they remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.











