ENHANCEMENT OF DECISION TREE METHOD BASED ON HIERARCHICAL CLUSTERING AND DISPERSION RATIO
DOI: https://doi.org/10.12962/j24068535.v18i2.a1005
Abstract
Decision tree classification is a classification method with a built-in feature selection process. Decision trees that use information gain have a disadvantage when the dataset has attributes whose values are unique to each record and when the class distribution is imbalanced. The data used for decision tree classification are of two types, numerical and nominal. Numerical data are discretized to obtain data intervals. The weaknesses of the information gain method can be reduced by using the dispersion ratio method, which depends on the frequency distribution rather than on the class distribution. In this study, numeric data are discretized using hierarchical clustering to obtain balanced data clusters. The data were taken from the UCI Machine Learning Repository and contain both numeric and nominal attributes. The research has two stages. First, the numeric data are discretized using hierarchical clustering with three methods: single link, complete link, and average link. Second, the discretization results are merged again, trees are built with splitting attributes chosen by dispersion ratio, and the trees are evaluated with 7-fold cross-validation. The results show that discretizing the data with hierarchical clustering can improve prediction by 14.6% compared with data without discretization. Attribute splitting with the dispersion ratio on data discretized by hierarchical clustering can improve prediction by 6.51%.
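The first stage described in the abstract, discretizing a numeric attribute with hierarchical clustering, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `linkage_distance` and `discretize`, the sample values, and the choice of three intervals are all hypothetical, and the sketch uses a naive agglomerative loop in pure Python rather than any particular library. The dispersion ratio split itself is not shown, because the abstract does not give its formula.

```python
from itertools import combinations


def linkage_distance(a, b, method):
    """Distance between two clusters of 1-D values under the given linkage."""
    dists = [abs(x - y) for x in a for y in b]
    if method == "single":      # closest pair of points
        return min(dists)
    if method == "complete":    # farthest pair of points
        return max(dists)
    if method == "average":     # mean over all cross-cluster pairs
        return sum(dists) / len(dists)
    raise ValueError(f"unknown linkage method: {method}")


def discretize(values, n_bins, method="average"):
    """Agglomeratively cluster 1-D values into n_bins groups.

    Returns one interval label per input value, where label 0 is the
    cluster containing the smallest values.
    """
    # Start with every record in its own cluster (stored as index lists).
    clusters = [[i] for i in range(len(values))]

    def dist(pair):
        a, b = pair
        return linkage_distance(
            [values[i] for i in clusters[a]],
            [values[i] for i in clusters[b]],
            method,
        )

    # Repeatedly merge the two closest clusters until n_bins remain.
    while len(clusters) > n_bins:
        a, b = min(combinations(range(len(clusters)), 2), key=dist)
        clusters[a].extend(clusters[b])  # a < b, so deleting b is safe
        del clusters[b]

    # Order clusters by their smallest value so labels read as intervals.
    clusters.sort(key=lambda c: min(values[i] for i in c))
    labels = [0] * len(values)
    for label, cluster in enumerate(clusters):
        for i in cluster:
            labels[i] = label
    return labels


# Hypothetical numeric attribute, discretized into three intervals.
sample = [1.0, 1.2, 1.1, 5.0, 5.3, 9.8, 10.0]
for method in ("single", "complete", "average"):
    print(method, discretize(sample, 3, method))
```

On well-separated values such as this sample, the three linkage methods agree; they can produce different intervals when the gaps between values are less clear-cut, which is presumably why the study compares all three.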
License
JUTI open access articles are distributed under a Creative Commons Attribution-ShareAlike 4.0 International License. This license requires users to give appropriate credit, provide a link to the license, and indicate if changes were made; if they remix, transform, or build upon the material, they must distribute their contributions under the same license as the original.