ENHANCEMENT OF DECISION TREE METHOD BASED ON HIERARCHICAL CLUSTERING AND DISPERSION RATIO

Dimas Ari Setyawan, Chastine Fatichah

Abstract


Decision tree classification is a classification method with a built-in feature selection process. Decision trees that use information gain as the splitting criterion have a weakness when the dataset contains attributes with a unique value for each record and an imbalanced class distribution. The data used for decision tree classification are of two types, numeric and nominal, and numeric attributes are discretized so that their values are grouped into intervals. The weakness of information gain can be reduced by using the dispersion ratio, which depends not on the class distribution but on the frequency distribution. Numeric attributes are discretized with hierarchical clustering to obtain balanced data clusters. The datasets used in this study were taken from the UCI Machine Learning Repository and contain both numeric and nominal attributes. The research has two stages. First, the numeric attributes are discretized with hierarchical clustering using three linkage methods: single link, complete link, and average link. Second, the discretized attributes are merged back into the dataset, a tree is built using the dispersion ratio as the attribute splitting criterion, and the tree is evaluated with 7-fold cross-validation. The results show that discretizing the data with hierarchical clustering can improve prediction performance by 14.6% compared with data without discretization, and that splitting attributes with the dispersion ratio on the data discretized by hierarchical clustering can improve prediction performance by 6.51%.
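The weakness of information gain described in the abstract can be illustrated with a small sketch. It assumes plain Python, nominal attributes, and a hypothetical six-record toy dataset; the names `entropy`, `information_gain`, `record_id`, and `outlook` are illustrative and do not come from the paper. An ID-like attribute whose value is unique for every record splits the data into pure singleton subsets, so it receives the maximal gain even though it is useless for predicting new records.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(values, labels):
    """Entropy of the class labels minus the size-weighted entropy of the
    subsets produced by splitting on every distinct value of a nominal attribute."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy data: six records with a balanced yes/no class.
labels = ["yes", "yes", "no", "no", "yes", "no"]
record_id = [1, 2, 3, 4, 5, 6]  # unique value per record (ID-like attribute)
outlook = ["sun", "sun", "rain", "rain", "sun", "sun"]  # a genuine two-valued attribute

print(information_gain(record_id, labels))  # 1.0: every singleton subset is pure
print(information_gain(outlook, labels))  # about 0.459
```

The ID-like attribute "wins" the split even though it cannot generalize, which is the kind of bias an alternative criterion such as the dispersion ratio is meant to avoid.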
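The discretization stage can be sketched as bottom-up (agglomerative) clustering of a single numeric attribute. This is a minimal sketch under assumptions the abstract does not state: the number of intervals `n_bins` is fixed in advance, cut points are placed midway between neighbouring clusters, and only neighbouring groups of the sorted values are considered for merging, so every cluster stays a contiguous interval. The three linkage options mirror the single, complete, and average link methods named in the abstract; all function names and the toy data are hypothetical.

```python
from statistics import mean


def linkage_distance(a, b, method):
    """Distance between two clusters of 1-D values under the given linkage."""
    pairs = [abs(x - y) for x in a for y in b]
    if method == "single":
        return min(pairs)  # closest pair of points
    if method == "complete":
        return max(pairs)  # farthest pair of points
    if method == "average":
        return mean(pairs)  # mean over all cross-cluster pairs
    raise ValueError(f"unknown linkage: {method}")


def hierarchical_cut_points(values, n_bins, method="average"):
    """Merge sorted values bottom-up until n_bins clusters remain,
    then return the cut points between neighbouring clusters."""
    clusters = [[v] for v in sorted(values)]
    while len(clusters) > n_bins:
        # Simplification of this sketch: only neighbouring clusters are merge
        # candidates, so each cluster remains a contiguous run of sorted values.
        i = min(range(len(clusters) - 1),
                key=lambda j: linkage_distance(clusters[j], clusters[j + 1], method))
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
    return [(left[-1] + right[0]) / 2 for left, right in zip(clusters, clusters[1:])]


def discretize(values, cuts):
    """Map each numeric value to the index of the interval it falls in."""
    return [sum(v > c for c in cuts) for v in values]


ages = [22, 23, 25, 41, 43, 44, 67, 70]  # hypothetical numeric attribute
for method in ("single", "complete", "average"):
    cuts = hierarchical_cut_points(ages, n_bins=3, method=method)
    # On this toy data all three linkages give cuts [33.0, 55.5]
    # and intervals [0, 0, 0, 1, 1, 1, 2, 2].
    print(method, cuts, discretize(ages, cuts))
```

On less well-separated data the three linkages can produce different intervals, which is why comparing them as separate discretization variants is meaningful.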
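The evaluation protocol, 7-fold cross-validation, divides the records into seven folds and uses each fold once as the test set while the remaining six folds train the model. The sketch below assumes accuracy as the score and uses a hypothetical majority-class learner (`train_majority`, `predict_majority`) as a stand-in for the dispersion-ratio tree, which the abstract does not describe in enough detail to reproduce; `k_fold_indices` and `cross_validate` are likewise illustrative names.

```python
import random
from collections import Counter


def k_fold_indices(n, k=7, seed=0):
    """Shuffle record indices and deal them into k folds whose sizes differ by at most one."""
    if n < k:
        raise ValueError("need at least k records for k folds")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]


def cross_validate(records, labels, train_fn, predict_fn, k=7, seed=0):
    """Mean test accuracy over k rounds: each fold is the test set exactly once
    while the other k - 1 folds are used for training."""
    folds = k_fold_indices(len(records), k, seed)
    scores = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(records)) if i not in held_out]
        model = train_fn([records[i] for i in train_idx], [labels[i] for i in train_idx])
        correct = sum(predict_fn(model, records[i]) == labels[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / k


# Hypothetical stand-in learner: always predicts the most frequent training class.
def train_majority(X, y):
    return Counter(y).most_common(1)[0][0]


def predict_majority(model, x):
    return model


X = list(range(14))
y = ["yes"] * 10 + ["no"] * 4
# About 0.714 (= 10/14): every fold holds two records and the learner always predicts "yes".
print(cross_validate(X, y, train_majority, predict_majority))
```

Fixing the shuffle seed keeps the folds reproducible, so different split criteria can be compared on exactly the same train/test partitions.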



DOI: http://dx.doi.org/10.12962/j24068535.v18i2.a1005
