Record Details

MODIFICATION OF CHF AND BIC COEFFICIENTS FOR EVALUATION OF CLUSTERING WITH MIXED TYPE VARIABLES

Research Journal of Economics, Business and ICT

 
Field Value
 
Title MODIFICATION OF CHF AND BIC COEFFICIENTS FOR EVALUATION OF CLUSTERING WITH MIXED TYPE VARIABLES
 
Creator Loster, Tomas; University of Economics, Prague
 
Subject
Cluster Analysis, Evaluation Of Clustering, BIC Criterion, CHF Criterion
C38, C40
 
Description Current literature focuses mainly on the evaluation of clustering when individual objects are characterized only by quantitative variables. The problems associated with the analysis of data characterized by qualitative or mixed type variables have been dealt with only to a limited extent. That work is based on an analogy with the techniques applied when evaluating log-linear models, for example. In this paper I suggest new coefficients for the evaluation of resulting clusters based on the principle of variability analysis. Only coefficients for mixed type variables, based on a combination of the sample variance and one of the variability measures for nominal variables, are presented. Similar approaches can be applied to purely qualitative variables by omitting the part that characterizes the variability of quantitative variables. In this paper I also evaluated selected indices for determining the number of clusters when objects are characterized by mixed type variables. Using analyses of real data files (from the UCI Machine Learning Repository website: http://archive.ics.uci.edu/ml/datasets.html), I compared three newly proposed indices with the known BIC criterion, which is implemented in two-step cluster analysis in the IBM SPSS Statistics system. The number of object groups was known in advance, and I was interested in how well the optimal number of clusters found agreed with the real number of groups. I analyzed 15 data files and found that the new indices determined the correct number of clusters more successfully than the BIC criterion implemented in two-step cluster analysis in the IBM SPSS Statistics system. Criteria based on the Gini coefficient were more successful than the criterion based on entropy. The CHFG index determined the correct number of clusters in most cases (93.33 %). The second most successful criterion was the CHFH index (73.33 %).
The BIC criterion determined the correct number of clusters in 40.0 % of cases and my modification of the BIC criterion (using the Gini coefficient instead of entropy, which i
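The description names three variability measures as ingredients of the proposed coefficients: the sample variance for quantitative variables, and the Gini coefficient or entropy for nominal variables. The record does not give the paper's formulas, so the following Python sketch only illustrates these three measures in their common textbook forms (Gini coefficient as 1 minus the sum of squared category shares, entropy as minus the sum of share times natural log of share). The function names are hypothetical, the inputs are assumed to be non-empty, and the way the paper combines the measures into the CHFG, CHFH, and modified BIC indices is not reproduced here.

```python
from collections import Counter
from math import log


def category_shares(values):
    """Relative frequency of each category in a non-empty nominal variable."""
    counts = Counter(values)
    n = len(values)
    return [count / n for count in counts.values()]


def gini_variability(values):
    """Gini coefficient of a nominal variable: 1 - sum of squared shares.

    Textbook form, assumed here; the paper's exact definition is not
    given in this record.
    """
    return 1.0 - sum(p * p for p in category_shares(values))


def entropy_variability(values):
    """Entropy of a nominal variable: -sum of p * ln(p) over categories.

    Natural logarithm is an assumption of this sketch.
    """
    return -sum(p * log(p) for p in category_shares(values))


def sample_variance(xs):
    """Unbiased sample variance of a quantitative variable (needs n >= 2)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)
```

All three measures are zero when a variable takes a single value within a cluster and grow as the values become more dispersed, which is what allows them to play analogous roles for quantitative and nominal variables in a variability-based index.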
 
Publisher English Time Schools & Overseas Education
 
Contributor Internal Grant Agency of University of Economics in Prague MF/F4/6/2013
 
Date 2013-12-15
 
Type info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion
Peer-reviewed Article
 
Format application/pdf
 
Identifier http://ojs.journals.cz/index.php/RJEBI/article/view/394
 
Source Research Journal of Economics, Business and ICT; Vol 8, No 2 (2013)
2047-7848
2045-3345
 
Language eng
 
Relation http://ojs.journals.cz/index.php/RJEBI/article/view/394/388
 
Rights Copyright (c) 2013 Tomas Loster
https://creativecommons.org/licenses/by/3.0/