MODIFICATION OF CHF AND BIC COEFFICIENTS FOR EVALUATION OF CLUSTERING WITH MIXED TYPE VARIABLES
Research Journal of Economics, Business and ICT
Field | Value |
Title |
MODIFICATION OF CHF AND BIC COEFFICIENTS FOR EVALUATION OF CLUSTERING WITH MIXED TYPE VARIABLES
|
|
Creator |
Loster, Tomas; University of Economics, Prague
|
|
Subject |
Cluster Analysis, Evaluation of Clustering, BIC Criterion, CHF Criterion; JEL: C38, C40 |
|
Description |
Current literature focuses mainly on the evaluation of clustering when individual objects are characterized only by quantitative variables. The problems associated with the analysis of data characterized by qualitative or mixed-type variables have been dealt with only to a limited extent. Such work is based, for example, on an analogy with the techniques applied when evaluating log-linear models.
In this paper I suggest new coefficients for the evaluation of resulting clusters based on the principle of variability analysis. Only coefficients for mixed-type variables, based on a combination of the sample variance and one of the variability measures for nominal variables, are presented. Similar approaches can be applied in the case of qualitative variables by omitting the part that characterizes the variability of quantitative variables.
I also evaluated selected indices for determining the number of clusters when objects are characterized by mixed-type variables. Based on analyses of real data files (from the UCI Machine Learning Repository, http://archive.ics.uci.edu/ml/datasets.html), I compared three newly proposed indices with the known BIC criterion, which is implemented in two-step cluster analysis in the IBM SPSS Statistics system. The number of object groups was known, and I was interested in whether the optimal number of clusters found agreed with the real number of groups. I analyzed 15 data files and found that the new indices determined the correct number of clusters more often than the BIC criterion. Criteria based on the Gini coefficient were more successful than the criterion based on entropy.
The CHFG index determined the correct number of clusters in most cases (93.33 %). The second most successful criterion was the CHFH index (73.33 %).
The BIC criterion determined the correct number of clusters in 40.0 % of cases, and my modification of the BIC criterion (using the Gini coefficient instead of entropy, which i
|
|
Publisher |
English Time Schools & Overseas Education
|
|
Contributor |
Internal Grant Agency of University of Economics in Prague MF/F4/6/2013
|
|
Date |
2013-12-15
|
|
Type |
info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion Peer-reviewed Article |
|
Format |
application/pdf
|
|
Identifier |
http://ojs.journals.cz/index.php/RJEBI/article/view/394
|
|
Source |
Research Journal of Economics, Business and ICT; Vol 8, No 2 (2013)
2047-7848 2045-3345 |
|
Language |
eng
|
|
Relation |
http://ojs.journals.cz/index.php/RJEBI/article/view/394/388
|
|
Rights |
Copyright (c) 2013 Tomas Loster
https://creativecommons.org/licenses/by/3.0/ |
|
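The Description above names two variability measures for nominal variables, the Gini coefficient and entropy, without giving formulas. As a minimal sketch, assuming the standard textbook definitions (Gini coefficient as one minus the sum of squared relative frequencies, entropy as the negative sum of p ln p), the two measures can be computed for a single nominal variable as follows. The function names are hypothetical, and the way the paper combines these measures with the sample variance in the CHFG, CHFH and BIC indices is not stated in this record and is not reproduced here.

```python
from collections import Counter
from math import log


def gini_coefficient(values):
    """Nominal variability: 1 - sum of squared relative frequencies.

    Assumes the standard definition; 0 means all values are identical.
    """
    n = len(values)
    counts = Counter(values)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def entropy(values):
    """Nominal variability: -sum(p * ln p) over observed categories.

    Assumes the natural logarithm; 0 means all values are identical.
    """
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * log(c / n) for c in counts.values())


if __name__ == "__main__":
    # Hypothetical nominal variable observed on six objects in one cluster.
    colours = ["red", "red", "blue", "blue", "blue", "green"]
    print("Gini coefficient:", gini_coefficient(colours))
    print("Entropy:", entropy(colours))
```

Both measures are zero for a perfectly homogeneous cluster and grow as the categories become more evenly spread, which is why either one can stand in for the variance when a variable is nominal.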