Connectivity-Based Clustering for Mixed Discrete and Continuous Data


Mahfuza Khatun¹ and Sikandar Siddiqui², ¹Jahangirnagar University, Bangladesh, ²Deloitte Audit Analytics GmbH, Germany


This paper introduces a density-based clustering procedure for datasets with variables of mixed type. The proposed procedure, which is closely related to the concept of shared neighbourhoods, works particularly well when the individual clusters differ greatly in the average pairwise distance between their member objects. Several concrete examples show that the proposed clustering algorithm identifies subgroups of objects with statistically significant distributional characteristics.


Cluster analysis, mixed data, distance measures