Incorporating Stability and Error-based Constraints for a Novel Partitional Clustering Algorithm

Authors

K. Aparna, Mydhili K. Nair

Corresponding email: aparnak.bmsit@gmail.com

Published: 29 Apr 2016
Volume: IJtech Vol 7, No 4 (2016)
DOI: https://doi.org/10.14716/ijtech.v7i4.1579

Cite this article as:

Aparna, K., & Nair, M.K. 2016. Incorporating Stability and Error-based Constraints for a Novel Partitional Clustering Algorithm. International Journal of Technology. Volume 7(4), pp. 691–700

K. Aparna: Department of Computer Applications, BMS Institute of Technology & Management, Yelahanka, Bengaluru – 560064, Karnataka State, India
Mydhili K. Nair: Department of Information Science & Engineering, M S Ramaiah Institute of Technology, MSR Nagar, Mathikere, Bengaluru – 560054, Karnataka State, India

Abstract

Data clustering is one of the major areas of data mining. The bisecting clustering algorithm is among the most widely used algorithms for high-dimensional datasets, but its performance degrades as the dimensionality increases. Selecting the cluster to bisect next is also a challenging task. To overcome these drawbacks, we developed a novel partitional clustering algorithm called HB-K-Means (High dimensional Bisecting K-Means). To improve the performance of this algorithm, we incorporate two constraints, a stability-based measure and the Mean Square Error (MSE), resulting in the CHB-K-Means (Constraint-based High dimensional Bisecting K-Means) algorithm. The CHB-K-Means algorithm generates two initial partitions and then calculates the stability and MSE of each partition generated. Inference techniques are applied to the stability and MSE values of the two partitions to select the next partition for re-clustering. This process is repeated until K clusters are obtained. The experimental analysis shows an average clustering accuracy of 75%. A comparative analysis of the proposed approach against traditional algorithms shows a higher clustering accuracy rate and an increase in computation time.
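
The bisect, score and re-cluster loop described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the abstract does not define the stability measure or the inference technique, so the `stability` proxy (label agreement between repeated bisections) and the selection rule (largest MSE times stability) are assumptions made here, and the name `chb_kmeans_sketch` is hypothetical.

```python
import random

def mean(points):
    """Centroid of a list of equal-length numeric tuples."""
    return [sum(p[d] for p in points) / len(points) for d in range(len(points[0]))]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mse(points):
    """Mean squared distance of the points to their centroid."""
    c = mean(points)
    return sum(sq_dist(p, c) for p in points) / len(points)

def two_means(points, rng, iters=20):
    """Bisect points with plain 2-means; returns a 0/1 label per point."""
    centers = rng.sample(points, 2)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [0 if sq_dist(p, centers[0]) <= sq_dist(p, centers[1]) else 1
                  for p in points]
        if len(set(labels)) < 2:
            # Degenerate split: fall back to cutting the list in half.
            half = len(points) // 2
            return [0] * half + [1] * (len(points) - half)
        centers = [mean([p for p, l in zip(points, labels) if l == k])
                   for k in (0, 1)]
    return labels

def stability(points, rng, runs=4):
    """ASSUMED proxy: average label agreement between repeated bisections."""
    ref = two_means(points, rng)
    total = 0.0
    for _ in range(runs):
        other = two_means(points, rng)
        same = sum(a == b for a, b in zip(ref, other))
        # Labels 0/1 are arbitrary, so take the better of the two matchings.
        total += max(same, len(points) - same) / len(points)
    return total / runs

def chb_kmeans_sketch(points, k, seed=0):
    """Repeatedly bisect the selected cluster until k clusters exist."""
    rng = random.Random(seed)
    clusters = [list(points)]
    while len(clusters) < k:
        candidates = [c for c in clusters if len(c) >= 2]
        if not candidates:
            break
        # ASSUMED selection rule: loose clusters that split reproducibly.
        target = max(candidates, key=lambda c: mse(c) * stability(c, rng))
        labels = two_means(target, rng)
        clusters = [c for c in clusters if c is not target]
        clusters.append([p for p, l in zip(target, labels) if l == 0])
        clusters.append([p for p, l in zip(target, labels) if l == 1])
    return clusters
```

Calling `chb_kmeans_sketch(points, 3)` on a list of coordinate tuples returns three non-empty lists that together contain every input point. Swapping in a different stability measure or selection rule only requires changing `stability` and the `key` function.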

Keywords

Bisecting K-Means, Constraints, High dimensionality, Mean Square Error (MSE), Partitional clustering, Stability
