An Analysis of Enhancement in K-Means Clustering
Keywords:
Algorithm, Regression, Clustering, Data mining
Abstract
In the modern era, users routinely need to retrieve useful information from vast collections of data. The process of extracting useful data in an understandable form is data mining. Big data[6] is a term for data sets that are so large or complex that traditional data processing applications are inadequate. Accurate analysis of big data can lead to more confident decision making, and better decisions can result in greater operational efficiency, cost reduction, and reduced risk. Various algorithms[5] and techniques, such as classification, regression, artificial intelligence, neural networks, association rules, decision trees, and the nearest neighbour approach, are used for knowledge discovery from databases. Clustering is an important data analysis technique that plays a significant role in data mining applications. Clustering arranges a set of similar objects into a group. Partition-based clustering is an important clustering technique. It is a centroid-based technique in which the data points are split into k partitions, and each partition represents a cluster. A widely used partition-based clustering algorithm is the k-means clustering algorithm. However, this method suffers from the empty-cluster problem, in which a cluster is left with no data points assigned to it. This problem can be reduced by using an enhanced algorithm. In this paper, we analyze the original k-means algorithm and an enhanced k-means algorithm.
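To make the empty-cluster problem concrete, the following is a minimal sketch of centroid-based k-means in plain Python. The abstract does not specify the enhanced algorithm, so the remedy shown here (reseeding an empty cluster at the point farthest from its current centroid) is an assumption chosen for illustration and not the method analyzed in the paper. The function name `kmeans`, its parameters, and the sample data are likewise hypothetical.

```python
import math
import random


def kmeans(points, k, init=None, max_iter=100, seed=0):
    """Cluster equal-length numeric tuples into k groups.

    Returns (centroids, labels, empty_events). empty_events counts how many
    times a cluster was found empty after an assignment step.
    """
    rng = random.Random(seed)
    start = init if init is not None else rng.sample(points, k)
    centroids = [tuple(c) for c in start]
    labels = None
    empty_events = 0

    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        clusters = [[] for _ in range(k)]
        for p, j in zip(points, new_labels):
            clusters[j].append(p)

        # Update step: move each non-empty cluster's centroid to its mean.
        for j, members in enumerate(clusters):
            if members:
                centroids[j] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )

        # Assumed remedy (not from the paper): reseed each empty cluster
        # at the point that lies farthest from its own centroid.
        reseeded = False
        for j, members in enumerate(clusters):
            if not members:
                far = max(
                    range(len(points)),
                    key=lambda i: math.dist(points[i], centroids[new_labels[i]]),
                )
                centroids[j] = tuple(points[far])
                empty_events += 1
                reseeded = True

        converged = new_labels == labels and not reseeded
        labels = new_labels
        if converged:
            break

    return centroids, labels, empty_events


# Hypothetical data: the second initial centroid is far from every point,
# so its cluster is empty after the first assignment step.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels, empty_events = kmeans(points, k=2, init=[(0, 0), (100, 100)])
print(labels, empty_events)
```

Without the reseeding branch, the second centroid would never move and the run would finish with a single populated cluster. The sketch is deliberately simple: if several clusters are empty in the same iteration, they are all reseeded at the same point, which a more careful implementation would avoid.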
References
Piatetsky-Shapiro, Gregory (1991), Discovery, analysis, & presentation of strong rules, in Piatetsky-Shapiro, Gregory; & Frawley, William J.; eds., Knowledge Discovery in Databases, AAAI/MIT Press, Cambridge, MA.
Agrawal, R.; Imieliński, T.; Swami, A. (1993). "Mining association rules between sets of items in large databases". Proceedings of 1993 ACM SIGMOD international conference on Management of data - SIGMOD '93. p. 207. doi:10.1145/170035.170072. ISBN 0897915925.
Hahsler, Michael (2005). "Introduction to arules – A computational environment for mining association rules & frequent item sets" (PDF). Journal of Statistical Software.
Michael Hahsler (2015). A Probabilistic Comparison of Commonly Used Interest Measures for Association Rules. http://michael.hahsler.net/research/association_rules/measures.html
Hipp, J.; Güntzer, U.; Nakhaeizadeh, G. (2000). "Algorithms for association rule mining --- a general survey & comparison". ACM SIGKDD Explorations Newsletter 2: 58. doi:10.1145/360402.360421.
Tan, Pang-Ning; Michael, Steinbach; Kumar, Vipin (2005). "Chapter 6. Association Analysis: Basic Concepts & Algorithms" (PDF). Introduction to Data Mining. Addison-Wesley. ISBN 0-321-32136-7.
Pei, Jian; Han, Jiawei; & Lakshmanan, Laks V. S.; Mining frequent itemsets with convertible constraints, in Proceedings of 17th International Conference on Data Engineering, April 2–6, 2001, Heidelberg, Germany, 2001, pages 433-442
Agrawal, Rakesh; & Srikant, Ramakrishnan; Fast algorithms for mining association rules in large databases, in Bocca, Jorge B.; Jarke, Matthias; & Zaniolo, Carlo; editors, Proceedings of 20th International Conference on Very Large Data Bases (VLDB), Santiago, Chile, September 1994, pages 487-499
Zaki, M. J. (2000). "Scalable algorithms for association mining". IEEE Transactions on Knowledge & Data Engineering 12 (3): 372–390. doi:10.1109/69.846291.
License
Copyright (c) 2016 International Journal for Research Publication and Seminar
This work is licensed under a Creative Commons Attribution 4.0 International License.
Re-users must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use. This license allows for redistribution, commercial and non-commercial, as long as the original work is properly credited.