Software implementation of the main cluster analysis tools
DOI:
https://doi.org/10.34069/AI/2021.47.11.9
Keywords:
cluster analysis, cluster, distance function between vectors, dendrogram, matrix, divergence measure, method of grouping a set of objects.
Abstract
This article discusses an approach to building a software package that implements cluster analysis methods. It analyzes a number of cluster analysis tools for processing an initial data set, their software implementation, and the difficulties of applying cluster analysis to data. Data are treated here as the factual material that supplies information about the problem under study and serves as the basis for discussion, analysis, and decision-making. Cluster analysis is a procedure that combines objects or variables into groups according to a given rule. The work groups multivariate data using the following proximity measures: the sample correlation coefficient and its absolute value, the cosine of the angle between vectors, and the Euclidean distance. The authors propose methods for grouping by centers, by the nearest neighbor, and by selected standards.
The results can be used by analysts when building a data analysis structure and can improve the efficiency of clustering algorithms. The practical significance of the developed algorithms lies in the software package implemented in C++ in the VS environment.
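The proximity measures named in the abstract can be illustrated with a minimal C++ sketch. It is not taken from the authors' software package: the type alias `Vec` and the function names `euclidean_distance`, `cosine_similarity`, `sample_correlation`, and `abs_correlation` are hypothetical, and the code assumes objects are stored as equal-length vectors of `double`.

```cpp
#include <cmath>
#include <cstddef>
#include <numeric>
#include <stdexcept>
#include <vector>

// Hypothetical names for illustration; none come from the article's package.
using Vec = std::vector<double>;

// Rejects inputs that cannot be compared component by component.
inline void require_same_size(const Vec& x, const Vec& y) {
    if (x.empty() || x.size() != y.size()) {
        throw std::invalid_argument("vectors must be non-empty and of equal length");
    }
}

// Euclidean distance: sqrt(sum_i (x_i - y_i)^2). Smaller values mean closer objects.
double euclidean_distance(const Vec& x, const Vec& y) {
    require_same_size(x, y);
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double d = x[i] - y[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Cosine of the angle between x and y: (x . y) / (|x| * |y|).
double cosine_similarity(const Vec& x, const Vec& y) {
    require_same_size(x, y);
    const double dot = std::inner_product(x.begin(), x.end(), y.begin(), 0.0);
    const double nx = std::sqrt(std::inner_product(x.begin(), x.end(), x.begin(), 0.0));
    const double ny = std::sqrt(std::inner_product(y.begin(), y.end(), y.begin(), 0.0));
    if (nx == 0.0 || ny == 0.0) {
        throw std::domain_error("cosine is undefined for a zero vector");
    }
    return dot / (nx * ny);
}

// Sample (Pearson) correlation coefficient, computed as the cosine of the
// mean-centered vectors. Throws if either vector is constant.
double sample_correlation(const Vec& x, const Vec& y) {
    require_same_size(x, y);
    const double n = static_cast<double>(x.size());
    const double mx = std::accumulate(x.begin(), x.end(), 0.0) / n;
    const double my = std::accumulate(y.begin(), y.end(), 0.0) / n;
    Vec cx(x.size());
    Vec cy(y.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        cx[i] = x[i] - mx;
        cy[i] = y[i] - my;
    }
    return cosine_similarity(cx, cy);
}

// Absolute value of the correlation: treats strong negative and strong
// positive linear relationships as equally close.
double abs_correlation(const Vec& x, const Vec& y) {
    return std::fabs(sample_correlation(x, y));
}
```

In this sketch the Euclidean distance is a dissimilarity (smaller is closer), while the cosine and the correlation-based measures are similarities (larger is closer), so a grouping routine built on top of them would need to know which direction to optimize.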