Software implementation of the main cluster analysis tools

Keywords: cluster analysis, cluster, distance function between vectors, dendrogram, matrix, divergence measure, method of grouping a set of objects.

Abstract

This article discusses an approach to building a software package that implements cluster analysis methods. It analyzes a number of cluster analysis tools for processing an initial data set and their software implementation, along with the difficulties of applying cluster analysis to data. Data are treated as the factual material that supplies information about the problem under study and forms the basis for discussion, analysis, and decision-making. Cluster analysis is a procedure that combines objects or variables into groups according to a given rule. The work groups multivariate data using the following proximity measures: the sample correlation coefficient and its absolute value, the cosine of the angle between vectors, and the Euclidean distance. The authors propose methods for grouping by centers, by nearest neighbor, and by selected standards (reference objects).
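The four proximity measures named above can be sketched in C++ as follows. This is a minimal illustration and not the authors' package: the function names and the `Vec` alias are hypothetical, vectors are assumed to have equal length, and the cosine and correlation functions assume the inputs are non-zero and non-constant respectively.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical alias for one object described by several numeric features.
using Vec = std::vector<double>;

// Euclidean distance: a smaller value means the objects are closer.
double euclidean_distance(const Vec& x, const Vec& y) {
    assert(x.size() == y.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double d = x[i] - y[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Cosine of the angle between x and y (assumes neither is the zero vector).
double cosine_similarity(const Vec& x, const Vec& y) {
    assert(x.size() == y.size());
    double dot = 0.0, nx = 0.0, ny = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        dot += x[i] * y[i];
        nx += x[i] * x[i];
        ny += y[i] * y[i];
    }
    return dot / (std::sqrt(nx) * std::sqrt(ny));
}

// Sample correlation coefficient, computed as the cosine of the
// mean-centered vectors (assumes neither vector is constant).
double sample_correlation(const Vec& x, const Vec& y) {
    assert(x.size() == y.size() && !x.empty());
    const double n = static_cast<double>(x.size());
    double mx = 0.0, my = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        mx += x[i];
        my += y[i];
    }
    mx /= n;
    my /= n;
    Vec cx(x.size()), cy(y.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        cx[i] = x[i] - mx;
        cy[i] = y[i] - my;
    }
    return cosine_similarity(cx, cy);
}

// Absolute value of the correlation: treats strong negative and strong
// positive linear relationships as equally "close".
double abs_correlation(const Vec& x, const Vec& y) {
    return std::fabs(sample_correlation(x, y));
}
```

Note that the Euclidean distance is a dissimilarity (smaller is closer), whereas the other three are similarities (larger is closer), so a grouping procedure would need to account for the direction of the chosen measure.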

The results can be used by analysts when building a data analysis structure and can help improve the efficiency of clustering algorithms. The practical significance of the developed algorithms lies in the software package implemented in C++ in the VS environment.

Author Biographies

Andrey V. Silin, Mendeleev University of Chemical Technology of Russia, Novomoskovsk, Russia.

PhD in Technical Sciences, Associate Professor, Novomoskovsk Institute of D. Mendeleev University of Chemical Technology of Russia, Novomoskovsk, Russia.

Olga N. Grinyuk, Mendeleev University of Chemical Technology of Russia, Novomoskovsk, Russia.

Head of the Career Guidance Center, Novomoskovsk Institute of D. Mendeleev University of Chemical Technology of Russia, Novomoskovsk, Russia.

Tatyana A. Lartseva, Moscow Polytechnic University, Moscow, Russia.

Senior Lecturer, Moscow Polytechnic University, Moscow, Russia.

Olga V. Aleksashina, Moscow Polytechnic University, Moscow, Russia.

PhD in Technical Sciences, Associate Professor, Moscow Polytechnic University, Moscow, Russia.

Tatiana S. Sukhova, Moscow Aviation Institute (National Research University), Moscow, Russia.

PhD in Technical Sciences, Associate Professor, Moscow Aviation Institute (National Research University), Moscow, Russia.

Published
2021-12-17
How to Cite
Silin, A. V., Grinyuk, O. N., Lartseva, T. A., Aleksashina, O. V., & Sukhova, T. S. (2021). Software implementation of the main cluster analysis tools. Amazonia Investiga, 10(47), 81-92. https://doi.org/10.34069/AI/2021.47.11.9
Section
Articles