Software implementation of the main cluster analysis tools

Andrey V. Silin; Olga N. Grinyuk; Tatyana A. Lartseva; Olga V. Aleksashina; Tatiana S. Sukhova

doi:10.34069/AI/2021.47.11.9

Vol. 10 No. 47 (2021)

Articles


Andrey V. Silin
Mendeleev University of Chemical Technology of Russia, Novomoskovsk, Russia.
Author biography

PhD in Technical Sciences, Associate Professor, Novomoskovsk Institute of D. Mendeleev University of Chemical Technology of Russia, Novomoskovsk, Russia.

Olga N. Grinyuk
Mendeleev University of Chemical Technology of Russia, Novomoskovsk, Russia.
Author biography

Head of the Career Guidance Center, Novomoskovsk Institute of D. Mendeleev University of Chemical Technology of Russia, Novomoskovsk, Russia.

Tatyana A. Lartseva
Moscow Polytechnic University, Moscow, Russia.
Author biography

Senior Lecturer, Moscow Polytechnic University, Moscow, Russia.

Olga V. Aleksashina
Moscow Polytechnic University, Moscow, Russia.
Author biography

PhD in Technical Sciences, Associate Professor, Moscow Polytechnic University, Moscow, Russia.

Tatiana S. Sukhova
Moscow Aviation Institute (National Research University), Moscow, Russia.
Author biography

PhD in Technical Sciences, Associate Professor, Moscow Aviation Institute (National Research University), Moscow, Russia.


Published 2021-12-17

Keywords

cluster analysis, cluster, distance function between vectors, dendrogram, matrix, divergence measure, method of grouping a set of objects.

How to cite

Silin, A. V., Grinyuk, O. N., Lartseva, T. A., Aleksashina, O. V., & Sukhova, T. S. (2021). Software implementation of the main cluster analysis tools. Amazonia Investiga, 10(47), 81–92. https://doi.org/10.34069/AI/2021.47.11.9

Abstract

This article discusses an approach to building a software package that implements cluster analysis methods. It analyzes several cluster analysis tools for processing an initial data set and their software implementation, as well as the difficulties of applying cluster analysis to data. Data are treated as the factual material that supplies information about the problem under study and serves as the basis for discussion, analysis and decision-making. Cluster analysis is a procedure that combines objects or variables into groups according to a given rule. The work groups multivariate data using proximity measures such as the sample correlation coefficient and its absolute value, the cosine of the angle between vectors, and the Euclidean distance. The authors propose methods for grouping by centers, by the nearest neighbor and by selected standards.
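As an illustration of the proximity measures named above, the following C++ sketch computes the Euclidean distance, the cosine of the angle between two vectors, and the sample correlation coefficient together with its absolute value. It is not the authors' code: the function names and the `Vec` alias are assumptions made for this example, and the inputs are assumed to be equal-length, non-zero and non-constant vectors.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;  // hypothetical alias for one observation

// Euclidean distance between two vectors of equal length.
double euclidean_distance(const Vec& x, const Vec& y) {
    assert(x.size() == y.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double d = x[i] - y[i];
        sum += d * d;
    }
    return std::sqrt(sum);
}

// Cosine of the angle between two non-zero vectors of equal length.
double cosine_similarity(const Vec& x, const Vec& y) {
    assert(x.size() == y.size());
    double dot = 0.0, norm_x = 0.0, norm_y = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        dot += x[i] * y[i];
        norm_x += x[i] * x[i];
        norm_y += y[i] * y[i];
    }
    return dot / (std::sqrt(norm_x) * std::sqrt(norm_y));
}

// Sample correlation coefficient, computed as the cosine of the
// mean-centred vectors (assumes neither vector is constant).
double sample_correlation(const Vec& x, const Vec& y) {
    assert(x.size() == y.size() && !x.empty());
    const double n = static_cast<double>(x.size());
    double mean_x = 0.0, mean_y = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        mean_x += x[i];
        mean_y += y[i];
    }
    mean_x /= n;
    mean_y /= n;
    Vec cx(x.size()), cy(y.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        cx[i] = x[i] - mean_x;
        cy[i] = y[i] - mean_y;
    }
    return cosine_similarity(cx, cy);
}

// Absolute value of the correlation: treats strong negative and strong
// positive linear relationships as equally "close".
double abs_correlation(const Vec& x, const Vec& y) {
    return std::fabs(sample_correlation(x, y));
}
```

The Euclidean distance is a dissimilarity (smaller means closer), whereas the cosine and correlation measures are similarities (larger means closer), so a grouping routine built on them has to compare values in the appropriate direction.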

The results can be used by analysts when designing a data analysis workflow and can help improve the efficiency of clustering algorithms. The practical significance of the developed algorithms lies in the software package implemented in C++ in the VS environment.
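One simple reading of grouping by centers is to assign every object to the closest of a given set of centers. The C++ sketch below does this with the squared Euclidean distance. The abstract does not specify how the authors choose or update centers, so the function names, the `Point` alias and the fixed list of centers are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Point = std::vector<double>;  // hypothetical alias for one object

// Squared Euclidean distance; the square root is unnecessary when
// distances are only compared with each other.
double squared_distance(const Point& a, const Point& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}

// For every object, return the index of its closest center.
// Ties are resolved in favour of the center with the lower index.
std::vector<std::size_t> assign_to_centers(const std::vector<Point>& objects,
                                           const std::vector<Point>& centers) {
    assert(!centers.empty());
    std::vector<std::size_t> labels;
    labels.reserve(objects.size());
    for (const Point& object : objects) {
        std::size_t best = 0;
        double best_dist = std::numeric_limits<double>::infinity();
        for (std::size_t c = 0; c < centers.size(); ++c) {
            const double dist = squared_distance(object, centers[c]);
            if (dist < best_dist) {
                best_dist = dist;
                best = c;
            }
        }
        labels.push_back(best);
    }
    return labels;
}
```

Given the objects (0, 0), (1, 0), (9, 9), (10, 10) and the centers (0.5, 0) and (9.5, 9.5), the first two objects receive label 0 and the last two receive label 1.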


