This repository presents a comparative analysis of the most popular clustering algorithms implemented in Python, all run on the same generated dataset. Its goal is to give an overview of both the heuristics behind these algorithms and their results, and to serve as a cheat sheet.
The main notebook is accompanied by this presentation, which contains links and screenshots from various courses, videos, and posts. Most of the content is structured around Coursera's Data mining and clustering course.
List of the included algorithms:
- K-means
- K-medoids
- AgglomerativeClustering
- CURE
- BIRCH
- DBSCAN
- HDBSCAN
- OPTICS
- Canopy clustering
- GMM
- Bayesian Gaussian Mixture
- Mean Shift
- Linkage
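As a rough illustration of the comparison, the sketch below runs four of the listed algorithms on one generated dataset using scikit-learn. It is not taken from the notebook: the dataset (`make_blobs` with 3 centers), the choice of algorithms, and every parameter value (`eps`, `min_samples`, `n_clusters`, the random seeds) are assumptions made for this example.

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Assumed synthetic dataset: 300 points around 3 centers.
X, y_true = make_blobs(n_samples=300, centers=3, cluster_std=0.6, random_state=42)

# One estimator per algorithm; all parameter values are illustrative.
models = {
    "K-means": KMeans(n_clusters=3, n_init=10, random_state=42),
    "AgglomerativeClustering": AgglomerativeClustering(n_clusters=3),
    "DBSCAN": DBSCAN(eps=0.5, min_samples=5),
    "GMM": GaussianMixture(n_components=3, random_state=42),
}

# fit_predict returns one cluster label per sample for every estimator above.
labels = {name: model.fit_predict(X) for name, model in models.items()}

for name, found in labels.items():
    # DBSCAN marks noise points with the label -1, so exclude it from the count.
    n_clusters = len(set(found) - {-1})
    print(f"{name}: {n_clusters} clusters")
```

Keeping the estimators in a dictionary makes it easy to add or swap algorithms while the dataset stays fixed, which is what makes the results comparable.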
Methods for selecting the optimal number of clusters:
- elbow method
- silhouette method
- AIC/BIC
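The three selection methods above can be sketched with scikit-learn as follows. This is a minimal example under assumptions, not the notebook's code: the dataset, the candidate range of 2 to 8 clusters, and the pairing of K-means with the elbow and silhouette methods and of a Gaussian mixture with AIC/BIC are choices made here for illustration.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
from sklearn.mixture import GaussianMixture

# Assumed synthetic dataset: 300 points around 4 centers.
X, _ = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=0)

candidates = range(2, 9)  # silhouette needs at least 2 clusters
inertia, silhouette, aic, bic = {}, {}, {}, {}

for k in candidates:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # Elbow method: within-cluster sum of squares, to be plotted against k.
    inertia[k] = km.inertia_
    # Silhouette method: mean silhouette coefficient, higher is better.
    silhouette[k] = silhouette_score(X, km.labels_)

    gmm = GaussianMixture(n_components=k, random_state=0).fit(X)
    # AIC/BIC: information criteria of the fitted mixture, lower is better.
    aic[k] = gmm.aic(X)
    bic[k] = gmm.bic(X)

best_silhouette_k = max(silhouette, key=silhouette.get)
best_bic_k = min(bic, key=bic.get)
print("silhouette suggests k =", best_silhouette_k)
print("BIC suggests k =", best_bic_k)
```

The elbow method has no single numeric optimum: the usual approach is to plot `inertia` against `k` and look for the point where the curve stops dropping sharply.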