Algorithms
K-means clustering
K-Nearest Neighbors (KNN)
Hierarchical Clustering
DBSCAN
HDBSCAN
<aside>
💡
Don’t confuse clustering with dimensionality reduction!
</aside>
- K-means clustering: This is a centroid-based algorithm, where the goal is to minimize the sum of squared distances between points and their respective cluster centroid.
- Hierarchical Clustering: This method creates a tree of clusters (a dendrogram). It is subdivided into Agglomerative (bottom-up approach) and Divisive (top-down approach).
- DBSCAN (Density-Based Spatial Clustering of Applications with Noise): This algorithm defines clusters as areas of high density separated by areas of low density.
- Mean Shift Clustering: It is a centroid-based algorithm, which updates candidates for centroids to be the mean of points within a given region.
- Gaussian Mixture Models (GMM): This method uses a probabilistic model to represent subpopulations within an overall population. Instead of a hard assignment, each data point gets a probability of belonging to each cluster (soft assignment).
- Spectral Clustering: It uses the eigenvectors of a matrix derived from the similarity matrix (typically the graph Laplacian) to embed the data in a lower-dimensional space before applying a clustering algorithm, typically K-means.
- OPTICS (Ordering Points To Identify the Clustering Structure): Similar to DBSCAN, but creates a reachability plot to determine clustering structure.
- Affinity Propagation: It sends messages between pairs of samples until a set of exemplars and corresponding clusters gradually emerges.
- BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies): Designed for large datasets, it incrementally and dynamically clusters incoming multi-dimensional metric data points.
- CURE (Clustering Using Representatives): It represents each cluster by a fixed number of well-scattered representative points, shrunk toward the cluster centroid, rather than by the centroid alone.
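
To make the K-means bullet concrete, here is a minimal sketch of the usual iterative procedure (Lloyd's algorithm) in plain Python. The function names `kmeans`, `squared_distance` and `mean_point`, the toy points and the choice to pass the initial centroids explicitly are all assumptions made for illustration; they do not come from any library.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans(points, initial_centroids, max_iter=100):
    """Illustrative Lloyd's algorithm (hypothetical helper, not a library API).

    Returns (centroids, labels, inertia), where inertia is the sum of
    squared distances between each point and its assigned centroid.
    """
    centroids = list(initial_centroids)
    k = len(centroids)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = mean_point(members)
    inertia = sum(
        squared_distance(p, centroids[lab]) for p, lab in zip(points, labels)
    )
    return centroids, labels, inertia


# Two well-separated toy blobs (made-up data).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
centroids, labels, inertia = kmeans(points, [(0, 0), (11, 11)])
print(centroids, labels, inertia)
```

The result depends on the initial centroids: Lloyd's algorithm only guarantees a local minimum of the objective, which is why practical implementations usually try several initializations and keep the best one.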