Given a dataset with ground-truth cluster labels, it can be hard to determine how well your clustering method is performing on that dataset. Indeed, how do you define how far you are from the ground truth? The answer can be very problem dependent. Here we introduce a simple measure based on the Jaccard index, combining L0-like and L1-like norms, to give the user a good idea of how well their method is performing. The measure is sensitive to the assignments but also to the number of clusters used, and it is thus not monotonic w.r.t. the number of clusters inferred.
lkampoli / clustering_distance
This project is forked from alexandreday/clustering_distance
Distance between clustering assignments. Non-trivial measure weighting L0 and L1 Jaccard norms.
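
To make the idea of a Jaccard-based distance between clustering assignments concrete, here is a minimal sketch in plain Python. It is not the implementation shipped in this repository: the function names, the `threshold` and `weight` parameters, and the reading of "L0-like" (fraction of clusters with no good match) and "L1-like" (average shortfall from perfect overlap) are all assumptions made for illustration.

```python
from collections import defaultdict


def clusters_as_sets(labels):
    """Group sample indices by cluster label (label values themselves are irrelevant)."""
    groups = defaultdict(set)
    for index, label in enumerate(labels):
        groups[label].add(index)
    return list(groups.values())


def jaccard(a, b):
    """Jaccard index |a & b| / |a | b| of two non-empty index sets."""
    return len(a & b) / len(a | b)


def one_sided_distance(source_sets, target_sets, threshold, weight):
    # Best Jaccard overlap of each source cluster with any target cluster.
    scores = [max(jaccard(s, t) for t in target_sets) for s in source_sets]
    # Assumed "L1-like" term: mean shortfall from a perfect match.
    l1_like = sum(1.0 - s for s in scores) / len(scores)
    # Assumed "L0-like" term: fraction of clusters without a good enough match.
    l0_like = sum(1 for s in scores if s < threshold) / len(scores)
    return weight * l0_like + (1.0 - weight) * l1_like


def jaccard_distance_sketch(true_labels, pred_labels, threshold=0.5, weight=0.5):
    """Hypothetical symmetric distance in [0, 1]; 0 means identical partitions."""
    if len(true_labels) != len(pred_labels):
        raise ValueError("label sequences must have the same length")
    true_sets = clusters_as_sets(true_labels)
    pred_sets = clusters_as_sets(pred_labels)
    forward = one_sided_distance(true_sets, pred_sets, threshold, weight)
    backward = one_sided_distance(pred_sets, true_sets, threshold, weight)
    return 0.5 * (forward + backward)


if __name__ == "__main__":
    truth = [0, 0, 0, 1, 1, 1]
    for name, pred in [
        ("relabelled", [1, 1, 1, 0, 0, 0]),
        ("merged", [0, 0, 0, 0, 0, 0]),
        ("over-split", [0, 1, 2, 3, 4, 5]),
    ]:
        print(name, jaccard_distance_sketch(truth, pred))
```

In this sketch, both merging everything into one cluster and splitting every point into its own cluster are penalized relative to the correct partition, which illustrates why a measure of this kind does not vary monotonically with the number of inferred clusters.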