The k-means problem consists of finding k groups (clusters) of points that minimize the intra-group variance, that is, the sum of the squared distances from each point to the center closest to it.
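As a concrete illustration of this objective, the sketch below computes the sum of squared distances from each point to its nearest center. It assumes Euclidean distance and points stored as plain Python tuples of floats; `squared_distance` and `kmeans_cost` are hypothetical helper names chosen for this example.

```python
from typing import Sequence

Point = Sequence[float]


def squared_distance(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans_cost(points: Sequence[Point], centers: Sequence[Point]) -> float:
    """k-means objective: each point contributes its squared distance to the nearest center."""
    return sum(min(squared_distance(p, c) for c in centers) for p in points)


points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
centers = [(0.5, 0.0), (10.5, 0.0)]
print(kmeans_cost(points, centers))  # 1.0
```

Every point lies 0.5 away from its nearest center, so each contributes 0.25 and the total is 1.0.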
The k-means++ algorithm chooses the k initial centers for this problem as follows:
1. Choose the first center uniformly at random from among the data points.
2. For each point x, compute D(x), the distance between x and the nearest center that has already been chosen.
3. Choose one new data point at random as a new center, using a weighted probability distribution in which a point x is chosen with probability proportional to D(x)², that is, P(x) = D(x)² / Σ D(x′)², where the sum runs over all data points x′.
4. Repeat steps 2 and 3 until k centers have been chosen (k − 1 repetitions after the first center).
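The seeding steps above (k-means++) can be sketched in Python as follows. This is a minimal illustration, assuming Euclidean distance and points stored as tuples of floats; `kmeans_pp_init` and its parameters are hypothetical names, not a library API. Instead of recomputing D(x) against every center in step 2, it keeps a running list of D(x)² values and updates it with each newly added center. The fallback to a uniform choice when every D(x) is zero (fewer than k distinct points) is an assumption of this sketch.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def kmeans_pp_init(points: Sequence[Point], k: int, rng: random.Random) -> List[Point]:
    """Pick k initial centers with D(x)^2 weighting (k-means++ seeding)."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Step 1: the first center is chosen uniformly at random from the data.
    centers = [rng.choice(points)]
    # Step 2: D(x)^2 for every point, relative to the centers chosen so far.
    d2 = [squared_distance(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Assumed fallback: every point coincides with a center, so
            # weighted sampling is undefined; pick uniformly instead.
            new_center = rng.choice(points)
        else:
            # Step 3: sample one point with probability proportional to D(x)^2.
            new_center = rng.choices(points, weights=d2, k=1)[0]
        centers.append(new_center)
        # Step 2 again: a point's nearest center can only get closer.
        d2 = [min(old, squared_distance(p, new_center)) for old, p in zip(d2, points)]
    return centers


# Usage: a seeded generator makes the chosen centers reproducible.
data = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(kmeans_pp_init(data, k=2, rng=random.Random(42)))
```

Because a point that is already a center has D(x)² = 0, it receives zero weight and cannot be picked again by the weighted draw, which is what pushes later centers toward unexplored regions of the data.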
Once the initial centers have been chosen, proceed with standard k-means clustering, starting from these centers.
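Standard k-means is usually run as Lloyd's algorithm, which alternates an assignment step and an update step until the centers stop moving. The sketch below refines a given list of initial centers, for example those produced by the seeding procedure above, under the same assumptions (Euclidean distance, points as tuples of floats). `lloyd` and `max_iter` are hypothetical names, and keeping the old center when a cluster becomes empty is just one possible convention.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def lloyd(points: Sequence[Point], centers: Sequence[Point], max_iter: int = 100) -> List[Point]:
    """Refine initial centers with Lloyd's algorithm (standard k-means)."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest center.
        clusters: List[List[Point]] = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: squared_distance(p, centers[i]))
            clusters[j].append(p)
        # Update step: move each center to the mean of its assigned points.
        new_centers = []
        for c, members in zip(centers, clusters):
            if members:
                new_centers.append(
                    tuple(sum(m[d] for m in members) / len(members) for d in range(len(c)))
                )
            else:
                # Empty cluster: keep the previous center (an assumed convention).
                new_centers.append(c)
        if new_centers == centers:
            break  # Converged: the centers no longer change.
        centers = new_centers
    return centers


data = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(lloyd(data, [(0.0, 0.0), (10.0, 0.0)]))  # [(0.0, 0.5), (10.0, 0.5)]
```

Here the first pass assigns the two left points to the first center and the two right points to the second, moves each center to its cluster mean, and the second pass leaves the centers unchanged, so the loop stops.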