k-means
Initially, randomly pick k centroids, then repeat two steps:
- assignment step: assign each observation to its nearest centroid
- update step: recompute each centroid as the mean of the observations assigned to it
The algorithm has converged when the assignments no longer change.
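
The steps above can be sketched roughly as follows. This is a minimal illustration, not a tuned implementation. It assumes squared Euclidean distance, initial centroids drawn from the observations themselves, and an empty cluster keeping its old centroid. The names `kmeans`, `squared_distance`, `max_iter`, and `seed` are made up for this sketch.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster `points` (a list of tuples) into k groups.

    Returns (centroids, assignments), where assignments[i] is the index
    of the centroid that points[i] belongs to.
    """
    rng = random.Random(seed)
    # Initialization (assumption): pick k distinct observations as centroids.
    centroids = [tuple(p) for p in rng.sample(points, k)]
    assignments = None

    for _ in range(max_iter):
        # Assignment step: each observation goes to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        # Converged: the assignments no longer change.
        if new_assignments == assignments:
            break
        assignments = new_assignments

        # Update step: each centroid becomes the mean of its cluster.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))

    return centroids, assignments


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    centroids, labels = kmeans(data, k=2)
    print(centroids, labels)
```

Which cluster gets index 0 depends on the random initialization, so only the grouping is meaningful, not the label values.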
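The centroids found by k-means can then be used with k-NN to classify new observations (for example, 1-NN over the centroids assigns a new point to the cluster of its nearest centroid).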
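
One way to read the classification idea is as 1-nearest-neighbor over the centroids. The sketch below assumes squared Euclidean distance; the function name `nearest_centroid` and the example centroid values are made up for illustration and are not the output of any particular run.

```python
def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to `point` (1-NN over centroids)."""
    def squared_distance(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    return min(range(len(centroids)),
               key=lambda j: squared_distance(point, centroids[j]))


if __name__ == "__main__":
    # Hypothetical centroids, e.g. as produced by a previous k-means run.
    centroids = [(0.0, 0.0), (10.0, 10.0)]
    print(nearest_centroid((1.0, 2.0), centroids))   # closest to centroid 0
    print(nearest_centroid((9.0, 8.5), centroids))   # closest to centroid 1
```

The returned index is only a cluster id. Using it as a class label assumes each cluster has already been mapped to a class, for instance by majority vote over labeled observations.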