K-means is a centroid-based clustering algorithm. Its main aim is to minimise the sum of distances (in the standard formulation, squared Euclidean distances) between each data point and the centroid of its assigned cluster. The input data is unlabelled, so the algorithm divides it into k clusters and refines them iteratively until the assignments stop changing.
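Written as a formula, the quantity being minimised is usually the within-cluster sum of squares. The symbols here are notation introduced for illustration only: C_j is the set of points in cluster j, mu_j is its centroid, and x_i is a data point.

```latex
J = \sum_{j=1}^{k} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
```

A lower J means the points sit closer to their own centroids, which is what "tighter" clusters amounts to.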
The algorithm primarily performs two tasks:
– Determines the best positions for the k centroids through an iterative process.
– Then assigns each data point to its closest centroid. The data points closest to a particular centroid form a cluster.
(As seen in the image)
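The assignment task can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `assign_to_nearest`, the sample points and the two centre positions are all made up for the example, and Euclidean distance is assumed.

```python
import math


def assign_to_nearest(points, centers):
    """Group points by the index of their closest center (Euclidean distance)."""
    clusters = {j: [] for j in range(len(centers))}
    for p in points:
        # Pick the index of the center with the smallest distance to p.
        j = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        clusters[j].append(p)
    return clusters


# Hypothetical data: five 2-D points and two fixed centers.
points = [(1, 1), (2, 1), (8, 9), (9, 8), (5, 4)]
centers = [(1.5, 1.0), (8.5, 8.5)]
clusters = assign_to_nearest(points, centers)
```

Each key of `clusters` is a centre index and each value is the list of points that ended up closest to that centre.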
K-means Clustering – Example:
A pizza chain wants to open its delivery centres across a city. Let’s think of the possible challenges.
- Where is pizza ordered most frequently?
- How many pizza stores are needed to handle delivery in each area?
- Where should the stores be located within these areas to keep the distance between each store and its delivery points to a minimum?
Answering these questions involves a good deal of statistical analysis and mathematics. Let’s see how the k-means clustering method works.
K-means Clustering Method:
If k is given, the K-means algorithm can be executed in the following steps:
- Partition the objects into k non-empty subsets.
- Identify the cluster centroids of the current partition.
- Compute the distance from each point to every centroid.
- Assign each point to the cluster whose centroid is nearest.
- After reassigning the points, recompute the centroid of each new cluster, and repeat from the distance step until the assignments no longer change.
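The steps above can be sketched in plain Python as follows. This is one possible reading of them rather than a definitive implementation: the function names, the round-robin initial partition, the `max_iter` cap, the fixed `seed`, and the re-seeding of an emptied cluster with a random point are all assumptions made for the example.

```python
import math
import random


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def nearest(point, centroids):
    """Index of the centroid closest to `point` (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def k_means(points, k, max_iter=100, seed=0):
    """Return (labels, centroids); assumes len(points) >= k."""
    rng = random.Random(seed)
    # Partition the points into k non-empty subsets: shuffle, then deal
    # them out round-robin so every subset receives at least one point.
    order = list(range(len(points)))
    rng.shuffle(order)
    labels = [0] * len(points)
    for rank, idx in enumerate(order):
        labels[idx] = rank % k

    centroids = []
    for _ in range(max_iter):
        # Compute the centroid of each cluster in the current partition.
        centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Assumption: a cluster that lost all its points is re-seeded
            # with a randomly chosen data point.
            centroids.append(centroid(members) if members else tuple(rng.choice(points)))
        # Measure distances and assign every point to its nearest centroid.
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break  # no point changed cluster, so the partition is stable
        labels = new_labels
    return labels, centroids


# Hypothetical data: two well-separated groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = k_means(points, k=2)
```

The loop stops as soon as an iteration leaves every label unchanged. The result can depend on the initial partition, so in practice the procedure is often run several times with different seeds and the best run is kept.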
The step by step process: