k-means · 3 years
Running a k-means Cluster Analysis
This post walks through a simple analysis to show how k-means works.
The dataset is a set of isotropic Gaussian blobs for clustering, generated with sklearn as follows:
Tumblr media Tumblr media
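For readers who cannot see the images, here is a minimal sketch of how such a dataset can be generated with sklearn's make_blobs. The sample, feature, and cluster counts follow the description in this post; the random_state value is an assumption added only to make the run reproducible.

```python
from sklearn.datasets import make_blobs

# 500 samples, 5 features, 5 centers, as described in the post.
# random_state=0 is an assumed seed, not taken from the original script.
X, y = make_blobs(n_samples=500, n_features=5, centers=5, random_state=0)

print(X.shape)  # (500, 5)
print(y.shape)  # (500,)
```

Here X holds the points and y holds the blob each point was drawn from, which we ignore while pretending the number of clusters is unknown.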
This dataset has 500 samples, 5 features, and 5 clusters, as indicated in the script. Although we know the number of clusters, we will pretend that we don't and go through the whole process of discovering it.
Because the dataset (isotropic Gaussian blobs generated by sklearn) is not complex and does not have many points, no test set is used.
The code used is the following:
Tumblr media Tumblr media
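A rough sketch of the elbow method looks like the code below. It is an illustration, not the script from the images: it uses the inertia_ attribute of sklearn's KMeans (the within-cluster sum of squared distances) as the quantity to compare, and the range of k values, n_init, and random_state are all assumptions.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Same kind of dataset as described in the post; the seed is assumed.
X, _ = make_blobs(n_samples=500, n_features=5, centers=5, random_state=0)

k_values = list(range(1, 10))  # candidate numbers of clusters (assumed range)
inertias = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # inertia_ is the sum of squared distances of points to their closest center
    inertias.append(model.inertia_)

for k, inertia in zip(k_values, inertias):
    print(k, round(inertia, 1))
```

Plotting inertias against k_values gives the elbow curve: the value of k after which the curve flattens out is the one to pick.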
As seen in the picture, the elbow method tells us that the number of clusters should be 4 (as we already knew). So we are going to work with this number.
Tumblr media
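As a minimal sketch of this step, fitting k-means with 4 clusters can be done as follows. The dataset is regenerated here so the snippet runs on its own, and n_init and random_state are assumed values.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Dataset as described in the post; the seed is assumed.
X, _ = make_blobs(n_samples=500, n_features=5, centers=5, random_state=0)

# Fit k-means with the 4 clusters chosen from the elbow plot.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)  # one cluster index (0 to 3) per sample

print(labels.shape)                   # (500,)
print(kmeans.cluster_centers_.shape)  # (4, 5)
```

The labels array is what gets passed to the plotting helper to colour each point by its cluster.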
Here plot_labelled_scatter comes from another .py file that helps us make the plot with the following function:
Tumblr media
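A helper of this kind could look roughly like the sketch below. This is a hypothetical implementation, not the function from the image: the (X, y, class_labels) signature, the choice to plot only the first two features, and the demo data are all assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def plot_labelled_scatter(X, y, class_labels):
    """Scatter the first two columns of X, one colour per integer label in y.

    Hypothetical stand-in for the helper used in the post.
    """
    fig, ax = plt.subplots()
    for label_index, name in enumerate(class_labels):
        mask = (y == label_index)  # points assigned to this cluster
        ax.scatter(X[mask, 0], X[mask, 1], s=20, label=name)
    ax.set_xlabel("feature 0")
    ax.set_ylabel("feature 1")
    ax.legend()
    return fig


# Small made-up demo: 40 random points with labels cycling through 0..3.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 5))
y_demo = np.arange(40) % 4
fig = plot_labelled_scatter(
    X_demo, y_demo, ["Cluster 1", "Cluster 2", "Cluster 3", "Cluster 4"]
)
```

Calling fig.savefig with a file name writes the figure to disk. In the analysis itself, X would be the blob data and y the labels returned by k-means.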
The plot is:
Tumblr media
As shown in the picture, there are 4 well-defined clusters.
This example was kept simple in order to give a better understanding of k-means.
Thanks!