
Data Clustering Simulation in Python and PyGame

14 Jun 2012, CPOL
Clustering 2D data using Python, with a simulation in PyGame

Introduction

 

In this article I will explain an implementation of the K-Means algorithm, which is used in Machine Learning. In the figures above, the left image shows unclustered data and the right image shows the same data grouped into 10 clusters. For this I have created two files:

1) pyDataCluster.py

2) clusterSimulation.py

File 1 contains a class that implements K-Means, and File 2 is a simulation written with PyGame (a game library for Python). The pyDataCluster class returns the clustered data, so the result can also be viewed in the console.

Background 

Machine Learning is a branch of AI. Instead of designing a complex algorithm by hand, a simple algorithm is applied to a large amount of existing data to obtain good results. This process is the basis of a learning algorithm. Clustering is the process of grouping data into classes. Different parameters can be used to group the data, depending on the situation. In the K-Means algorithm we group the data around the mean value of each cluster. These means are computed by starting from the raw data and reprocessing it repeatedly until the means become stable.

Basic Workflow

The basic workflow is as follows:

1) Get the data

2) Set the number of clusters you want

3) Create an empty 2D array to store the clustered data

4) For each cluster, pick a random point to serve as its initial mean

5) For each point, calculate its distance to every mean

6) Put the point in the cluster whose mean is nearest

7) Recalculate the mean of every cluster and update the means

8) Go back to step 5 with the updated means, and repeat until the means from two consecutive repetitions are equal.
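
The steps above can be condensed into one short function. This is a minimal sketch rather than the class discussed in this article: the names `k_means`, `manhattan`, `max_passes`, and `seed` are assumptions made for illustration, and the pass limit is an added safeguard that the steps do not mention.

```python
import random


def manhattan(p, q):
    """Distance between two 2D points as the sum of absolute differences."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def k_means(data, k, max_passes=100, seed=None):
    """Group 2D points into k clusters; returns (clusters, means)."""
    rng = random.Random(seed)
    # Step 4: pick k data points as the initial means
    means = [list(p) for p in rng.sample(data, k)]
    clusters = [[] for _ in range(k)]
    for _ in range(max_passes):
        # Step 3: start every pass with empty clusters
        clusters = [[] for _ in range(k)]
        # Steps 5 and 6: put each point in the cluster with the nearest mean
        for p in data:
            nearest = min(range(k), key=lambda g: manhattan(p, means[g]))
            clusters[nearest].append(p)
        # Step 7: recompute each mean; an empty cluster keeps its old mean
        new_means = []
        for g, members in enumerate(clusters):
            if members:
                n = float(len(members))
                new_means.append([sum(p[0] for p in members) / n,
                                  sum(p[1] for p in members) / n])
            else:
                new_means.append(means[g])
        # Step 8: stop when two consecutive passes give the same means
        if new_means == means:
            break
        means = new_means
    return clusters, means


if __name__ == "__main__":
    points = [[1, 1], [2, 1], [1, 2], [50, 50], [51, 50], [50, 51]]
    clusters, means = k_means(points, 2, seed=0)
    print(sorted(len(c) for c in clusters))
```

The sketch keeps the means as floating-point values, whereas the class below rounds them down to whole numbers.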

Using the code 

Let's look at the code, starting with the clustering class.

To use this class in your own code, do this:

import random
from pyDataCluster import *

data=[]
groups=10
for i in range(5000):
    data.append([random.randint(1,500),random.randint(1,500)])
    
cluster = pyDataCluster(groups,data)   

This fills the data array with 5000 random points and creates an object named "cluster" that will sort them into 10 groups.

finalCluster = cluster.finalCluster() # repeats the clustering until the means are stable and returns the final clusters
clus = cluster.createCluster() # runs a single pass and returns clusters that may not be final yet

Initialization:

The class constructor initializes the instance variables.

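def __init__(self,numberOfCluster,Data,initialPoints=None):
        '''
        Constructor
        '''
        self.Kgroups=numberOfCluster    # number of clusters (K)
        self.Data=Data                  # list of [x, y] points
        self.Cluster=[]                 # clustered data, one list per group
        # Use None as the default so instances never share one list
        self.Kmeans=initialPoints if initialPoints is not None else []
        self.initialMeanPositions()
        self.terminat=True              # stays True until the means stop changing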

Either pass the initial points yourself or leave them out, in which case initialMeanPositions() will initialize them for you.
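
The body of `initialMeanPositions()` is not listed in this article. As a rough idea of what such a helper could do, the sketch below picks the starting means at random from the data when none are supplied. The function name `pick_initial_means`, its parameters, and its behavior are assumptions, not the code from pyDataCluster.py.

```python
import random


def pick_initial_means(data, k, initial_points=None, seed=None):
    """Return k starting means: the caller's points if given, else random data points."""
    if initial_points:
        if len(initial_points) != k:
            raise ValueError("expected %d initial points, got %d"
                             % (k, len(initial_points)))
        # Copy so later updates do not modify the caller's list
        return [list(p) for p in initial_points]
    rng = random.Random(seed)
    # sample() never picks the same position in data twice
    return [list(p) for p in rng.sample(data, k)]


if __name__ == "__main__":
    data = [[i, 2 * i] for i in range(20)]
    print(pick_initial_means(data, 3, seed=1))
    print(pick_initial_means(data, 2, initial_points=[[0, 0], [10, 10]]))
```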

Create Cluster:

def createCluster(self):
        self.clusterSpace()
        for i in self.Data:
            point=[i[0],i[1]]
            group=self.getClusterGroup(point)
            self.Cluster[group].append(i)
        self.setMeans()
        return(self.Cluster)

This function is the workhorse of the class. It assigns every data point to the cluster with the nearest current mean and then updates the means. Calling it repeatedly on the same data generally results in better clusters.

Final Cluster:

To get the final clusters, use this function:

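def finalCluster(self):
        # Keep clustering until setMeans() clears the flag
        while self.terminat:
            self.createCluster()
        # Return the stored clusters so the call also works once the flag is already cleared
        return(self.Cluster)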

This function simply loops until the "setMeans" function gives the termination signal.
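
Because the loop ends only when two consecutive sets of means are exactly equal, it can help to cap the number of passes. The sketch below shows one way to do that. `run_until_stable`, `step`, and `max_passes` are hypothetical names, and `step` stands in for one clustering pass that returns the new means.

```python
def run_until_stable(step, means, max_passes=100):
    """Call step(means) until the means stop changing or max_passes is reached.

    Returns (means, passes_used, converged).
    """
    for passes in range(1, max_passes + 1):
        new_means = step(means)
        if new_means == means:
            return means, passes, True
        means = new_means
    return means, max_passes, False


if __name__ == "__main__":
    # Toy step: move a single 1D "mean" halfway toward 8, rounding down
    halve_gap = lambda m: [m[0] + (8 - m[0]) // 2]
    print(run_until_stable(halve_gap, [0]))
```

The third value in the result tells the caller whether the means really settled or the pass limit was hit first.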

setMeans:

This function updates the means as described in step 7 of the basic workflow and signals termination when they stop changing:

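def setMeans(self):
        means=[]
        for index,group in enumerate(self.Cluster):
            if not group:
                # An empty cluster has no mean, so keep its previous position
                means.append(self.Kmeans[index])
                continue
            x=sum(p[0] for p in group)
            y=sum(p[1] for p in group)
            means.append([math.floor(x/len(group)),math.floor(y/len(group))])
        # Stop once the means no longer change between two passes
        if(self.Kmeans==means):
            self.terminat=False
        self.Kmeans=means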
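
To make the update concrete, the following sketch computes the rounded-down mean of a single cluster. `floored_mean` is an illustrative helper, not part of the class.

```python
import math


def floored_mean(points):
    """Return [floor(mean x), floor(mean y)] for a non-empty list of 2D points."""
    if not points:
        raise ValueError("cannot take the mean of an empty cluster")
    n = float(len(points))
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    return [int(math.floor(mean_x)), int(math.floor(mean_y))]


if __name__ == "__main__":
    # x values sum to 12 and y values sum to 17, so the exact means are 4.0 and about 5.67
    print(floored_mean([[2, 3], [4, 6], [6, 8]]))  # prints [4, 5]
```

Rounding down keeps the means on whole pixel coordinates, which makes it more likely that two consecutive passes produce exactly equal means.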

Assigning the Cluster Group:

This function returns the index of the group to which a given point belongs. The distance to each mean is measured as the sum of the absolute differences in x and y:

def getClusterGroup(self,point):
        dist=[]
        for i in self.Kmeans:
            dist.append(math.fabs(point[0]-i[0])+math.fabs(point[1]-i[1]))
        minIndex = dist.index(min(dist))
        return minIndex
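
Note that `getClusterGroup` uses the sum of absolute differences (Manhattan distance), whereas K-Means is usually described with Euclidean distance. The sketch below, which uses hypothetical helper names, shows that the two measures can assign the same point to different groups.

```python
import math


def manhattan(p, q):
    """Sum of absolute coordinate differences."""
    return abs(p[0] - q[0]) + abs(p[1] - q[1])


def euclidean(p, q):
    """Straight-line distance."""
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)


def nearest_group(point, means, distance):
    """Index of the mean closest to point under the given distance function."""
    dist = [distance(point, m) for m in means]
    return dist.index(min(dist))


if __name__ == "__main__":
    means = [[4, 4], [0, 7]]
    point = [0, 0]
    print(nearest_group(point, means, manhattan))  # prints 1 (distances 8 and 7)
    print(nearest_group(point, means, euclidean))  # prints 0 (about 5.66 and 7)
```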

Empty Cluster:

Every run needs an empty cluster list. This function discards any old values and creates a fresh one:

def clusterSpace(self):
        self.Cluster=[]
        for i in range(self.Kgroups):
            self.Cluster.append([])
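
A list comprehension does the same job in one line. The sketch below also shows why the shortcut `[[]] * k` should be avoided here: it repeats one list object k times, so a point appended to one group shows up in all of them. `empty_clusters` is a hypothetical name.

```python
def empty_clusters(k):
    """Return a list of k separate empty lists."""
    return [[] for _ in range(k)]


if __name__ == "__main__":
    good = empty_clusters(3)
    good[0].append([1, 2])
    print(good)    # [[[1, 2]], [], []]

    shared = [[]] * 3   # three references to the same list
    shared[0].append([1, 2])
    print(shared)  # [[[1, 2]], [[1, 2]], [[1, 2]]]
```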

That completes the clustering. Now for the simulation part.

clusterSimulation:

This requires the PyGame library, which can be downloaded from its site.

import pygame, random, sys, time
from pygame.locals import *
from pyDataCluster import *

data=[]
groups=10
for i in range(5000):
    data.append([random.randint(1,500),random.randint(1,500)])
    
cluster = pyDataCluster(groups,data)
Color=[]
for i in range(groups):
    
    while True:
        cl=((random.randint(0,255)),(random.randint(0,255)),(random.randint(0,255)))
        if cl not in Color:   
            Color.append(cl)
            break
pygame.init()
WINDOWWIDTH = 500
WINDOWHEIGHT = 500
BASICFONT = pygame.font.Font('freesansbold.ttf',50)
windowSurface = pygame.display.set_mode((WINDOWWIDTH, WINDOWHEIGHT), 0, 32)
pygame.display.set_caption('Cluster Simulation')

BLACK = (0, 0, 0)
RED = (255, 0, 0)
GREEN = (0, 255, 0)
BLUE = (0, 0, 255)
WHITE=(255,255,255)


while cluster.terminat:
    points=[]
    # Run one clustering pass and build a colored rectangle for every point
    clus=cluster.createCluster()
    a=0
    for i in clus:
        for j in i:
            points.append({'rect':pygame.Rect(j[0],j[1],4,4),'color':Color[a]})
        a=a+1
    # Draw every point in the color of its current cluster
    for p in points:
        pygame.draw.rect(windowSurface, p['color'], p['rect'])
    pygame.display.update()
    #time.sleep(0.05)
  

while True:
    # check for the QUIT event
    for event in pygame.event.get():
        if event.type == QUIT:
            pygame.quit()
            sys.exit()  

Try changing the number of data points and the number of groups to see the effect.
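
For such experiments it helps to generate the data from a seeded random generator, so that two runs with different settings start from the same points. A minimal sketch, where `make_points` and its parameters are assumptions for illustration:

```python
import random


def make_points(count, size=500, seed=None):
    """Return count random [x, y] points with coordinates in 1..size."""
    rng = random.Random(seed)
    return [[rng.randint(1, size), rng.randint(1, size)] for _ in range(count)]


if __name__ == "__main__":
    data = make_points(5000, seed=42)
    print(len(data), data[:3])
```

The resulting list can be passed to the class in place of the data built in the loop above, for example `pyDataCluster(groups, make_points(2000, seed=1))`.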

    

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


About the Author

Ghazanfar_Ali
Pakistan
Started Software and Web Development in 2010 at CEME NUST Pakistan. Interested in Artificial Intelligence, Web Technologies and Software development using popular platforms and languages.

Article Copyright 2012 by Ghazanfar_Ali