4373 Java Mini Project


JAVA MINI PROJECT

K-MEANS CLUSTERING

B. Sunethra
9919004373

Introduction:

Customers of any store, whether offline or online (that is, e-commerce), exhibit different
buying patterns. Some buy in bulk, while others buy smaller quantities but spread their
transactions throughout the year. Some buy big items during festival seasons such as
Christmas. Identifying these buying patterns and grouping, or segmenting, customers
accordingly is of great importance to business owners.

The aim of this project is to develop a Java-based system that separates customers into
classes with respect to the items they buy, the amount they spend, and so on, by analysing
their buying behaviour on an appropriate dataset.

Features:

1. The main aim of clustering customers is to reduce the stock of unwanted or slow-selling
items and to increase the stock of the items customers like most. To do this, the owner or
manufacturer must be aware of customer behaviour and how it shifts. Clustering makes this
easier and more profitable by identifying the best customer group and the items that group
likes most.

2. Works for any numeric dataset.

3. The algorithm is simple and easy to implement.

4. With this, the manufacturer can expect more profit and less wasted stock.

Some Applications:

1. Market research.

2. Pattern recognition.

3. Data analysis.

4. Image processing.

System Analysis:

Starting any business requires some capital, a good location, and a sound foundation. But to
make that business successful, avoiding losses and turning a profit, we should be aware of
our customers and their behaviour.

Some businesses run at a profit in a given season while others run at a loss in the same
season. When discussing the profit and loss of a business, one must also consider its
location and surroundings.

How can a business owner increase profit and avoid loss?

This raises the question of which stock the entrepreneur should purchase to make the
business profitable.

To my knowledge, data science offers two ways to answer this: k-means clustering and market
basket analysis.

In this project I use k-means clustering to find the best customer group and its behaviour,
based on spending score, income, and age.

With clustering, a business owner can decide which products and which group of customers to
invest in to earn more profit.

In this project I concentrate on the mall customers dataset. Customers are divided into
groups based on their spending score, income, and inclination to buy goods, so that the mall
owner can decide which products to sell to which customer group to earn more profit.

System Diagram:
System Description:

K-means clustering

K-means is an iterative algorithm that tries to partition the dataset into K pre-defined,
distinct, non-overlapping subgroups (clusters), where each data point belongs to only one
group. It assigns data points to clusters such that the sum of the squared distances between
the data points and their cluster's centroid is at a minimum. The less variation there is
within a cluster, the more homogeneous its data points are.
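
As a rough illustration of the quantity being minimised, the sketch below computes the sum of squared distances between each point and the centroid of its assigned cluster. The class name WcssSketch, the method wcss, and the sample rows are assumptions made for this example and are not part of the project code.

```java
// Hypothetical helper for illustration only; not part of the project code.
final class WcssSketch {

    // Sums, over all points, the squared Euclidean distance between the point
    // and the centroid of the cluster it is assigned to.
    static double wcss(double[][] points, int[] assignment, double[][] centroids) {
        double total = 0.0;
        for (int p = 0; p < points.length; p++) {
            double[] c = centroids[assignment[p]];
            for (int d = 0; d < points[p].length; d++) {
                double diff = points[p][d] - c[d];
                total += diff * diff;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Made-up (age, income, score) rows split into two clusters.
        double[][] points = { {20, 15, 80}, {22, 17, 76}, {50, 90, 10}, {54, 88, 14} };
        int[] assignment = {0, 0, 1, 1};
        double[][] centroids = { {21, 16, 78}, {52, 89, 12} };
        System.out.println("Within-cluster sum of squares = " + wcss(points, assignment, centroids));
    }
}
```

A lower value for the same K means the points sit closer to their centroids, which is the sense in which the clusters are more homogeneous.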
To implement k-means clustering in Java, the third-party Weka library offers ready-made
classes and methods with many useful features.

In this project, the code is written without Weka, using only plain Java and a few
collections classes.

Major steps:

1. Decide the number of clusters.

2. Choose some random data points as the initial centroids.

3. Find the Euclidean distance between every data point and each centroid. Assign the data
point to the centroid that has the least distance from it.

4. Update the centroid after adding the data point.

5. Repeat steps 3 and 4 until every data point in the dataset has been processed.
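
For comparison with the single-pass steps above, here is a minimal sketch of the more common batch form of k-means (often called Lloyd's algorithm), which repeats the assignment and update steps over the whole dataset until the assignments stop changing. It is not the method used in this project's code. The class name BatchKMeansSketch, the choice of the first k points as initial centroids, and the sample data are all assumptions for illustration.

```java
import java.util.Arrays;

// Hypothetical sketch of batch (Lloyd-style) k-means; not the project code.
final class BatchKMeansSketch {

    // Returns the index of the centroid closest to p (squared Euclidean distance).
    static int nearest(double[] p, double[][] centroids) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centroids.length; c++) {
            double dist = 0.0;
            for (int d = 0; d < p.length; d++) {
                double diff = p[d] - centroids[c][d];
                dist += diff * diff;
            }
            if (dist < bestDist) {
                bestDist = dist;
                best = c;
            }
        }
        return best;
    }

    // Assumes k <= points.length. The first k points seed the centroids
    // (an assumption; random selection is equally common).
    static int[] cluster(double[][] points, int k, int maxIterations) {
        int dims = points[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = points[c].clone();
        }
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);

        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: attach every point to its nearest centroid.
            boolean changed = false;
            for (int p = 0; p < points.length; p++) {
                int c = nearest(points[p], centroids);
                if (c != assignment[p]) {
                    assignment[p] = c;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged: no point moved to another cluster
            }
            // Update step: each centroid becomes the mean of its members.
            double[][] sums = new double[k][dims];
            int[] counts = new int[k];
            for (int p = 0; p < points.length; p++) {
                counts[assignment[p]]++;
                for (int d = 0; d < dims; d++) {
                    sums[assignment[p]][d] += points[p][d];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] > 0) {
                    for (int d = 0; d < dims; d++) {
                        centroids[c][d] = sums[c][d] / counts[c];
                    }
                }
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Made-up (age, income, score) rows.
        double[][] points = { {20, 15, 80}, {50, 90, 10}, {22, 17, 76}, {54, 88, 14} };
        System.out.println(Arrays.toString(cluster(points, 2, 100))); // prints [0, 1, 0, 1]
    }
}
```

The batch form visits every point several times, so its result does not depend on the order in which records are read, whereas a single pass does.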

To do this we require four classes:

1. To maintain records.

2. To maintain clusters.

3. To generate the dataset.

4. Finally, to implement the k-means process.

Module:

class Record

Declare all variables required to store one row of the dataset

Write a setter method for each variable

Write a getter method for each variable

Write a method that gives a brief description of the record

class Cluster

Create variables to store the centroid of each variable in the record

Write a setter method for each centroid variable

Write a getter method for each centroid variable

Write a method that returns information about the centroids

Write a function that calculates the Euclidean distance between a data point (record) and
the cluster centroid

Write a function that updates the centroid of the cluster to which the current data point is
added
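
The updateCentroid method in the code later in this report averages the old centroid with the new record using integer division, so each new record carries as much weight as all earlier ones combined. As a hedged alternative, the sketch below keeps a member count and applies the incremental-mean rule new = old + (x - old) / n, so the centroid stays equal to the mean of all records added so far. The class RunningCentroidSketch and its method names are hypothetical and are not part of the project.

```java
// Hypothetical sketch; the class and method names are illustrative only.
final class RunningCentroidSketch {
    private final double[] centroid;
    private int size; // number of records merged into this centroid so far

    RunningCentroidSketch(double[] firstPoint) {
        this.centroid = firstPoint.clone();
        this.size = 1;
    }

    // Euclidean distance between the current centroid and point p.
    double distanceTo(double[] p) {
        double sum = 0.0;
        for (int d = 0; d < centroid.length; d++) {
            double diff = centroid[d] - p[d];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Incremental mean: after this call the centroid is the mean of all
    // points added so far, including p.
    void add(double[] p) {
        size++;
        for (int d = 0; d < centroid.length; d++) {
            centroid[d] += (p[d] - centroid[d]) / size;
        }
    }

    // Returns a copy so callers cannot modify the internal state.
    double[] centroid() {
        return centroid.clone();
    }

    public static void main(String[] args) {
        RunningCentroidSketch cluster = new RunningCentroidSketch(new double[] {0, 0, 0});
        System.out.println(cluster.distanceTo(new double[] {3, 4, 0}));
        cluster.add(new double[] {3, 6, 9});
        cluster.add(new double[] {6, 3, 0});
        System.out.println(java.util.Arrays.toString(cluster.centroid()));
    }
}
```

Storing the centroid as double values also avoids the rounding that integer division introduces at every update.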

class DataSet

import java.io.*

import java.util.Scanner if needed

Take the file as input from the user

Create a multidimensional array to store the numerical information

Create a single-dimensional array for each column in the file

Write a separate method for each column that returns all the contents of that column
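
The column extraction described above could be sketched as follows. This is only an illustration: the class CsvColumnsSketch, the method readNumericColumns, and the in-memory sample rows are made up, and the column order (ID, a non-numeric second column, then age, income, score) is an assumption inferred from how the project code skips the second field.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the project's DataSet class.
final class CsvColumnsSketch {

    // Skips the header line, then returns one {age, income, score} array per
    // well-formed row. Rows with too few fields or non-numeric values are skipped.
    static List<int[]> readNumericColumns(Reader source) {
        List<int[]> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            reader.readLine(); // header row
            String line;
            while ((line = reader.readLine()) != null) {
                String[] parts = line.split(",");
                if (parts.length < 5) {
                    continue;
                }
                try {
                    rows.add(new int[] {
                        Integer.parseInt(parts[2].trim()),
                        Integer.parseInt(parts[3].trim()),
                        Integer.parseInt(parts[4].trim())
                    });
                } catch (NumberFormatException e) {
                    // Ignore rows whose numeric fields cannot be parsed.
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up sample standing in for the CSV file.
        String csv = "id,gender,age,income,score\n1,Male,19,15,39\n2,Female,21,15,81\n";
        for (int[] row : readNumericColumns(new StringReader(csv))) {
            System.out.println(row[0] + " " + row[1] + " " + row[2]);
        }
    }
}
```

Passing a java.io.FileReader instead of the StringReader would read from a file on disk, and returning a list avoids fixing the number of rows in advance.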

class KMeans

import java.util.*;

This makes the collections used here available: ArrayList, Iterator, HashMap

The main method that starts the program is written in this class (it is the main class of
the project)

Write a function generateRecord that uses Record objects to build the dataset

Write a function initiateClusterAndCentroid that initializes the clusters and centroids and
assigns each remaining record to the cluster at minimum distance

Write a function initializeCluster that creates a cluster from a seed record

Write functions printRecordInformation and printClusterInformation to print the record and
cluster information

Dataset used

The dataset used in this project is the mall customers dataset, in CSV format.

Customer segmentation is done on age, annual income, and spending score.

Dataset downloaded from the Kaggle website

Data Science projects website

Sample dataset used:

Code:
public class Record
{
    private int age;
    private int income;
    private int score;
    private int cluster_number;

    public Record(int age,int income,int score)
    {
        this.age=age;
        this.income=income;
        this.score=score;
    }

    // Each setter takes the new value as a parameter.
    public void setAge(int age)
    {
        this.age=age;
    }

    public void setIncome(int income)
    {
        this.income=income;
    }

    public void setScore(int score)
    {
        this.score=score;
    }

    public void setClusterNumber(int cluster_number)
    {
        this.cluster_number=cluster_number;
    }

    public int getAge()
    {
        return age;
    }

    public int getIncome()
    {
        return income;
    }

    public int getScore()
    {
        return score;
    }

    public String toString()
    {
        return "Record [age=" + age + ", income=" + income +
            ", score=" + score + ", clusterNumber=" + cluster_number + "]";
    }
}

public class Cluster
{
    private int age_centroid;
    private int income_centroid;
    private int score_centroid;
    private int cluster_number;

    public Cluster(int cluster_number,int age_centroid,int income_centroid,int score_centroid)
    {
        this.cluster_number=cluster_number;
        this.age_centroid=age_centroid;
        this.income_centroid=income_centroid;
        this.score_centroid=score_centroid;
    }

    public void setClusterNumber(int cluster_number)
    {
        this.cluster_number=cluster_number;
    }

    public void setAgeCentroid(int age_centroid)
    {
        this.age_centroid=age_centroid;
    }

    public void setIncomeCentroid(int income_centroid)
    {
        this.income_centroid=income_centroid;
    }

    public void setScoreCentroid(int score_centroid)
    {
        this.score_centroid=score_centroid;
    }

    public int getClusterNumber()
    {
        return cluster_number;
    }

    public int getAgeCentroid()
    {
        return age_centroid;
    }

    public int getIncomeCentroid()
    {
        return income_centroid;
    }

    public int getScoreCentroid()
    {
        return score_centroid;
    }

    public String toString()
    {
        return "Cluster [ageCentroid=" + age_centroid + ", incomeCentroid=" + income_centroid +
            ", scoreCentroid=" + score_centroid + ", clusterNumber=" + cluster_number + "]";
    }

    // Euclidean distance between this cluster's centroid and the given record.
    public double calculateDistance(Record record)
    {
        return Math.sqrt(Math.pow((getAgeCentroid()-record.getAge()),2)+
            Math.pow((getIncomeCentroid()-record.getIncome()),2)+
            Math.pow((getScoreCentroid()-record.getScore()),2));
    }

    // Moves the centroid to the midpoint of the old centroid and the new record
    // (integer division), which is not the mean of all records in the cluster.
    public void updateCentroid(Record record)
    {
        setAgeCentroid((getAgeCentroid()+record.getAge())/2);
        setIncomeCentroid((getIncomeCentroid()+record.getIncome())/2);
        setScoreCentroid((getScoreCentroid()+record.getScore())/2);
    }
}

import java.io.*;
import java.util.*;

class DataSet
{
    String s=null,s1=null;
    // Column 0 holds the first field of each row; columns 1 to 3 hold age, income and score.
    int rec[][]= new int[201][6];
    int i=-1,j=0,count;
    int age[]=new int[200];
    int income[]=new int[200];
    int score[]=new int[200];

    public void setColumns()throws Exception
    {
        FileReader fr=new FileReader("C:/Users/Dell/Desktop/datasets/Mall_Customers.csv");
        BufferedReader br=new BufferedReader(fr);
        br.readLine(); // skip the header row
        while((s=br.readLine())!=null && i<200 )
        {
            StringTokenizer st=new StringTokenizer(s,",");
            i++;
            j=0;
            count=0;
            while(st.hasMoreTokens() && j<4)
            {
                try
                {
                    s1=st.nextToken();
                    count++;
                    if(count!=2) // the second field is skipped and never parsed as a number
                    {
                        int n=Integer.parseInt(s1);
                        rec[i][j]=n;
                        j++;
                    }
                }catch(Exception e)
                {
                    System.out.println(e);
                }
            }
        }
        br.close();

        // Copy each numeric column into its own array.
        for(i=0;i<200;i++)
        {
            age[i]=rec[i][1];
        }
        for(i=0;i<200;i++)
        {
            income[i]=rec[i][2];
        }
        for(i=0;i<200;i++)
        {
            score[i]=rec[i][3];
        }
    }

    public int[] getAgeColumn()
    {
        return age;
    }

    public int[] getIncomeColumn()
    {
        return income;
    }

    public int[] getScoreColumn()
    {
        return score;
    }
}

import java.util.*;

public class KMeans
{
    List<Record> data = new ArrayList<Record>();
    List<Cluster> clusters=new ArrayList<Cluster>();
    Map<Cluster,List<Record>> clusterRecords=new HashMap<Cluster,List<Record>>();

    public static void main(String[] args)throws Exception
    {
        int cluster_number = 5;
        KMeans km=new KMeans();
        km.generateRecord();
        km.initiateClusterAndCentroid(cluster_number);
        km.printRecordInformation();
        km.printClusterInformation();
    }

    // Builds one Record per row from the three columns of the dataset.
    private void generateRecord()throws Exception
    {
        DataSet dataset=new DataSet();
        int i;
        dataset.setColumns();
        int a[]=dataset.getAgeColumn();
        int b[]=dataset.getIncomeColumn();
        int c[]=dataset.getScoreColumn();
        for(i=0;i<200;i++)
        {
            Record record=new Record(a[i],b[i],c[i]);
            data.add(record);
        }
    }

    // The first cluster_number records seed the clusters; every later record
    // joins the cluster whose centroid is nearest.
    private void initiateClusterAndCentroid(int cluster_number)
    {
        int counter=1;
        Iterator<Record> iterator=data.iterator();
        Record record=null;
        while(iterator.hasNext())
        {
            record=iterator.next();
            if(counter<=cluster_number)
            {
                record.setClusterNumber(counter);
                initializeCluster(counter,record);
                counter++;
            }
            else
            {
                System.out.println(record);
                System.out.println("**CLUSTER INFORMATION**");
                for(Cluster cluster:clusters)
                {
                    System.out.println(cluster);
                }
                System.out.println("************************");

                double minDistance=Double.MAX_VALUE;
                Cluster whichCluster=null;
                for(Cluster cluster:clusters)
                {
                    double distance=cluster.calculateDistance(record);
                    if(minDistance>distance)
                    {
                        minDistance=distance;
                        whichCluster=cluster;
                    }
                }
                record.setClusterNumber(whichCluster.getClusterNumber());
                whichCluster.updateCentroid(record);
                clusterRecords.get(whichCluster).add(record);
            }
        }
    }

    // Creates a cluster whose centroid is the given record.
    private void initializeCluster(int cluster_number,Record record)
    {
        Cluster cluster=new Cluster(cluster_number,record.getAge(),record.getIncome(),record.getScore());
        clusters.add(cluster);
        List<Record> clusterRecord=new ArrayList<Record>();
        clusterRecord.add(record);
        clusterRecords.put(cluster,clusterRecord);
    }

    private void printRecordInformation()
    {
        System.out.println("****Each Record Information****");
        for(Record record:data)
        {
            System.out.println(record);
        }
    }

    private void printClusterInformation()
    {
        System.out.println("*****Final Cluster Information*****");
        for(Map.Entry<Cluster,List<Record>> entry:clusterRecords.entrySet())
        {
            System.out.println("key= " + entry.getKey() + " value= " + entry.getValue());
        }
    }
}

Sample Output:

An image of the clusters, produced in Tableau

Conclusions:

Cluster-1:

This group comprises customers with a low annual income and a low spending score.

Cluster-2:

This group comprises customers with more annual income and a high spending score. Their
spending score is higher than their annual income.

Cluster-3:

This group comprises customers with a high annual income and a low spending score.

Cluster-4:

This group comprises customers with a high annual income and a high spending score.

Cluster-5:

This group comprises customers with a medium annual income and a spending score in
proportion to that income.
