4373 Java Mini Project
Sunethra
Introduction:
Customers of any store, whether offline or online (that is, e-commerce), exhibit different
buying patterns. Some buy in bulk, while others buy smaller quantities but spread their
transactions throughout the year. Some buy big items during festival seasons such as
Christmas. Identifying these buying patterns and grouping, or segmenting, customers
accordingly is of the utmost importance to business owners.
The theme of this project is to develop a Java-based system that separates customers into
classes with respect to the items they buy, the amount they spend, and so on, by analysing
customer buying behaviour using an appropriate data set.
Features:
1. The main purpose of clustering customers is to reduce the stock of unwanted or poorly
selling items and to increase the stock of the items that customers like most. To do this,
the owner or manufacturer should be aware of customer behaviours and their changing
moods. Clustering makes this easier and more profitable, because it identifies the best
customer group and the items that group likes most.
4. In this way, the manufacturer can expect more profit, and fewer goods go to waste.
Some Applications:
1. Market research.
2. Pattern recognition.
3. Data analysis.
4. Image processing.
System Analysis:
To start any business we need some capital, a good location, and a sound foundation. But
to make that business successful, avoid losses, and keep it profitable, we should be aware
of customers and their behaviours.
We know that some businesses run at a profit in a given season while others run at a loss
in the same season. When discussing the profit and loss of a business, one must also
consider its location and surroundings.
How can one gain benefit and avoid loss?
This raises the question of which stock the entrepreneur should purchase to make the
business profitable.
In this project I use k-means clustering to find the best customer group and its
behaviour, based on spending score, income, and age.
By clustering, a business owner can decide which products and which group of customers
to invest in to earn more profit.
In this project, I concentrate on the mall customers data set. Customers are divided into
different groups based on their score, income, and intention to buy goods, so that the mall
owner can decide which products to sell to each customer group to earn more profit.
System Diagram:
System Description:
K-means clustering
The k-means algorithm is an iterative algorithm that tries to partition the dataset into K
pre-defined, distinct, non-overlapping subgroups (clusters), where each data point belongs
to only one group. It assigns data points to clusters such that the sum of the squared
distances between the data points and their cluster's centroid is at a minimum. The
less variation we have within clusters, the more homogeneous the data points are within
the same cluster.
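The quantity being minimized can be made concrete with a small sketch. The class name WcssSketch, the method names, and the sample points below are illustrative assumptions and are not part of the project code; the sketch only shows how the sum of squared distances between points and their assigned centroids might be computed.

```java
// Sketch: within-cluster sum of squared distances, the quantity k-means tries to minimize.
// WcssSketch and its sample data are hypothetical names and values used for illustration.
class WcssSketch {
    /** Squared Euclidean distance between two points of equal dimension. */
    static double squaredDistance(double[] p, double[] q) {
        double sum = 0.0;
        for (int d = 0; d < p.length; d++) {
            double diff = p[d] - q[d];
            sum += diff * diff;
        }
        return sum;
    }

    /**
     * points[i] is a data point, assignment[i] is the index of its cluster,
     * and centroids[k] is the centroid of cluster k.
     */
    static double wcss(double[][] points, int[] assignment, double[][] centroids) {
        double total = 0.0;
        for (int i = 0; i < points.length; i++) {
            total += squaredDistance(points[i], centroids[assignment[i]]);
        }
        return total;
    }

    public static void main(String[] args) {
        double[][] points = { {1, 1}, {3, 1}, {10, 10}, {12, 10} };
        int[] assignment = { 0, 0, 1, 1 };
        double[][] centroids = { {2, 1}, {11, 10} };
        // Every point lies exactly 1 unit from its centroid, so the sum is 4.0.
        System.out.println(wcss(points, assignment, centroids));
    }
}
```

A lower value means tighter, more homogeneous clusters for the same K.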
To implement k-means clustering in Java, one option is Weka, a third-party machine-learning
library that offers many classes and methods with good features.
In this project, the code is written without Weka, using only plain Java and some
collections concepts.
Major steps:
1. Find the number of clusters.
3. Find the Euclidean distance between every data point and each centroid. Assign the data
point to the centroid that is the least distance from it.
5. Repeat the process until every data point in the data set has been processed.
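The steps above can be sketched as a small, self-contained loop. This is a minimal illustration of the standard assign-and-update iteration under stated assumptions, not the project's own implementation: the class name KMeansSketch, the method names, the iteration limit, and the sample (age, income, score) values are all hypothetical.

```java
import java.util.Arrays;

// Sketch of the assign/update loop of k-means on small in-memory data.
// All names and sample values here are illustrative assumptions.
class KMeansSketch {
    static double squaredDistance(double[] p, double[] q) {
        double sum = 0.0;
        for (int d = 0; d < p.length; d++) {
            double diff = p[d] - q[d];
            sum += diff * diff;
        }
        return sum;
    }

    /** Index of the centroid closest to the given point. */
    static int nearest(double[] point, double[][] centroids) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int k = 0; k < centroids.length; k++) {
            double dist = squaredDistance(point, centroids[k]);
            if (dist < bestDist) {
                bestDist = dist;
                best = k;
            }
        }
        return best;
    }

    /**
     * Repeats the assignment and update steps until no point changes cluster
     * or maxIterations is reached. The centroids array is updated in place.
     */
    static int[] cluster(double[][] points, double[][] centroids, int maxIterations) {
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);
        int dims = points[0].length;
        for (int iter = 0; iter < maxIterations; iter++) {
            boolean changed = false;
            // Assignment step: attach each point to its nearest centroid.
            for (int i = 0; i < points.length; i++) {
                int k = nearest(points[i], centroids);
                if (k != assignment[i]) {
                    assignment[i] = k;
                    changed = true;
                }
            }
            if (!changed) {
                break;
            }
            // Update step: move each centroid to the mean of its points.
            double[][] sums = new double[centroids.length][dims];
            int[] counts = new int[centroids.length];
            for (int i = 0; i < points.length; i++) {
                counts[assignment[i]]++;
                for (int d = 0; d < dims; d++) {
                    sums[assignment[i]][d] += points[i][d];
                }
            }
            for (int k = 0; k < centroids.length; k++) {
                if (counts[k] == 0) {
                    continue; // leave an empty cluster's centroid where it is
                }
                for (int d = 0; d < dims; d++) {
                    centroids[k][d] = sums[k][d] / counts[k];
                }
            }
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Made-up (age, income, score) rows.
        double[][] points = { {20, 15, 80}, {22, 17, 78}, {50, 80, 20}, {52, 82, 18} };
        // Seed the centroids with copies of two of the points.
        double[][] centroids = { points[0].clone(), points[2].clone() };
        int[] assignment = cluster(points, centroids, 100);
        System.out.println(Arrays.toString(assignment));
        System.out.println(Arrays.deepToString(centroids));
    }
}
```

Seeding the centroids with copies of existing data points keeps the sketch deterministic; other seeding strategies are possible.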
Module:
class Record
class Cluster
Write a function that calculates the Euclidean distance from the data point in a record to
each centroid
Write a function that updates the centroid of the cluster to which the current data point is added
class Dataset
import java.io.*;
Write a separate method for each column that returns all of that column's contents
class KMeans
import java.util.*;
Write a function that generates the records: using Record class objects, create a dataset
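The centroid update in the Cluster module deserves a note. Averaging the current centroid with each new record gives the most recent records more weight than earlier ones. One possible alternative, sketched below under the assumption that a true running mean is wanted, keeps a count of the points added so far. The class RunningCentroid and its methods are hypothetical and do not appear in the project code.

```java
// Sketch: a centroid kept as the running mean of every point added so far.
// RunningCentroid is a hypothetical helper, not a class from the project.
class RunningCentroid {
    private final double[] mean;
    private int count;

    RunningCentroid(double[] firstPoint) {
        this.mean = firstPoint.clone();
        this.count = 1;
    }

    /** Folds one more point into the mean without storing earlier points. */
    void add(double[] point) {
        count++;
        for (int d = 0; d < mean.length; d++) {
            mean[d] += (point[d] - mean[d]) / count;
        }
    }

    double[] getMean() {
        return mean.clone();
    }

    int getCount() {
        return count;
    }

    public static void main(String[] args) {
        RunningCentroid centroid = new RunningCentroid(new double[] { 10 });
        centroid.add(new double[] { 20 });
        centroid.add(new double[] { 60 });
        // True mean of 10, 20 and 60.
        System.out.println(centroid.getMean()[0]); // 30.0

        // Halving rule with integer arithmetic, for comparison.
        int halved = 10;
        halved = (halved + 20) / 2;
        halved = (halved + 60) / 2;
        System.out.println(halved); // 37
    }
}
```

With the halving rule the last record pulls the centroid to 37, while the running mean stays at 30, the actual average of the three values.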
Dataset used
The dataset used in this project is the mall customers dataset, in CSV format.
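As a rough illustration of reading such a file, the sketch below parses comma-separated rows into age, income, and score values. The column order (identifier, one non-numeric column, age, income, score) is an assumption inferred from how the project code indexes its columns. The class name CsvRowSketch, the header text, and the sample rows are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: turning CSV rows into {age, income, score} triples.
// Assumed column order: id, non-numeric column, age, income, score.
class CsvRowSketch {
    /** Parses one data row and returns {age, income, score}. */
    static int[] parseRow(String line) {
        String[] fields = line.split(",");
        int age = Integer.parseInt(fields[2].trim());
        int income = Integer.parseInt(fields[3].trim());
        int score = Integer.parseInt(fields[4].trim());
        return new int[] { age, income, score };
    }

    /** Parses all rows, skipping the header line and any blank lines. */
    static List<int[]> parseAll(List<String> lines) {
        List<int[]> rows = new ArrayList<>();
        for (int i = 1; i < lines.size(); i++) {
            String line = lines.get(i);
            if (line.trim().isEmpty()) {
                continue;
            }
            rows.add(parseRow(line));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Illustrative rows only; a real run would read the lines from the CSV file.
        List<String> lines = Arrays.asList(
            "CustomerID,Gender,Age,Income,Score",
            "1,Male,19,15,39",
            "2,Female,21,15,81");
        for (int[] row : parseAll(lines)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

In a real run the list of lines could come from java.nio.file.Files.readAllLines, and the row count would not need to be fixed in advance.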
Sample Dataset Used:
Code:
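public class Record
{
    private int age,income,score,cluster_number;   // one customer and the cluster it belongs to
    public Record(int age,int income,int score)
    {
        this.age=age;
        this.income=income;
        this.score=score;
    }
    public Record(int age,int income,int score,int cluster_number)
    {
        this.age=age;
        this.income=income;
        this.score=score;
        this.cluster_number=cluster_number;
    }
    public int getAge() { return age; }
    public int getIncome() { return income; }
    public int getScore() { return score; }
    public int getClusterNumber() { return cluster_number; }
    public void setClusterNumber(int cluster_number) { this.cluster_number=cluster_number; }
    public String toString()
    {
        return "Record [age=" + age + ", income=" + income + ", score=" + score + ", cluster_number=" + cluster_number + "]";
    }
}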
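class Cluster
{
    private int cluster_number;
    private int age_centroid,income_centroid,score_centroid;
    public Cluster(int cluster_number,int age_centroid,int income_centroid,int score_centroid)
    {
        this.cluster_number=cluster_number;
        this.age_centroid=age_centroid;
        this.income_centroid=income_centroid;
        this.score_centroid=score_centroid;
    }
    public int getClusterNumber() { return cluster_number; }
    public int getAgeCentroid() { return age_centroid; }
    public int getIncomeCentroid() { return income_centroid; }
    public int getScoreCentroid() { return score_centroid; }
    public void setAgeCentroid(int age_centroid) { this.age_centroid=age_centroid; }
    public void setIncomeCentroid(int income_centroid) { this.income_centroid=income_centroid; }
    public void setScoreCentroid(int score_centroid) { this.score_centroid=score_centroid; }
    // Euclidean distance between this cluster's centroid and the given record
    public double calculateDistance(Record record)
    {
        return Math.sqrt(Math.pow((getAgeCentroid()-record.getAge()),2)
            +Math.pow((getIncomeCentroid()-record.getIncome()),2)
            +Math.pow((getScoreCentroid()-record.getScore()),2));
    }
    // Moves the centroid to the midpoint (integer division) between itself and the new record
    public void updateCentroid(Record record)
    {
        setAgeCentroid((getAgeCentroid()+record.getAge())/2);
        setIncomeCentroid((getIncomeCentroid()+record.getIncome())/2);
        setScoreCentroid((getScoreCentroid()+record.getScore())/2);
    }
    public String toString()
    {
        return "Cluster [cluster_number=" + cluster_number + ", age_centroid=" + age_centroid
            + ", income_centroid=" + income_centroid + ", score_centroid=" + score_centroid + "]";
    }
}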
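import java.io.*;
import java.util.*;
class DataSet
{
    int[][] rec=new int[200][4];   // numeric columns of the 200 rows: id, age, income, score
    int[] age=new int[200];
    int[] income=new int[200];
    int[] score=new int[200];
    // Reads the CSV file and fills the age, income and score columns
    public void setColumns()
    {
        String s=null,s1=null;
        int i=-1,j=0,count;
        // The file name below is an assumption; point it at the mall customers CSV file
        try(BufferedReader br=new BufferedReader(new FileReader("Mall_Customers.csv")))
        {
            br.readLine();   // skip the header row
            while((s=br.readLine())!=null && i<199)
            {
                i++;
                j=0;
                count=0;
                StringTokenizer st=new StringTokenizer(s,",");
                while(st.hasMoreTokens())
                {
                    s1=st.nextToken();
                    count++;
                    if(count!=2)   // the second column is not numeric, so it is skipped
                    {
                        int n=Integer.parseInt(s1.trim());
                        rec[i][j]=n;
                        j++;
                    }
                }
            }
        }catch(Exception e)
        {
            System.out.println(e);
        }
        for(i=0;i<200;i++)
        {
            age[i]=rec[i][1];
            income[i]=rec[i][2];
            score[i]=rec[i][3];
        }
    }
    public int[] getAgeColumn() { return age; }
    public int[] getIncomeColumn() { return income; }
    public int[] getScoreColumn() { return score; }
}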
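import java.util.*;
class KMeans
{
    List<Record> data=new ArrayList<Record>();
    List<Cluster> clusters=new ArrayList<Cluster>();
    Map<Cluster,List<Record>> clusterRecords=new HashMap<Cluster,List<Record>>();
    public static void main(String[] args)
    {
        int cluster_number = 5;
        KMeans km=new KMeans();
        km.generateRecord();
        km.initiateClusterAndCentroid(cluster_number);
        km.printRecordInformation();
        km.printClusterInformation();
    }
    // Builds one Record per row of the data set
    private void generateRecord()
    {
        int i;
        DataSet dataset=new DataSet();
        dataset.setColumns();
        int[] a=dataset.getAgeColumn();
        int[] b=dataset.getIncomeColumn();
        int[] c=dataset.getScoreColumn();
        for(i=0;i<200;i++)
        {
            Record record=new Record(a[i],b[i],c[i]);
            data.add(record);
        }
    }
    // Seeds one cluster from each of the first cluster_number records, then makes a
    // single pass over the remaining records, assigning each to its nearest cluster
    private void initiateClusterAndCentroid(int cluster_number)
    {
        int counter=1;
        Iterator<Record> iterator=data.iterator();
        Record record=null;
        while(iterator.hasNext())
        {
            record=iterator.next();
            if(counter<=cluster_number)
            {
                record.setClusterNumber(counter);
                initializeCluster(counter,record);
                counter++;
            }
            else
            {
                System.out.println(record);
                System.out.println("**CLUSTER INFORMATION**");
                for(Cluster cluster:clusters)
                    System.out.println(cluster);
                System.out.println("************************");
                double minDistance=Double.MAX_VALUE;
                Cluster whichCluster=null;
                for(Cluster cluster:clusters)
                {
                    double distance=cluster.calculateDistance(record);
                    if(minDistance>distance)
                    {
                        minDistance=distance;
                        whichCluster=cluster;
                    }
                }
                record.setClusterNumber(whichCluster.getClusterNumber());
                whichCluster.updateCentroid(record);
                clusterRecords.get(whichCluster).add(record);
            }
        }
    }
    // Creates a cluster whose initial centroid is the given record
    private void initializeCluster(int cluster_number,Record record)
    {
        Cluster cluster=new
        Cluster(cluster_number,record.getAge(),record.getIncome(),record.getScore());
        clusters.add(cluster);
        List<Record> clusterRecord=new ArrayList<Record>();
        clusterRecord.add(record);
        clusterRecords.put(cluster,clusterRecord);
    }
    private void printRecordInformation()
    {
        for(Record record:data)
            System.out.println(record);
    }
    private void printClusterInformation()
    {
        for(Map.Entry<Cluster,List<Record>> entry:clusterRecords.entrySet())
            System.out.println(entry.getKey()+" "+entry.getValue());
    }
}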
Sample Output:
An image of the clusters from Tableau
Conclusions:
Cluster-1:
This group comprises customers with low annual income and a low spending score.
Cluster-2:
This group comprises customers with higher annual income and a high spending score. Their
spending score is higher than their annual income.
Cluster-3:
This group comprises customers with high annual income and a low spending score.
Cluster-4:
This group comprises customers with high annual income and a high spending score.
Cluster-5:
This group comprises customers with medium annual income and a spending score in
proportion to their annual income.