4373 Java Mini Project

Uploaded by

Kamatchi Kartheeban

JAVA MINI PROJECT: K-MEANS CLUSTERING

B. Sunethra (9919004373)

Introduction:

Customers of any store, whether offline or online (that is, e-commerce), exhibit different
buying behaviours. Some buy in bulk, while others buy smaller quantities but spread their
transactions throughout the year. Some buy big items during festival seasons such as
Christmas. Understanding the buying patterns of customers, and grouping (segmenting)
customers based on those patterns, is of the utmost importance for business owners.

The aim of this project is to develop a Java-based system that separates customers into
classes with respect to the items they buy, the amount they spend and similar attributes,
by analysing customer buying patterns in an appropriate dataset.

Features:

1. The main aim of clustering customers is to reduce the stock of unwanted or rarely sold
items and to increase the stock of the items customers like most. To do this, the owner or
manufacturer should be aware of customer behaviours and how they change. Clustering makes
this easier and more profitable: it identifies the best customer groups and the items they
like most.

2. Works for any numeric dataset.

3. The algorithm is simple and easy to implement.

4. The manufacturer can expect more profit, and less stock is wasted.

Some Applications:

1. Market research.

2. Pattern recognition.

3. Data analysis.

4. Image processing.

System Analysis:

To start any business we need some capital, a good location and a solid foundation. But to
make the business successful, avoid losses and stay profitable, we should be aware of our
customers and their behaviours.

Some businesses run at a profit in a given season while others run at a loss in the same
season. When discussing the profit and loss of a business, one must also consider its
location and surroundings.

How can a business gain the benefit and avoid the loss?

This raises the question of which stock the entrepreneur should purchase to make the
business profitable.

To my knowledge, data science offers two common ways to approach this:

K-means clustering or market basket analysis

In this project I use k-means clustering to find the best customer groups and their
behaviour based on their spending score, income and age.

By clustering, the business owner can decide which products, and which groups of
customers, to invest in to earn more profit.
In this project I concentrate on the Mall Customers dataset. Based on their spending score,
income and inclination to buy goods, customers are divided into different groups, and from
this the mall owner can decide which products to sell to each customer group to earn more
profit.

System Diagram:

[Figure: System Diagram]

System Description:

K-means clustering

K-means is an iterative algorithm that tries to partition the dataset into K pre-defined,
distinct, non-overlapping subgroups (clusters), where each data point belongs to exactly one
group. It assigns data points to clusters such that the sum of the squared distances between
the data points and their cluster's centroid (the arithmetic mean of the points in that
cluster) is minimised; in practice the algorithm converges to a local minimum that depends
on the initial centroids. The less variation there is within clusters, the more homogeneous
the data points in the same cluster are.
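Written out, the objective described above is the within-cluster sum of squared distances. In the notation below (ours, not the original's), C_k is the set of points in cluster k, mu_k is its centroid and K is the number of clusters; for this project each point x_i would be the vector (age, income, score):

```latex
J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i
```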
To implement k-means clustering in Java, one option is Weka, a third-party Java
machine-learning library (developed at the University of Waikato) that provides many
classes and methods for clustering.

In this project, however, the code is written without Weka, using only plain Java and some
Collections classes.

Major steps:

1. Choose the number of clusters, K.

2. Choose K data points at random as the initial centroids.

3. Find the Euclidean distance between the data point and every centroid. Assign the data
point to the centroid with the smallest distance.

4. Update that centroid after adding the data point.

5. Repeat steps 3 and 4 until every data point in the dataset has been assigned.
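The steps above update a centroid each time a single point is added and make one pass over the data. For comparison, here is a minimal sketch of the textbook batch variant (Lloyd's algorithm), which alternates a full assignment step with a mean-update step until no point changes cluster. It is not the project's code: the class name LloydKMeans, the double[] point representation and the use of the first K points as initial centroids are assumptions made for illustration.

```java
import java.util.Arrays;

// Sketch of Lloyd's (batch) k-means; points are double[] vectors such as
// {age, income, score}. Assumes points.length >= k >= 1.
class LloydKMeans {

    // Squared Euclidean distance between two vectors of equal length.
    static double squaredDistance(double[] a, double[] b) {
        double sum = 0;
        for (int d = 0; d < a.length; d++) {
            double diff = a[d] - b[d];
            sum += diff * diff;
        }
        return sum;
    }

    // Returns the cluster index of every point. The first k points seed the
    // centroids (an illustrative choice, not the only one).
    static int[] cluster(double[][] points, int k, int maxIterations) {
        int dims = points[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) {
            centroids[c] = points[c].clone();
        }
        int[] assignment = new int[points.length];
        Arrays.fill(assignment, -1);

        for (int iter = 0; iter < maxIterations; iter++) {
            // Assignment step: move each point to its nearest centroid.
            boolean changed = false;
            for (int i = 0; i < points.length; i++) {
                int best = 0;
                for (int c = 1; c < k; c++) {
                    if (squaredDistance(points[i], centroids[c])
                            < squaredDistance(points[i], centroids[best])) {
                        best = c;
                    }
                }
                if (assignment[i] != best) {
                    assignment[i] = best;
                    changed = true;
                }
            }
            if (!changed) {
                break; // converged: no point switched clusters
            }
            // Update step: recompute each centroid as the mean of its points.
            double[][] sums = new double[k][dims];
            int[] counts = new int[k];
            for (int i = 0; i < points.length; i++) {
                counts[assignment[i]]++;
                for (int d = 0; d < dims; d++) {
                    sums[assignment[i]][d] += points[i][d];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] > 0) { // keep the old centroid if a cluster is empty
                    for (int d = 0; d < dims; d++) {
                        centroids[c][d] = sums[c][d] / counts[c];
                    }
                }
            }
        }
        return assignment;
    }
}
```

Calling LloydKMeans.cluster(points, 5, 100) with rows of {age, income, score} would return a cluster index from 0 to 4 for each customer. Because the result depends on the initial centroids, it is common to run k-means several times from different starting points and keep the run with the lowest objective.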

To do this we require four classes:

1. To maintain records.

2. To maintain clusters.

3. To generate the dataset.

4. Finally, to implement the k-means process.

Module:

class Record

Declare all the variables required to store one row of the dataset

Write a setter method for each variable

Write a getter method for each variable

Write a method that gives a brief description of the record

class Cluster

Create variables to store the centroid value of each variable in Record

Write a setter method for each centroid variable

Write a getter method for each centroid variable

Write a method that returns information about the centroids

Write a function that calculates the Euclidean distance between a data point (Record) and
the cluster's centroid

Write a function that updates the centroid of the cluster to which the current data point is added
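The two Cluster functions described above reduce to a few lines of arithmetic. The sketch below assumes features are held in double arrays in the order {age, income, score}; the class and method names (CentroidMath, euclidean, addToCentroid) are hypothetical. The update keeps the centroid equal to the mean of the cluster's records by moving it 1/n of the way towards each new record, where n is the cluster size after the addition.

```java
// Sketch of the two Cluster operations; names are illustrative only.
class CentroidMath {

    // Euclidean distance between a record and a centroid of the same length.
    static double euclidean(double[] record, double[] centroid) {
        double sum = 0;
        for (int d = 0; d < record.length; d++) {
            double diff = record[d] - centroid[d];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Incremental mean: `size` is the number of records in the cluster
    // including the one just added. After the call, the centroid equals the
    // mean of all records added so far.
    static void addToCentroid(double[] centroid, double[] record, int size) {
        for (int d = 0; d < centroid.length; d++) {
            centroid[d] += (record[d] - centroid[d]) / size;
        }
    }
}
```

For example, starting from a one-record cluster with centroid 2 and adding records 4 (size 2) and 6 (size 3) moves the centroid to 3 and then to 4, the mean of {2, 4, 6}.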

class DataSet

import java.io.*

import java.util.Scanner if needed

Take the file as input from the user

Create a multidimensional array to store the numerical information

Create a one-dimensional array for each column in the file

Write a separate method for each column that returns the contents of that column
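A minimal sketch of parsing one data row, assuming the CSV layout CustomerID, Gender, Age, Annual Income (k$), Spending Score (1-100); this layout is consistent with the DataSet code skipping the second column. The class name CsvRow is hypothetical, and splitting on commas assumes no field contains a quoted comma.

```java
// Sketch: parse one CSV data line into {age, income, score}.
// Column 1 (Gender) is skipped because it is not numeric.
class CsvRow {

    static int[] parse(String line) {
        String[] fields = line.split(",");
        int age = Integer.parseInt(fields[2].trim());
        int income = Integer.parseInt(fields[3].trim());
        int score = Integer.parseInt(fields[4].trim());
        return new int[] {age, income, score};
    }
}
```

With an illustrative row such as "1,Male,19,15,39", CsvRow.parse returns {19, 15, 39}.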

class KMeans

import java.util.*;

To make use of the collections ArrayList, Iterator and HashMap

The main method that starts the program is written in this class

//Main class of all the classes

Write a function generateRecord that uses the Record class to create a record for every row
of the dataset

Write a function initiateClusterAndCentroid that initialises the clusters and their centroids

Write a function initializeCluster that creates a cluster seeded with a record; every later
record is assigned to the cluster at minimum distance

Write functions to print the record information and the cluster information
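Step 1 (finding the number of clusters) is not handled by any of the four classes; the main method simply uses five clusters. One common heuristic is the elbow method: run k-means for K = 1, 2, 3, ..., compute the within-cluster sum of squares (WCSS) for each run, and choose the K where the curve stops dropping sharply. A minimal WCSS sketch, assuming points as double[] vectors, an assignment array of cluster indices and an array of centroids (the class name Wcss is hypothetical):

```java
import java.util.List;

// Sketch: within-cluster sum of squares for one clustering result.
class Wcss {

    // assignment[i] is the cluster index of points.get(i);
    // centroids[c] is the centroid of cluster c.
    static double compute(List<double[]> points, int[] assignment, double[][] centroids) {
        double total = 0;
        for (int i = 0; i < points.size(); i++) {
            double[] p = points.get(i);
            double[] c = centroids[assignment[i]];
            for (int d = 0; d < p.length; d++) {
                double diff = p[d] - c[d];
                total += diff * diff;
            }
        }
        return total;
    }
}
```

Plotting these totals against K usually shows a steep decline followed by a flatter tail; the bend is a reasonable, though not definitive, choice of K.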

Dataset used

The dataset used in this project is the Mall Customers dataset in CSV format.

Customer segmentation is done using age, annual income and spending score.

The dataset was downloaded from the Kaggle website.

Data Science Projects website

Sample Dataset Used:

[Figure: sample of the Mall Customers dataset]

Code:
public class Record {
    private int age;
    private int income;
    private int score;
    private int cluster_number;

    public Record(int age, int income, int score) {
        this.age = age;
        this.income = income;
        this.score = score;
    }

    // Setters take the new value as a parameter
    public void setAge(int age) {
        this.age = age;
    }

    public void setIncome(int income) {
        this.income = income;
    }

    public void setScore(int score) {
        this.score = score;
    }

    public void setClusterNumber(int cluster_number) {
        this.cluster_number = cluster_number;
    }

    // Each getter returns its own field
    public int getAge() {
        return age;
    }

    public int getIncome() {
        return income;
    }

    public int getScore() {
        return score;
    }

    @Override
    public String toString() {
        return "Record [age=" + age + ", income=" + income
                + ", score=" + score + ", clusterNumber=" + cluster_number + "]";
    }
}

public class Cluster {
    // Centroid coordinates are doubles so that averaging does not truncate
    private double age_centroid;
    private double income_centroid;
    private double score_centroid;
    private int cluster_number;
    // Number of records in this cluster; the seed record counts as the first
    private int size = 1;

    public Cluster(int cluster_number, double age_centroid, double income_centroid,
            double score_centroid) {
        this.cluster_number = cluster_number;
        this.age_centroid = age_centroid;
        this.income_centroid = income_centroid;
        this.score_centroid = score_centroid;
    }

    public void setClusterNumber(int cluster_number) {
        this.cluster_number = cluster_number;
    }

    public void setAgeCentroid(double age_centroid) {
        this.age_centroid = age_centroid;
    }

    public void setIncomeCentroid(double income_centroid) {
        this.income_centroid = income_centroid;
    }

    public void setScoreCentroid(double score_centroid) {
        this.score_centroid = score_centroid;
    }

    public int getClusterNumber() {
        return cluster_number;
    }

    public double getAgeCentroid() {
        return age_centroid;
    }

    public double getIncomeCentroid() {
        return income_centroid;
    }

    public double getScoreCentroid() {
        return score_centroid;
    }

    @Override
    public String toString() {
        return "Cluster [ageCentroid=" + age_centroid + ", incomeCentroid=" + income_centroid
                + ", scoreCentroid=" + score_centroid + ", clusterNumber=" + cluster_number + "]";
    }

    // Euclidean distance between this cluster's centroid and a record
    public double calculateDistance(Record record) {
        return Math.sqrt(Math.pow(getAgeCentroid() - record.getAge(), 2)
                + Math.pow(getIncomeCentroid() - record.getIncome(), 2)
                + Math.pow(getScoreCentroid() - record.getScore(), 2));
    }

    // Adds a record to the cluster and moves the centroid to the new mean:
    // new centroid = old centroid + (record - old centroid) / size
    public void updateCentroid(Record record) {
        size++;
        setAgeCentroid(getAgeCentroid() + (record.getAge() - getAgeCentroid()) / size);
        setIncomeCentroid(getIncomeCentroid() + (record.getIncome() - getIncomeCentroid()) / size);
        setScoreCentroid(getScoreCentroid() + (record.getScore() - getScoreCentroid()) / size);
    }
}
import java.io.*;
import java.util.*;

class DataSet {
    // Location of the Mall Customers CSV file
    private static final String FILE = "C:/Users/Dell/Desktop/datasets/Mall_Customers.csv";

    // rec[i] holds {CustomerID, Age, Annual Income, Spending Score} for row i
    int rec[][] = new int[200][4];
    int age[] = new int[200];
    int income[] = new int[200];
    int score[] = new int[200];

    // Reads the CSV file and fills the age, income and score columns
    public void setColumns() throws IOException {
        try (BufferedReader br = new BufferedReader(new FileReader(FILE))) {
            br.readLine(); // skip the header row
            String s;
            int i = -1;
            while ((s = br.readLine()) != null && i < 199) {
                StringTokenizer st = new StringTokenizer(s, ",");
                i++;
                int j = 0;
                int count = 0;
                while (st.hasMoreTokens() && j < 4) {
                    String s1 = st.nextToken();
                    count++;
                    // The second column (Gender) is not numeric, so it is skipped
                    if (count != 2) {
                        try {
                            rec[i][j] = Integer.parseInt(s1.trim());
                            j++;
                        } catch (NumberFormatException e) {
                            System.out.println(e);
                        }
                    }
                }
            }
        }
        for (int i = 0; i < 200; i++) {
            age[i] = rec[i][1];
            income[i] = rec[i][2];
            score[i] = rec[i][3];
        }
    }

    public int[] getAgeColumn() {
        return age;
    }

    public int[] getIncomeColumn() {
        return income;
    }

    public int[] getScoreColumn() {
        return score;
    }
}

import java.util.*;

public class KMeans {
    List<Record> data = new ArrayList<Record>();
    List<Cluster> clusters = new ArrayList<Cluster>();
    Map<Cluster, List<Record>> clusterRecords = new HashMap<Cluster, List<Record>>();

    public static void main(String[] args) throws Exception {
        int cluster_number = 5;
        KMeans km = new KMeans();
        km.generateRecord();
        km.initiateClusterAndCentroid(cluster_number);
        km.printRecordInformation();
        km.printClusterInformation();
    }

    // Builds one Record per row of the dataset
    private void generateRecord() throws Exception {
        DataSet dataset = new DataSet();
        dataset.setColumns();
        int a[] = dataset.getAgeColumn();
        int b[] = dataset.getIncomeColumn();
        int c[] = dataset.getScoreColumn();
        for (int i = 0; i < a.length; i++) {
            data.add(new Record(a[i], b[i], c[i]));
        }
    }

    // The first cluster_number records seed the clusters; every later record
    // joins the nearest cluster, whose centroid is then updated
    private void initiateClusterAndCentroid(int cluster_number) {
        int counter = 1;
        Iterator<Record> iterator = data.iterator();
        Record record = null;
        while (iterator.hasNext()) {
            record = iterator.next();
            if (counter <= cluster_number) {
                record.setClusterNumber(counter);
                initializeCluster(counter, record);
                counter++;
            } else {
                System.out.println(record);
                System.out.println("**CLUSTER INFORMATION**");
                for (Cluster cluster : clusters) {
                    System.out.println(cluster);
                }
                System.out.println("************************");
                // Find the cluster whose centroid is closest to this record
                double minDistance = Double.MAX_VALUE;
                Cluster whichCluster = null;
                for (Cluster cluster : clusters) {
                    double distance = cluster.calculateDistance(record);
                    if (minDistance > distance) {
                        minDistance = distance;
                        whichCluster = cluster;
                    }
                }
                record.setClusterNumber(whichCluster.getClusterNumber());
                whichCluster.updateCentroid(record);
                clusterRecords.get(whichCluster).add(record);
            }
        }
    }

    // Creates a cluster whose initial centroid is the given record
    private void initializeCluster(int cluster_number, Record record) {
        Cluster cluster = new Cluster(cluster_number, record.getAge(), record.getIncome(),
                record.getScore());
        clusters.add(cluster);
        List<Record> clusterRecord = new ArrayList<Record>();
        clusterRecord.add(record);
        clusterRecords.put(cluster, clusterRecord);
    }

    private void printRecordInformation() {
        System.out.println("Each Record Information");
        for (Record record : data) {
            System.out.println(record);
        }
    }

    private void printClusterInformation() {
        System.out.println("Final Cluster Information");
        for (Map.Entry<Cluster, List<Record>> entry : clusterRecords.entrySet()) {
            System.out.println("key= " + entry.getKey() + " value= " + entry.getValue());
        }
    }
}

Sample Output:

[Figure: clusters visualised in Tableau]

Conclusions:

Cluster-1:

This group consists of customers with low annual income and a low spending score.

Cluster-2:

This group consists of customers with higher annual income and a high spending score; their
spending score is high relative to their annual income.

Cluster-3:

This group consists of customers with high annual income and a low spending score.

Cluster-4:

This group consists of customers with high annual income and a high spending score.

Cluster-5:

This group consists of customers with medium annual income and a spending score in
proportion to their annual income.