24ucs172 S6

The document discusses algorithm design with a focus on data binning, a technique used in data analysis and machine learning to condense numerical values into smaller intervals or 'bins.' It outlines various binning methods such as equal-width, equal-frequency, custom, k-means, and quantile binning, highlighting their advantages and disadvantages. Binning helps reduce noise, manage data, and enhance visualization, but it can also lead to loss of information and sensitivity to outliers.

Uploaded by

jayapratibha.r


24UCS172

Computational Thinking

UNIT III - ALGORITHM DESIGN

Algorithmic Thinking, Algorithm and Flowcharting, Name binding, Selection, Repetition, Filtering, Pseudocode, Finding max and min, AND, OR operator, Binning
BINNING
• Binning, also known as bucketing, is an important data preprocessing technique in data analysis and machine learning.
• It condenses a large number of numerical values into a smaller number of “bins” or “buckets.”
• Each bin represents a distinct, non-overlapping interval of values, and every data point is assigned to exactly one bin according to where it falls within the range of values.
• Binning is used mainly to reduce the impact of minor observation errors.
• It does this by grouping nearby values together, which smooths out small fluctuations that may be random noise or inconsequential detail.
• Binning also simplifies the dataset.
• As a result, trends and patterns in the data become easier to discern, especially in visualizations.
Data Binning Techniques

• The type of data and the specific needs of the analysis determine which binning method to use.
• The method must be chosen carefully so that the binned data gives an accurate and useful picture of the original data.
• Different binning methods offer different benefits; each is particularly well suited to specific data characteristics and analysis goals.
Types of Binning
Data Binning
Equal-width Binning
• Equal-width binning splits the range of the dataset into intervals of identical size. It is simple to implement and works well for evenly distributed data.
• For example, five bins over a 0-100 data range each cover an interval of exactly 20 units (0-20, 20-40, 40-60, 60-80, 80-100), with each boundary value assigned to only one of its two neighboring bins.
• Equal-width binning is, however, sensitive to outliers: extreme values stretch the overall range, so most data points crowd into a few bins while others stay nearly empty, which skews the analysis.
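The five-bin example above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the function name `equal_width_bins` and the sample `scores` are hypothetical, and the sketch assumes that a value on a boundary belongs to the higher bin, except for the maximum, which stays in the last bin.

```python
def equal_width_bins(values, num_bins):
    """Assign each value a bin index from 0 to num_bins - 1 (equal-width bins)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins
    if width == 0:
        # All values are identical, so everything falls into one bin.
        return [0 for _ in values]
    labels = []
    for v in values:
        index = int((v - lo) / width)
        # The maximum value would compute to index num_bins, so clamp it into the last bin.
        labels.append(min(index, num_bins - 1))
    return labels


# Hypothetical sample data covering the 0-100 range.
scores = [0, 5, 19, 20, 37, 40, 58, 61, 80, 99, 100]
labels = equal_width_bins(scores, 5)
print(labels)  # [0, 0, 0, 1, 1, 2, 2, 3, 4, 4, 4]
```

Every bin here is 20 units wide, so a bin index can be turned back into its interval as `lo + index * width`.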
Equal-frequency Binning

• Equal-frequency binning constructs bins that each contain approximately the same number of data points.
• This method is advantageous for unevenly distributed data because it yields a balanced binning structure.
• It is notably effective at limiting the influence of outliers, which are less likely to dominate any single bin.
• However, the bin ranges can vary widely, which may complicate the interpretation of results.
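A rank-based sketch of equal-frequency binning follows. The function name `equal_frequency_bins` and the sample `incomes` are hypothetical, and the sketch assigns bins purely by sorted position, so tied values may end up in different bins.

```python
def equal_frequency_bins(values, num_bins):
    """Return a bin index per value so that the bins hold nearly equal counts."""
    # Positions of the values in ascending order of value.
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    for rank, i in enumerate(order):
        # Map the rank (0 .. n-1) proportionally onto a bin index (0 .. num_bins-1).
        labels[i] = rank * num_bins // len(values)
    return labels


# Hypothetical skewed data with one extreme value.
incomes = [12, 15, 18, 20, 22, 25, 30, 45, 400]
labels = equal_frequency_bins(incomes, 3)
print(labels)  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

The extreme value 400 simply shares the last bin with 30 and 45, which shows both points made above: the outlier does not dominate the binning, but the last bin's range (30 to 400) is far wider than the others.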
[Figure: Equal Width and Equal Frequency]
Custom Binning

• Custom binning uses domain knowledge to define the intervals for the data.
• In educational data analysis, for example, you might define score intervals that label performance levels such as ‘Fail,’ ‘Pass,’ ‘Merit,’ and ‘Distinction.’
• This technique fits the specific context of the data very well and can provide deep insight, but it requires a strong grasp of the domain.
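The grading example can be sketched with the standard-library `bisect` module. The cut-off scores below (40, 60, 75) are assumptions made purely for illustration, since the slides do not state any; the names `CUTOFFS`, `LABELS`, and `grade` are likewise hypothetical.

```python
from bisect import bisect_right

# Hypothetical cut-offs: the lowest score that earns Pass, Merit, and Distinction.
CUTOFFS = [40, 60, 75]
LABELS = ["Fail", "Pass", "Merit", "Distinction"]


def grade(score):
    """Map a numeric score to a performance label using the custom bin edges."""
    # bisect_right counts how many cut-offs are less than or equal to the score.
    return LABELS[bisect_right(CUTOFFS, score)]


results = [grade(s) for s in [35, 40, 59, 60, 74, 75, 92]]
print(results)
# ['Fail', 'Pass', 'Pass', 'Merit', 'Merit', 'Distinction', 'Distinction']
```

Changing the policy only requires editing `CUTOFFS`, which is where the domain knowledge lives.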

K-means Binning

• K-means binning is a more sophisticated method that uses a clustering algorithm to determine the bin ranges.
• It separates the data into k clusters, each of which becomes a bin.
• This flexible approach adapts to the underlying structure of the dataset and can perform better on more complex datasets.
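A minimal one-dimensional k-means sketch (Lloyd's algorithm) shows how clusters become bins. The function name `kmeans_1d`, the deterministic choice of starting centers, and the sample `ages` are assumptions for illustration; library implementations typically use randomized initialization instead.

```python
def kmeans_1d(values, k, iterations=100):
    """Cluster one-dimensional data into k groups; returns (centers, labels)."""
    ordered = sorted(values)
    # Deterministic start (an assumption): spread the initial centers across the sorted data.
    centers = [ordered[(2 * j + 1) * len(ordered) // (2 * k)] for j in range(k)]
    labels = []
    for _ in range(iterations):
        # Assignment step: each value joins the bin of its nearest center.
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centers.append(sum(members) / len(members) if members else centers[j])
        if new_centers == centers:
            break  # Converged: the centers no longer move.
        centers = new_centers
    return centers, labels


# Hypothetical data with three natural groups.
ages = [1, 2, 3, 20, 21, 22, 60, 61, 65]
centers, labels = kmeans_1d(ages, 3)
print(centers)  # [2.0, 21.0, 62.0]
print(labels)   # [0, 0, 0, 1, 1, 1, 2, 2, 2]
```

Unlike equal-width binning, the bin boundaries here fall in the gaps between the groups rather than at fixed intervals.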
[Figure: K Means]
Quantile Binning

• Quantile binning divides the data into bins that each hold an equal number of data points, much like equal-frequency binning.
• The key distinction is that the bin boundaries are defined by quantiles of the data’s distribution, such as quartiles, deciles, or percentiles.
• This focus on the distribution makes it well suited to creating percentile groups and normalizing data.
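The sketch below computes quartile cut points with the standard-library `statistics.quantiles` function and then assigns each value to a bin. The function name `quantile_bins`, the sample `data`, and the choice of the `"inclusive"` method are assumptions for illustration.

```python
from bisect import bisect_right
from statistics import quantiles


def quantile_bins(values, num_bins):
    """Return (cut points, bin index per value) using quantile-based bin edges."""
    # quantiles() returns the num_bins - 1 cut points that separate the bins.
    cuts = quantiles(values, n=num_bins, method="inclusive")
    labels = [bisect_right(cuts, v) for v in values]
    return cuts, labels


# Hypothetical sample data.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
cuts, labels = quantile_bins(data, 4)
print(cuts)    # [3.0, 5.0, 7.0]
print(labels)  # [0, 0, 1, 1, 2, 2, 3, 3, 3]
```

With `num_bins=100` the same function would produce percentile groups; because nine points cannot be split evenly into four bins, the counts are only approximately equal.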
Advantages of Data Binning

Reduces Noise: Grouping similar data points smooths out minor fluctuations in the data that may be due to random variation or noise.
Facilitates Data Management: Binning reduces the number of distinct values that must be processed, which lowers the computational load, simplifies calculations, and speeds up data analysis tasks.
Handling Missing Data: Binning can help with missing data; for example, missing values can be assigned to a bin of their own while the available values are categorized into the regular bins.
Eases Categorical Analysis: Binning continuous data into discrete intervals lets it be analyzed with categorical methods.
Enhances Data Visualization: Histograms, which are built on binning, give a clear overview of the data distribution and make it easier to draw meaningful insights.
Controls Outliers: Certain binning techniques, such as equal-frequency binning, spread data points evenly among the bins, which limits the impact of outliers.
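The noise-reduction advantage can be illustrated with smoothing by bin means, a common technique in which each value is replaced by the mean of its bin. This is a sketch under stated assumptions: the function name `smooth_by_bin_means`, the sample `prices`, and the bin size of 3 are all hypothetical.

```python
def smooth_by_bin_means(values, bin_size):
    """Sort the values, group them into bins of bin_size, and replace each by its bin mean."""
    ordered = sorted(values)
    smoothed = []
    for start in range(0, len(ordered), bin_size):
        chunk = ordered[start:start + bin_size]
        mean = sum(chunk) / len(chunk)
        # Every member of the bin is replaced by the same representative value.
        smoothed.extend([mean] * len(chunk))
    return smoothed


# Hypothetical sample data.
prices = [4, 8, 15, 21, 21, 24, 25, 28, 34]
smoothed = smooth_by_bin_means(prices, 3)
print(smoothed)  # [9.0, 9.0, 9.0, 22.0, 22.0, 22.0, 29.0, 29.0, 29.0]
```

Nine distinct-looking values collapse to three, which is also the reduction in distinct values described under data management.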
Disadvantages of Data Binning
Loss of Information: Grouping data into broader bins can hide small but important details.
Difficulty Choosing the Right Method: The choice of binning method can sway the analysis results, and there is no one-size-fits-all solution. Applying an unsuitable method may lead to misleading conclusions.
Inconsistency Across Different Datasets: Binning parameters tuned for one dataset may not suit another, even if the datasets look similar.
Sensitivity to Outliers in Equal-width Binning: Extreme values can heavily skew how data points are distributed across the bins, making the representation uneven and potentially misleading.
Arbitrary Bin Boundaries: Bin boundaries are often chosen arbitrarily, which can introduce bias and materially change how the results of the analysis are interpreted.
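The outlier sensitivity of equal-width binning is easy to demonstrate by counting bin members with and without one extreme value. The function name `equal_width_counts` and both sample lists are hypothetical; the sketch assumes the data contains at least two distinct values.

```python
from collections import Counter


def equal_width_counts(values, num_bins):
    """Count how many values fall into each of num_bins equal-width bins."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / num_bins  # Assumes hi > lo.
    # Clamp the index so that the maximum value lands in the last bin.
    counts = Counter(min(int((v - lo) / width), num_bins - 1) for v in values)
    return [counts.get(b, 0) for b in range(num_bins)]


# Hypothetical data: evenly spread values, then the same values plus one outlier.
clean = [10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
with_outlier = clean + [1000]

clean_counts = equal_width_counts(clean, 3)
outlier_counts = equal_width_counts(with_outlier, 3)
print(clean_counts)    # [3, 3, 4]
print(outlier_counts)  # [10, 0, 1]
```

A single extreme value stretches the range from 18 units to 990, so all ten ordinary values collapse into the first bin and the structure visible in the clean data disappears.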
