Unit 2

data science unit 2 notes

Uploaded by

AJAY KRISHNA

UNIT – II DESCRIPTIVE ANALYTICS USING STATISTICS 9

Frequency distributions – Outliers – Interpreting Distributions – Graphs – Averages – Describing
Variability – Interquartile Range – Variability for Qualitative and Ranked Data – Normal
Distributions – Z Scores – Correlation – Scatter Plots – Regression – Regression Line – Least
Squares Regression Line – Standard Error of Estimate – Interpretation of R² – Multiple
Regression Equations – Regression Toward the Mean.
DESCRIPTIVE ANALYTICS
Descriptive analytics focuses on summarizing and interpreting historical data to gain insights
into events, patterns, and trends in a business. It explores and examines data through
statistical techniques, visualizations, and KPI reports to provide a clear understanding of what
has happened. This provides a foundation for decision-making, performance evaluation, and
operational improvements. It is a method of describing the characteristics of a data set.
Descriptive measures are useful because they make sense of the data: they include the average
of the data, its spread, and the shape of its distribution.
Qualitative data
A qualitative variable is a variable that cannot be measured in numerical units; it yields
non-numerical data.
E.g. education, marital status, eye colour.
Frequency
The number of observations falling into a particular class or category of the qualitative variable.
FREQUENCY DISTRIBUTIONS
A frequency distribution is a collection of observations produced by sorting observations into
classes and showing their frequency (f) of occurrence in each class. A frequency distribution
helps us to detect any pattern in the data (assuming a pattern exists) by superimposing some order
on the inevitable variability among observations.
Frequency Distribution for Grouped Data
A frequency distribution produced whenever observations are sorted into classes of more than
one value. Grouped data refers to data that is bundled together in different classes or
categories.
Frequency Distribution for Ungrouped Data
A frequency distribution produced whenever observations are sorted into classes of single
values. The ungrouped frequency distribution displays the frequency of each individual data
value instead of groups of data values. Ungrouped data, or raw data, is a mere list of numbers
that conveys little on its own, because no summarization or aggregation has yet been applied
to it.

Not Always Appropriate

The frequency distribution shown for ungrouped data is only partially displayed because there are
more than 100 possible values between the largest and smallest observations. Frequency
distributions for ungrouped data are much more informative when the number of possible values
is less than about 20. Under these circumstances, they are a straightforward method for
organizing data. Otherwise, if there are 20 or more possible values, consider using a frequency
distribution for grouped data.
Example Problem
1. Students in a theater arts appreciation class rated the classic film ‘The Wizard of Oz’ on
a 10-point scale, ranging from 1 (poor) to 10 (excellent), as follows: Construct a
frequency distribution for the above data.

Since the number of possible values is relatively small—only 10—it’s appropriate to construct
a frequency distribution for ungrouped data.
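
As a rough illustration of tabulating ungrouped data, the sketch below counts how often each possible value occurs. The ratings in the list are hypothetical (made up for illustration, not the class data), and the helper name `ungrouped_frequency` is likewise an assumption of this sketch.

```python
from collections import Counter

# Hypothetical ratings on the 1-10 scale (illustrative only; not the class data).
ratings = [3, 7, 2, 7, 8, 3, 1, 4, 10, 3, 2, 5, 3, 5, 8, 9, 7, 6, 3, 7]


def ungrouped_frequency(values, lowest, highest):
    """Return (value, frequency) pairs for every possible value, highest first."""
    counts = Counter(values)
    # Values that never occur are still listed, with a frequency of 0.
    return [(v, counts.get(v, 0)) for v in range(highest, lowest - 1, -1)]


table = ungrouped_frequency(ratings, 1, 10)
for value, f in table:
    print(f"{value:>5} {f:>3}")
print(f"Total {sum(f for _, f in table):>3}")
```

Listing every possible value, including those with zero frequency, keeps gaps in the distribution visible.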

Real Limits of Class Intervals

Gaps cannot be ignored when you are determining the actual width of any class interval. The real
limits are located at the midpoint of the gap between adjacent tabled boundaries; that is, one-half
of one unit of measurement below the lower tabled boundary and one-half of one unit of
measurement above the upper tabled boundary.
Constructing Frequency Distributions

2. The IQ scores for a group of 35 high school dropouts are as follows:

(a) Construct a frequency distribution for grouped data.

(b) Specify the real limits for the lowest class interval in this frequency distribution.
To construct a frequency distribution for grouped data from the given IQ scores, follow
these steps:
1. Sort the IQ scores in ascending order.
2. Determine the range.
3. Choose the number of intervals and calculate the width of each interval.
4. Create the class intervals.
5. Count the number of IQ scores falling within each interval.
6. Construct the frequency distribution table.

Given the IQ scores:

69, 71, 75, 77, 79, 80, 80, 84, 85, 86, 87, 89, 90, 90, 90, 91, 93, 94, 95, 95, 96, 98, 98, 99,
100, 100, 103, 104, 105, 108, 109, 110, 112, 123.
1. Sort the IQ scores:
69, 71, 75, 77, 79, 80, 80, 84, 85, 86, 87, 89, 90, 90, 90, 91, 93, 94, 95, 95, 96, 98, 98, 99,
100, 100, 103, 104, 105, 108, 109, 110, 112, 123
2. Determine the range:
Range = Maximum value − Minimum value
= 123 − 69 = 54
3. Choose the number of intervals (about 10) and calculate the width of each interval:

Width = Range / Number of intervals = 54/10 = 5.4. Round off to a convenient number, such as 5.

4. Determine the class intervals:
Class intervals
65–69, 70–74, 75–79, 80–84, 85–89, 90–94, 95–99, 100–104, 105–109, 110–114, 115–119,
120–124.
5. Count the IQ scores in each class interval and construct the frequency distribution table.
6. The real limits are located at the midpoint of the gap between adjacent tabled boundaries;
that is, one-half of one unit of measurement below the lower tabled boundary and one-half of
one unit of measurement above the upper tabled boundary.
65 − 0.5 = 64.5
69 + 0.5 = 69.5
The real limits for the lowest class interval are 64.5–69.5.
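
The counting step can be sketched in code. The example below is a minimal sketch that uses the IQ scores exactly as listed in these notes (34 values appear in the list) and assumes whole-number data, a class width of 5, and a lowest class starting at 65. The function names `grouped_frequency` and `real_limits` are illustrative choices, not part of any library.

```python
# IQ scores as listed in the notes (34 values appear in the list).
iq_scores = [
    69, 71, 75, 77, 79, 80, 80, 84, 85, 86, 87, 89, 90, 90, 90, 91, 93,
    94, 95, 95, 96, 98, 98, 99, 100, 100, 103, 104, 105, 108, 109, 110,
    112, 123,
]


def grouped_frequency(values, start, width):
    """Count integer values in classes [start, start+width-1], [start+width, ...], ..."""
    n_classes = (max(values) - start) // width + 1
    counts = [0] * n_classes
    for v in values:
        counts[(v - start) // width] += 1
    return [
        (start + i * width, start + i * width + width - 1, counts[i])
        for i in range(n_classes)
    ]


def real_limits(lower, upper, unit=1):
    """Real limits lie half a unit of measurement beyond the tabled boundaries."""
    return lower - unit / 2, upper + unit / 2


table = grouped_frequency(iq_scores, start=65, width=5)
for lower, upper, f in reversed(table):  # highest class first
    print(f"{lower:>3}-{upper:<3} {f:>2}")
print("Real limits of lowest class:", real_limits(65, 69))
```

Classes with no observations (here 115-119) are kept in the table so that every class has the same width and no gap is hidden.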
3. What are some possible poor features of the following frequency distribution?

Not all observations can be assigned to one and only one class (because of the gap between
20–22 and 25–30 and the overlap between 25–30 and 30–34). Not all classes are equal in
width (25–30 versus 30–34). Not all classes have both boundaries (35–above).
Outliers (Very extreme score)
An outlier is a data point that is extremely high or extremely low relative to the rest of the
values in a data set or graph.
Example
The value in the month of January is significantly less than in the other months.

4. Identify any outliers in each of the following sets of data collected from nine college students.

The z-scores below use the sample standard deviation (n − 1 in the denominator).
1. Summer Income:
Mean = $6,330.00
Standard Deviation = $7,787.32
Z-scores:
$6,450: 0.015
$4,820: -0.194
$5,650: -0.087
$1,720: -0.592
$600: -0.736
$0: -0.813
$3,482: -0.366
$25,700: 2.487
$8,548: 0.285
Outlier: $25,700 (z-score = 2.487, far above the other eight values)
2. Family Size:
Mean = 5.33
Standard Deviation = 4.97
Z-scores:
2: -0.670
4: -0.268
3: -0.469
6: 0.134
18: 2.546
2: -0.670
6: 0.134
3: -0.469
4: -0.268
Outlier: 18 (z-score = 2.546, far above the other eight values)
3. GPA:
Mean = 3.08
Standard Deviation = 0.59
Z-scores:
2.30: -1.319
4.00: 1.564
3.56: 0.818
2.89: -0.318
2.15: -1.573
3.01: -0.115
3.09: 0.021
3.50: 0.716
3.20: 0.207
No outliers.
With only nine observations, no z-score can exceed (n − 1)/√n = 8/3 ≈ 2.67, so the common
|z| > 3 rule cannot flag anything here; the outliers are the single values that lie far from all
the others.
Therefore, the outliers in the data are:
• Summer Income: $25,700
• Family Size: 18
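
The mean, standard deviation, and z-scores can be recomputed from the nine listed values for each variable. The sketch below assumes the sample standard deviation (n − 1 denominator), which is what `statistics.stdev` computes; the helper name `z_scores` is illustrative.

```python
from statistics import mean, stdev

summer_income = [6450, 4820, 5650, 1720, 600, 0, 3482, 25700, 8548]
family_size = [2, 4, 3, 6, 18, 2, 6, 3, 4]
gpa = [2.30, 4.00, 3.56, 2.89, 2.15, 3.01, 3.09, 3.50, 3.20]


def z_scores(values):
    """Standardize each value: (x - mean) / sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(x - m) / s for x in values]


for name, data in [("Summer income", summer_income),
                   ("Family size", family_size),
                   ("GPA", gpa)]:
    zs = z_scores(data)
    # Report the observation that lies farthest from the mean.
    value, z = max(zip(data, zs), key=lambda pair: abs(pair[1]))
    print(f"{name}: mean={mean(data):.2f}, sd={stdev(data):.2f}, "
          f"most extreme value={value} (z={z:.3f})")
```

With a sample this small, the size of the largest z-score is limited by n, so it is worth reading the z-scores alongside a plain look at the sorted values rather than relying on a fixed cutoff.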
INTERPRETING DISTRIBUTIONS
In data science, interpreting distributions involves analyzing the patterns and characteristics of
data sets to extract insights and make informed decisions.
GRAPHS
Data can be described clearly and concisely with the aid of a well-constructed frequency
distribution, and often even more vividly by converting that distribution into a graph.
Graphs for quantitative data
For visualizing quantitative data, histograms and box plots are commonly used.
Histogram:
A histogram is a bar-type graph for quantitative data. The common boundaries between adjacent
bars emphasize the continuity of the data, as with continuous variables. It is a graphical
representation of the distribution of numerical data: a series of bars, where each bar
represents a range of values (bin) and the height of the bar indicates the frequency of data points
falling within that range. Histograms are useful for visualizing the shape, center, and spread of
the data distribution.
Features of histograms
• Equal units along the horizontal axis (the X axis, or abscissa) reflect the various class
intervals of the frequency distribution.
• Equal units along the vertical axis (the Y axis, or ordinate) reflect increases in frequency.
(The units along the vertical axis do not have to be the same width as those along the
horizontal axis.)
• The intersection of the two axes defines the origin at which both numerical scales equal 0.
• Numerical scales always increase from left to right along the horizontal axis and from bottom
to top along the vertical axis.
• The body of the histogram consists of a series of bars whose heights reflect the frequencies
for the various classes. Notice that adjacent bars in histograms have common boundaries that
emphasize the continuity of quantitative data for continuous variables. The introduction of
gaps between adjacent bars would suggest an artificial disruption in the data more
appropriate for discrete quantitative variables or for qualitative variables.
Frequency Polygon
An important variation on a histogram is the frequency polygon, or line graph: a line graph for
quantitative data that also emphasizes the continuity of continuous variables. Frequency
polygons may be constructed directly from frequency distributions.
Transformation of a histogram into a frequency polygon
1. Construct a Histogram: Start by creating a histogram to represent the frequency distribution
of the data. Divide the range of the data into intervals (bins) and count the number of data
points falling into each interval.
2. Identify Midpoints and Heights: For each bar in the histogram, identify the midpoint of the
interval and the height of the bar (representing the frequency or relative frequency of data
points in that interval).
3. Plot Points: Plot each midpoint on the horizontal axis, with its corresponding height on the
vertical axis. These points mark the centers of the tops of the bars in the histogram.
4. Connect the Points: Connect the points using straight line segments, from the leftmost
point to the rightmost point. To anchor the polygon to the horizontal axis, extend each end
to the midpoint of the adjacent unoccupied class, where the frequency is 0.
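
The midpoint step can be sketched as a small calculation. The example below assumes equal-width classes and the common convention of anchoring the polygon to the horizontal axis at the midpoints of the empty classes on either side. The three classes and their frequencies are hypothetical, and `polygon_points` is an illustrative name.

```python
def polygon_points(classes):
    """Return (midpoint, frequency) vertices for a frequency polygon.

    classes: list of (lower, upper, frequency) tuples with equal widths,
    given in ascending order.
    """
    width = classes[1][0] - classes[0][0]
    points = [((lower + upper) / 2, f) for lower, upper, f in classes]
    first_mid, last_mid = points[0][0], points[-1][0]
    # Anchor both tails at zero frequency, one class width beyond the data.
    return [(first_mid - width, 0)] + points + [(last_mid + width, 0)]


# Hypothetical grouped data (illustrative only).
classes = [(60, 69, 2), (70, 79, 5), (80, 89, 3)]
vertices = polygon_points(classes)
print(vertices)
# [(54.5, 0), (64.5, 2), (74.5, 5), (84.5, 3), (94.5, 0)]
```

Joining these vertices in order with straight line segments, in any plotting tool, gives the frequency polygon.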
Example Problem
5. The following frequency distribution shows the annual incomes in dollars for a group of
college graduates.

a) Construct a histogram.
b) Construct a frequency polygon.
c) Is this distribution balanced or lopsided?
To determine if the distribution is balanced or lopsided, we typically look at the shape of the
histogram or frequency polygon. In this case, both the histogram and frequency polygon show
that the distribution is lopsided, with more data points concentrated on the left side (lower
income ranges) and fewer data points on the right side (higher income ranges). This suggests that
the distribution is positively skewed, meaning it has a longer tail on the right side. Thus, the
distribution is lopsided or skewed to the right.
6. The number of friends reported by Facebook users is summarized in the following
frequency distribution

a) Convert to a histogram.
b) Why would it not be possible to convert to a stem and leaf display?
It would not be possible to convert this distribution to a stem and leaf display because a stem
and leaf display lists every individual observation, whereas a frequency distribution for
grouped data reports only how many observations fall in each class, not their individual values.
The size of the data set (200 users) would also make a stem and leaf display impractical and
challenging to interpret; such displays suit smaller data sets, where they show the distribution
of values in a compact and readable form.
Stem and Leaf Displays
Still another technique for summarizing quantitative data is a stem and leaf display. Stem and
leaf displays are ideal for summarizing distributions, such as that for weight data, without
destroying the identities of individual observations.
Selection of Stems
Stem values are not limited to units of 10. Depending on the data, identify the stem with one or
more leading digits that culminates in some variation on a stem value of 10, such as 1, 100, 1000,
or even .1, .01, .001, and so on.
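
A stem and leaf display can be built by splitting each score into a stem and a leaf. The sketch below reuses the IQ scores listed in the grouped-data example of these notes and assumes stems in units of 10; the function name `stem_and_leaf` and its `stem_unit` parameter are illustrative.

```python
from collections import defaultdict

# IQ scores as listed in the grouped-data example of these notes.
iq_scores = [
    69, 71, 75, 77, 79, 80, 80, 84, 85, 86, 87, 89, 90, 90, 90, 91, 93,
    94, 95, 95, 96, 98, 98, 99, 100, 100, 103, 104, 105, 108, 109, 110,
    112, 123,
]


def stem_and_leaf(values, stem_unit=10):
    """Map each stem (value // stem_unit) to its sorted leaves (value % stem_unit)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // stem_unit].append(v % stem_unit)
    return dict(stems)


display = stem_and_leaf(iq_scores)
for stem in range(min(display), max(display) + 1):
    leaves = " ".join(str(leaf) for leaf in display.get(stem, []))
    print(f"{stem:>3} | {leaves}")
```

Every original score can be read back from the display (stem 12 with leaf 3 is the score 123), which is what distinguishes it from a grouped frequency distribution.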
7. Construct a stem and leaf display from the data:
The stem represents the tens digit of the weight.
The leaves represent the units digit of the weight.

8. Construct a stem and leaf display for the following IQ scores obtained from a group of
four-year-old children.
A GRAPH FOR QUALITATIVE (NOMINAL) DATA
For qualitative (nominal) data, a bar graph is often used to represent the frequency or count of
each category.
Bar Graph
A bar graph, also known as a bar chart, is a graphical representation of data where the length
or height of each bar corresponds to the frequency, count, or other numerical measure of a
category or group. Gaps between adjacent bars emphasize the discontinuous nature of
qualitative data.
9. Construct a bar graph for the data shown in the following table:
AVERAGES
Averages consist of numbers (or words) about which the data are, in some sense, centered. They
are often referred to as measures of central tendency: the several types of average yield numbers
or words that attempt to describe, most generally, the middle or typical value for a distribution.
This unit focuses on three measures of central tendency: the mode, median, and mean. Each of
these has its special uses, but the mean is the most important average in both descriptive and
inferential statistics.
MODE
The mode reflects the value of the most frequently occurring score.
More Than One Mode
Distributions can have more than one mode (or no mode at all). Distributions with two obvious
peaks, even though they are not exactly the same height, are referred to as bimodal. Distributions
with more than two peaks are referred to as multimodal. The presence of more than one mode
might reflect important differences among subsets of data. For instance, the distribution of
weights for both male and female statistics students would most likely be bimodal, reflecting the
combination of two separate weight distributions—a heavier one for males and a lighter one for
females.
10. Determine the mode for the following retirement ages: 60, 63, 45, 63, 65, 70, 55, 63, 60,
65, 63.
The retirement age 63 appears most frequently, occurring 4 times. So, the mode for this set of
retirement ages is 63.
11. The owner of a new car conducts six gas mileage tests and obtains the following results,
expressed in miles per gallon: 26.3, 28.7, 27.4, 26.6, 27.4, 26.9. Find the mode for these
data.
Here, the mileage 27.4 appears twice, which is more than any other value. So, the mode for this
set of gas mileage tests is 27.4 miles per gallon.
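
Both answers can be checked with Python's standard library. This is a minimal sketch: `statistics.multimode` returns every most frequent value, so it also covers distributions with more than one mode; the small bimodal list at the end is hypothetical.

```python
from statistics import multimode

retirement_ages = [60, 63, 45, 63, 65, 70, 55, 63, 60, 65, 63]
gas_mileage = [26.3, 28.7, 27.4, 26.6, 27.4, 26.9]

print(multimode(retirement_ages))  # [63]
print(multimode(gas_mileage))      # [27.4]

# Hypothetical bimodal data: two values tie for most frequent.
print(multimode([1, 1, 2, 2, 3]))  # [1, 2]
```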
MEDIAN
The median reflects the middle value when observations are ordered from least to most. The
median splits a set of ordered observations into two equal parts, the upper and lower halves.
FINDING THE MEDIAN
12. Find the median for the following retirement ages: 60, 63, 45, 63, 65, 70, 55, 63, 60, 65,
63.
Arrange the retirement ages in ascending order:
45, 55, 60, 60, 63, 63, 63, 63, 65, 65, 70.
Since there are 11 data points, the median will be the middle value. In this case, the middle value
is the sixth value, which is 63.
So, the median retirement age for this set of data is 63.
13. Find the median for the following gas mileage tests: 26.3, 28.7, 27.4, 26.6, 27.4, 26.9.
Arrange the values in ascending order:
26.3, 26.6, 26.9, 27.4, 27.4, 28.7
Since there are 6 data points (an even number), the median is the average of the two middle
values. Here, the two middle values are 26.9 and 27.4.
Calculating the average:
Median = (26.9 + 27.4) / 2
Median = 54.3 / 2
Median = 27.15
So, the median for this set of gas mileage tests is 27.15 miles per gallon.
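
The two cases (odd and even numbers of observations) can be written as one short function. The sketch below spells out the rule by hand under the name `median_by_hand` (an illustrative name) and compares it with `statistics.median` from the standard library.

```python
from statistics import median


def median_by_hand(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]
    return (ordered[middle - 1] + ordered[middle]) / 2


retirement_ages = [60, 63, 45, 63, 65, 70, 55, 63, 60, 65, 63]
gas_mileage = [26.3, 28.7, 27.4, 26.6, 27.4, 26.9]

print(median_by_hand(retirement_ages))        # 63 (sixth of 11 ordered values)
print(round(median_by_hand(gas_mileage), 2))  # 27.15
print(median_by_hand(gas_mileage) == median(gas_mileage))  # True
```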
MEAN
The mean is the most common average, one you have doubtless calculated many times. The
mean is found by adding all scores and then dividing by the number of scores.

Statisticians distinguish between two types of means—the population mean and the sample
mean—depending on whether the data are viewed as a population (a complete set of scores) or as
a sample (a subset of scores).
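
As a small worked illustration, the sketch below applies the add-then-divide rule to the retirement ages and gas mileage values used in the earlier examples. The arithmetic is the same whether the scores are treated as a population or as a sample; the function name `mean_of` is an illustrative choice.

```python
def mean_of(scores):
    """Add all scores, then divide by the number of scores."""
    return sum(scores) / len(scores)


retirement_ages = [60, 63, 45, 63, 65, 70, 55, 63, 60, 65, 63]
gas_mileage = [26.3, 28.7, 27.4, 26.6, 27.4, 26.9]

# 672 / 11, about 61.09 years
print(round(mean_of(retirement_ages), 2))
# 163.3 / 6, about 27.22 miles per gallon
print(round(mean_of(gas_mileage), 2))
```

For the retirement ages, the mean (about 61.09) falls below the median and mode (both 63) because the single low value of 45 pulls it down.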
