Ba Lecture 2
Business Analytics
Instructor : Daniyal Nawaz
Lecture # 02
Descriptive Statistics
Overview of Using Data: Definitions and Goals
• Data
• Variable
• Observation
• Variation
• Random variables
• Data are the facts and figures collected,
analyzed, and summarized for presentation
and interpretation.
• A characteristic or a quantity of interest that
can take on different values is known as a
variable.
• An observation is a set of values
corresponding to a set of variables.
Variation is the difference in a variable measured over observations
(time, customers, items, etc.).
The values of some variables are under direct control of the decision
maker (these are often called decision variables).
Types of data
• Population and Sample Data
• Quantitative and Categorical Data
• Cross-Sectional and Time Series Data
Data can be categorized in several ways based
on how they are collected and the type of data
collected.
• In many cases, it is not feasible to collect data
from the population of all elements of
interest. In such instances, we collect data
from a subset of the population known as a
sample.
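A minimal sketch of the population/sample distinction, assuming a made-up population of customer spending figures (the numbers, the sample size of 50, and the variable names are illustrative assumptions, not lecture data). It draws a random sample and compares a summary computed on the sample with the same summary computed on the full population.

```python
import random
import statistics

# Hypothetical population: monthly spend (in dollars) of 1,000 customers.
rng = random.Random(42)  # fixed seed so the sketch is reproducible
population = [round(rng.uniform(20, 200), 2) for _ in range(1000)]

# A sample is a subset of the population; here, 50 customers chosen
# at random without replacement.
sample = rng.sample(population, k=50)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"Population size: {len(population)}, mean spend: {population_mean:.2f}")
print(f"Sample size:     {len(sample)}, mean spend: {sample_mean:.2f}")
```

The sample mean will usually be close to, but not exactly equal to, the population mean, which is why it matters whether a figure was computed from a sample or from the whole population.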
What is a Statistic?
[Figure: several samples drawn from one population]
Sample vs. Population
[Figure: a population compared with a sample]
Quantitative and Categorical Data
• If arithmetic operations cannot be performed
on the data, they are considered categorical
data
• For instance, the data in the Industry column
in Table 2.1 are categorical
Cross-Sectional and Time Series Data
• Cross-sectional data are collected from several entities at
the same, or approximately the same, point in time.
Some Definitions
1. Nominal 2. Ordinal 3. Interval 4. Ratio
Categorical (Nominal) data
• What does this mean? No mathematical
operations can be performed on the data
relative to each other.
• Therefore, nominal data reflect qualitative
differences rather than quantitative ones.
• Nominal measurements only permit you to
determine whether two individuals are the
same or different.
Nominal data
Examples:
• Male / Female
• Yes / No
Ordinal data
• Ordinal data consist of categories that can be rank ordered.
• As with nominal data, the distance between categories cannot be
calculated, but the categories can be ranked above or below each
other.
• No fixed units of measurement
• Examples:
  - college football rankings
  - survey responses (poor, average, good, very good, excellent)
• What does this mean? We can make statistical judgements and perform
limited maths.
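A minimal sketch of the "limited maths" that ordinal data allow, using the survey scale from the slide. The list of responses is made up for illustration. Ranking and taking a median category are meaningful here, while averaging the labels is not.

```python
# Ordered categories from the slide, lowest to highest.
scale = ["poor", "average", "good", "very good", "excellent"]
rank = {label: position for position, label in enumerate(scale)}

# Hypothetical survey responses (made up for illustration).
responses = ["good", "poor", "excellent", "average", "good", "very good", "good"]

# Ranking is meaningful for ordinal data: sort by position on the scale.
ordered = sorted(responses, key=rank.__getitem__)

# The median category is meaningful too (odd number of responses here).
median_response = ordered[len(ordered) // 2]

print(ordered)
print("Median response:", median_response)  # → Median response: good
```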
Interval and ratio data
Interval data
• Ordinal data but with constant differences
between observations
• Examples:
• Time: moves along a continuous measure of
seconds, minutes, and so on, and has no true
zero point.
• Temperature: moves along a continuous
measure of degrees and has no true zero.
• SAT scores
Ratio data
• Ratio data are measured on a continuous scale and have a
natural zero point
• Ratios are meaningful
• Examples:
• Monthly sales
• Delivery times
• Weight
• Height
• Age
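A minimal sketch of why ratios are meaningful for ratio data but not for interval data. The two helper functions and the sample values are assumptions made for illustration. The ratio of two temperatures changes when the unit changes, because the zero point is arbitrary, whereas the ratio of two weights does not.

```python
def celsius_to_fahrenheit(celsius):
    return celsius * 9 / 5 + 32


def kilograms_to_pounds(kilograms):
    return kilograms * 2.20462  # approximate conversion factor


# Interval scale: temperature in Celsius has no natural zero, so the
# ratio of two readings depends on the unit used.
ratio_celsius = 20 / 10
ratio_fahrenheit = celsius_to_fahrenheit(20) / celsius_to_fahrenheit(10)

# Ratio scale: weight has a natural zero, so the ratio survives a unit change.
ratio_kg = 20 / 10
ratio_lb = kilograms_to_pounds(20) / kilograms_to_pounds(10)

print(ratio_celsius, round(ratio_fahrenheit, 2))  # → 2.0 1.36
print(ratio_kg, round(ratio_lb, 2))               # → 2.0 2.0
```

So "20 kg is twice as heavy as 10 kg" holds in any unit, but "20 degrees is twice as hot as 10 degrees" does not.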
Data for Business Analytics
Classifying Data Elements in a Purchasing Database
Figure 1.2
Modifying Data in Excel
Sorting Data in Excel
• Step 1. Select cells A1:F21
• Step 2. Click the Data tab in the Ribbon
• Step 3. Click Sort in the Sort & Filter group
• Step 4. Select the check box for My data has headers
• Step 5. In the first Sort by dropdown menu, select
Sales (March 2010)
• Step 6. In the Order dropdown menu, select Largest
to Smallest (see Figure 2.4)
• Step 7. Click OK
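For comparison, the same sort can be sketched outside Excel. This is an illustration only: the rows below are made-up stand-ins for the worksheet range, and the "Brand" column name is hypothetical. Only the "Sales (March 2010)" column name comes from the steps above.

```python
# Hypothetical rows standing in for the worksheet data; the values and the
# "Brand" column are made up for illustration.
rows = [
    {"Brand": "A", "Sales (March 2010)": 1200},
    {"Brand": "B", "Sales (March 2010)": 3400},
    {"Brand": "C", "Sales (March 2010)": 2100},
]

# Equivalent of Sort by "Sales (March 2010)", Order "Largest to Smallest".
sorted_rows = sorted(rows, key=lambda row: row["Sales (March 2010)"], reverse=True)

for row in sorted_rows:
    print(row["Brand"], row["Sales (March 2010)"])
```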
• Ref book pg 24
Filtering
Creating Distributions from Data
• Distributions help summarize many
characteristics of a data set by describing how
often certain values for a variable appear in
that data set.
• Distributions can be created for both
categorical and quantitative data, and they
assist the analyst in gauging variation.
Frequency Distributions for Categorical Data
Frequency Distribution
Consider a data set of 26 children aged 1-6 years. The frequency
distribution of the variable ‘age’ can be tabulated as follows:
Age        1   2   3   4   5   6
Frequency  5   3   7   5   4   2
Grouped Frequency Distribution of Age:
Age Group  1-2  3-4  5-6
Frequency    8   12    6
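A minimal sketch of how both tables can be computed, assuming the 26 individual ages are rebuilt from the frequencies on the slide (the raw observations themselves are not given).

```python
from collections import Counter

# Rebuild the 26 ages from the frequency table on the slide.
age_counts = {1: 5, 2: 3, 3: 7, 4: 5, 5: 4, 6: 2}
ages = [age for age, count in age_counts.items() for _ in range(count)]

# Frequency distribution: how often each age appears.
frequency = Counter(ages)

# Grouped frequency distribution: ages 1-2, 3-4, and 5-6.
groups = {"1-2": (1, 2), "3-4": (3, 4), "5-6": (5, 6)}
grouped = {
    label: sum(1 for age in ages if low <= age <= high)
    for label, (low, high) in groups.items()
}

print(dict(sorted(frequency.items())))  # → {1: 5, 2: 3, 3: 7, 4: 5, 5: 4, 6: 2}
print(grouped)                          # → {'1-2': 8, '3-4': 12, '5-6': 6}
```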
Example: 1
Solution ?
• Discussed in class
Relative Frequency and Percent Frequency
Distributions
• A relative frequency distribution is a tabular
summary of data showing the relative
frequency for each bin.
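The relative frequency of a bin is its frequency divided by the number of observations, and the percent frequency is the relative frequency multiplied by 100. A minimal sketch, reusing the age frequencies from the Frequency Distribution example (26 children, ages 1-6):

```python
# Frequency of each age (1-6) for the 26 children in the earlier example.
frequency = {1: 5, 2: 3, 3: 7, 4: 5, 5: 4, 6: 2}
n = sum(frequency.values())

# Relative frequency = frequency / n; percent frequency = 100 * relative frequency.
relative = {age: count / n for age, count in frequency.items()}
percent = {age: 100 * value for age, value in relative.items()}

print("Age  Freq  Relative  Percent")
for age in frequency:
    print(f"{age:>3}  {frequency[age]:>4}  {relative[age]:>8.3f}  {percent[age]:>6.1f}%")
```

The relative frequencies always sum to 1 and the percent frequencies to 100, which is a quick check on any such table.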
Example 3
Frequency Distributions for Quantitative Data
• These data show the time in days required to
complete year-end audits for a sample of 20
clients of Sanderson and Clifford, a small public
accounting firm. The three steps necessary to
define the classes for a frequency distribution
with quantitative data are as follows:
• Number of Bins: Bins are formed by specifying
the ranges used to group the data.
• Width of the Bins: Choose a width for the
bins.
• Bin Limits: Bin limits must be chosen so that
each data item belongs to one and only one
class.
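The three steps can be sketched as follows. Everything specific here is an assumption made for illustration: the 20 audit times are made-up values rather than the lecture's data, five bins is an arbitrary choice, the starting limit of 10 is picked for convenience, and the width uses a common rule of thumb (largest value minus smallest value, divided by the number of bins, rounded up) that the slide itself does not state.

```python
import math

# Hypothetical audit times in days (made-up values for illustration).
audit_times = [11, 13, 16, 17, 14, 15, 19, 18, 21, 26,
               22, 24, 20, 23, 32, 29, 12, 18, 16, 15]

# Step 1: number of bins (an assumed choice for this sketch).
number_of_bins = 5

# Step 2: approximate width = (largest - smallest) / number of bins, rounded up.
width = math.ceil((max(audit_times) - min(audit_times)) / number_of_bins)

# Step 3: bin limits chosen so that each value falls in exactly one bin.
first_lower = 10  # assumed convenient starting point at or below the minimum
bins = [(first_lower + i * width, first_lower + (i + 1) * width - 1)
        for i in range(number_of_bins)]

counts = {f"{low}-{high}": sum(low <= t <= high for t in audit_times)
          for low, high in bins}

print(width)   # → 5
print(counts)  # → {'10-14': 4, '15-19': 8, '20-24': 5, '25-29': 2, '30-34': 1}
```

Because the data are whole days, inclusive integer limits such as 10-14 and 15-19 leave no gaps and no overlaps between bins.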
Example 4
• Step 1. Select cells B10:B14
• Step 2. Type the formula =FREQUENCY(A2:D6, A10:A14). The range A2:D6
defines the data set, and the range A10:A14 defines the bins.
• Step 3. Press CTRL+SHIFT+ENTER after typing the formula in Step 2.
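A rough sketch of the counting rule behind Excel's FREQUENCY function, written as a hypothetical helper named `frequency`. Each bin value is treated as an upper limit: a data value is counted in the first bin whose limit is greater than or equal to it, and one extra count collects values above the last limit. The data and limits below are made up for illustration, and the helper assumes the limits are meant in ascending order.

```python
from bisect import bisect_left


def frequency(data, bin_upper_limits):
    """Count values per bin, treating each bin value as an inclusive upper limit.

    Returns one count per limit plus a final count for values above the
    last limit.
    """
    limits = sorted(bin_upper_limits)
    counts = [0] * (len(limits) + 1)
    for value in data:
        # Index of the first limit that is >= value.
        counts[bisect_left(limits, value)] += 1
    return counts


# Hypothetical data (20 values) and bin upper limits (5 bins).
data = [11, 13, 16, 17, 14, 15, 19, 18, 21, 26,
        22, 24, 20, 23, 32, 29, 12, 18, 16, 15]
upper_limits = [14, 19, 24, 29, 34]

print(frequency(data, upper_limits))  # → [4, 8, 5, 2, 1, 0]
```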
Data Presentation
Graphical Presentation: We look for the overall pattern and for striking deviations
from that pattern. The overall pattern is usually described by the shape, center, and spread
of the data. An individual value that falls outside the overall pattern is called an
outlier.
Bar diagrams and pie charts are used for categorical variables.
Histograms, stem-and-leaf plots, and box plots are used for numerical variables.
Histograms
• Step 1. Click the Data tab in the Ribbon
• Step 2. Click Data Analysis in the Analyze group
• Step 3. When the Data Analysis dialog box opens,
choose Histogram from the list of Analysis Tools, and click OK
• In the Input Range: box, enter A2:D6
• In the Bin Range: box, enter A10:A14
• Under Output Options:, select New Worksheet Ply:
• Select the check box for Chart Output (see Figure 2.13)
• Click OK
A common graphical presentation of
quantitative data is a histogram
Data Presentation: Categorical Variable
Bar Diagram: Lists the categories and presents the percent or count of individuals who fall
in each category.
Treatment Group   Frequency   Proportion       Percent
1                 15          15/60 = 0.250    25.0
2                 25          25/60 = 0.417    41.7
3                 20          20/60 = 0.333    33.3
Total             60          1.000            100.0
[Bar chart: Number of Subjects (0 to 30) by Treatment Group (1, 2, 3)]
Data Presentation: Categorical Variable
Pie Chart: Lists the categories and presents the percent or count of individuals who fall in
each category.
[Pie chart: share of subjects in Treatment Groups 1, 2, and 3: 25.0%, 41.7%, and 33.3%]
Histogram: The overall pattern can be described by its shape, center, and spread. The
following age distribution is right skewed. The center lies between 80 and 100. No
outliers.
[Figure 3: Age Distribution. Histogram of Number of Subjects by Age in Months, bins 40 to 140 and More]
Mean             90.41666667
Standard Error   3.902649518
Median           84
Mode             84
Range            95
Minimum          48
Maximum          143
Sum              5425
Count            60
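The reported summary values can be cross-checked against one another. This sketch uses only the figures in the table above, since the raw ages are not given. It assumes the standard error is the usual standard error of the mean (sample standard deviation divided by the square root of the count), which the slide does not state.

```python
import math

# Values reported in the summary table for the age data.
total = 5425
count = 60
minimum, maximum = 48, 143
standard_error = 3.902649518

mean = total / count             # mean = sum / count
value_range = maximum - minimum  # range = maximum - minimum

# Assumption: standard error = s / sqrt(n), so s = standard error * sqrt(n).
implied_std_dev = standard_error * math.sqrt(count)

print(round(mean, 8))  # → 90.41666667
print(value_range)     # → 95
print(round(implied_std_dev, 2))
```

The mean (about 90.4) sits above the median (84), which is consistent with the right skew noted for the histogram.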
Thank You !