Ba Lecture 2

Business analytics involves collecting, analyzing, and summarizing data to understand variation and its impact on business. Descriptive statistics describe key attributes of data like the mean, median, and standard deviation. Data can be classified as categorical, quantitative, cross-sectional, or time-series. Frequency distributions summarize how often certain values appear in a data set.

Uploaded by

Miss, Husna

Business Analytics

Instructor: Daniyal Nawaz


Lecture # 02

Descriptive Statistics

Overview of Using Data: Definitions and Goals

• Data
• Variable
• Observation
• Variation
• Random variables

• Data are the facts and figures collected,
analyzed, and summarized for presentation
and interpretation.
• A characteristic or a quantity of interest that
can take on different values is known as a
variable.
• An observation is a set of values
corresponding to a set of variables.

Variation is the difference in a variable measured over observations
(time, customers, items, etc.).

The role of descriptive analytics is to collect and analyse data to gain a
better understanding of variation and its impact on the business
setting.

The values of some variables are under direct control of the decision
maker (these are often called decision variables).

The values of other variables may fluctuate with uncertainty because
of factors outside the direct control of the decision maker. In general,
a quantity whose values are not known with certainty is called a
random variable, or uncertain variable.

Types of data
• Population and Sample Data
• Quantitative and Categorical Data
• Cross-Sectional and Time Series Data

Data can be categorized in several ways based
on how they are collected and the type of data
collected.
• In many cases, it is not feasible to collect data
from the population of all elements of
interest. In such instances, we collect data
from a subset of the population known as a
sample.

What is a Statistic?

[Figure: several samples drawn from a population]

Parameter: a value that describes a population

Statistic: a value that describes a sample
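
The distinction between a parameter and a statistic can be illustrated with a minimal sketch. The population below (the integers 1 to 100) and the sample size of 10 are assumptions chosen only for illustration; they are not data from the lecture.

```python
import random
from statistics import mean

# Hypothetical population: the integers 1..100 (not data from the lecture).
population = list(range(1, 101))

# Parameter: a value computed from the whole population.
population_mean = mean(population)

# Statistic: the same kind of value computed from a sample.
random.seed(0)  # fixed seed so the sketch is repeatable
sample = random.sample(population, 10)
sample_mean = mean(sample)

print("Parameter (population mean):", population_mean)
print("Statistic (sample mean):", sample_mean)
```

A different sample would generally give a different statistic, while the parameter stays fixed.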

Sample vs. Population

[Figure: Population and Sample]
Quantitative and Categorical Data

Data are considered quantitative data if they are
numeric and arithmetic operations can be performed
on them, such as
• addition,
• subtraction,
• multiplication,
• and division.
For instance, we can sum the values for Volume in the
Dow data in Table 2.1 to calculate a total volume of all
shares traded by companies included in the Dow.

• If arithmetic operations cannot be performed
on the data, they are considered categorical
data.
• For instance, the data in the Industry column
in Table 2.1 are categorical.
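
As a rough illustration of the difference, the sketch below sums a quantitative column and counts the labels of a categorical one. The three records are hypothetical stand-ins; Table 2.1 itself is not reproduced in these slides.

```python
from collections import Counter

# Hypothetical records; the lecture's Table 2.1 is not reproduced here.
records = [
    {"company": "A", "industry": "Technology", "volume": 1200},
    {"company": "B", "industry": "Retail", "volume": 800},
    {"company": "C", "industry": "Technology", "volume": 500},
]

# Quantitative variable: arithmetic is meaningful, so we can sum it.
total_volume = sum(r["volume"] for r in records)

# Categorical variable: arithmetic is not meaningful, so we count labels.
industry_counts = Counter(r["industry"] for r in records)

print(total_volume)
print(dict(industry_counts))
```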

Cross-Sectional and Time Series Data
• Cross-sectional data are collected from several entities at
the same, or approximately the same, point in time.

The data in Table 2.1 are cross-sectional because they
describe the 30 companies that comprise the Dow at the
same point in time (July 2015).

• Time series data are collected over several time periods.

Graphs of time series data are frequently found in business
and economic publications.

Some Definitions

Variable - any characteristic of an individual or entity. A variable can take different
values for different individuals. Variables can be categorical or quantitative. Per S. S.
Stevens…
• Nominal - Categorical variables with no inherent order or ranking sequence, such
as names or classes (e.g., gender). Values may be numerals, but without numerical
meaning (e.g., I, II, III). The only operation that can be applied to nominal variables is
enumeration.
• Ordinal - Variables with an inherent rank or order, e.g., mild, moderate, severe. Values can
be compared for equality, or for greater or less, but not for how much greater or less.
• Interval - Values of the variable are ordered as in Ordinal, and additionally,
differences between values are meaningful; however, the scale is not absolutely
anchored. Calendar dates and temperatures on the Fahrenheit scale are
examples. Addition and subtraction, but not multiplication and division, are
meaningful operations.
• Ratio - Variables with all properties of Interval plus an absolute, non-arbitrary zero
point, e.g., age, weight, temperature (Kelvin). Addition, subtraction, multiplication,
and division are all meaningful operations.
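
The interval versus ratio distinction can be checked numerically with a small sketch. The two temperatures (40 °F and 80 °F) are arbitrary illustrative values, and `fahrenheit_to_kelvin` is a helper written for this example only.

```python
def fahrenheit_to_kelvin(f):
    """Convert a Fahrenheit temperature to Kelvin."""
    return (f - 32) * 5 / 9 + 273.15

low_f, high_f = 40.0, 80.0  # arbitrary illustrative temperatures

# Interval scale: the difference is meaningful...
difference_f = high_f - low_f

# ...but the ratio is not, because 0 °F is an arbitrary zero.
naive_ratio = high_f / low_f

# Ratio scale: Kelvin has an absolute zero, so this ratio is meaningful.
kelvin_ratio = fahrenheit_to_kelvin(high_f) / fahrenheit_to_kelvin(low_f)

print(difference_f, naive_ratio, round(kelvin_ratio, 3))
```

The Fahrenheit ratio suggests "twice as hot", while the Kelvin ratio shows the higher temperature is only about 8% greater in absolute terms.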
Types of measurement
• When gathering data, we collect data from individual
cases on particular variables.
• A variable is a unit of data collection whose value can vary.
• Variables can be classified into types according to the level of
mathematical scaling that can be carried out on the data.
• There are four types of data or levels of measurement:

1. Categorical (Nominal)
2. Ordinal
3. Interval
4. Ratio

Categorical (Nominal) data
• What does this mean? No mathematical
operations can be performed on the data
relative to each other.
• Therefore, nominal data reflect qualitative
differences rather than quantitative ones.
• Nominal measurements only permit you to
determine whether two individuals are the
same or different.
Nominal data
Examples:

• What is your gender? (please tick): Male / Female
• Did you enjoy the film? (please tick): Yes / No

Ordinal data
• Ordinal data comprise categories that can be rank
ordered.
• As with nominal data, the distance between categories cannot
be calculated, but the categories can be ranked above or below each
other.
• No fixed units of measurement
• Examples:
  - college football rankings
  - survey responses (poor, average, good, very good, excellent)
• What does this mean? We can make statistical judgements and perform
limited maths.
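
A minimal sketch of the "limited maths" that ordinal data allow: categories can be sorted and a median category found, but differences between categories have no defined size. The five responses below are hypothetical.

```python
# The rating scale from the slide, listed from lowest to highest rank.
SCALE = ["poor", "average", "good", "very good", "excellent"]
RANK = {label: position for position, label in enumerate(SCALE)}

# Hypothetical survey responses (not data from the lecture).
responses = ["good", "poor", "excellent", "good", "average"]

# Ordinal data can be compared and sorted by rank...
ordered = sorted(responses, key=RANK.get)

# ...so a median category is meaningful (odd number of responses here),
# but "excellent minus good" has no defined size.
median_response = ordered[len(ordered) // 2]

print(ordered)
print(median_response)
```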

Interval and ratio data

• Both interval and ratio data are examples of scale data.
• Scale data:
• data are in numeric format ($50, $100, $150)
• data can be measured on a continuous scale
• the distance between values can be observed and, as a result,
measured
• the data can be placed in rank order.

Interval data
• Ordinal data but with constant differences
between observations
• Examples:
• Time – moves along a continuous measure of
seconds, minutes and so on, and is without a true zero
point.
• Temperature – moves along a continuous
measure of degrees and is without a true zero.
• SAT scores

Ratios
• Ratio data are measured on a continuous scale and have a
natural zero point
• Ratios are meaningful
• Examples:
• Monthly sales
• Delivery times
• Weight
• Height
• Age

Data for Business Analytics
Classifying Data Elements in a Purchasing Database

Figure 1.2
Modifying Data in Excel
Sorting Data in Excel
• Step 1. Select cells A1:F21
• Step 2. Click the Data tab in the Ribbon
• Step 3. Click Sort in the Sort & Filter group
• Step 4. Select the check box for My data has headers
• Step 5. In the first Sort by dropdown menu, select
Sales (March 2010)
• Step 6. In the Order dropdown menu, select Largest
to Smallest (see Figure 2.4)
• Step 7. Click OK
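
For readers working outside Excel, the same largest-to-smallest sort can be sketched in Python. The rows and field names below are hypothetical; `sales` stands in for the Sales (March 2010) column.

```python
# Hypothetical rows; "sales" stands in for the Sales (March 2010) column.
rows = [
    {"model": "Model X", "manufacturer": "Toyota", "sales": 120},
    {"model": "Model Y", "manufacturer": "Honda", "sales": 340},
    {"model": "Model Z", "manufacturer": "Ford", "sales": 210},
]

# "Largest to Smallest" corresponds to reverse=True.
sorted_rows = sorted(rows, key=lambda row: row["sales"], reverse=True)

for row in sorted_rows:
    print(row["model"], row["sales"])
```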

• Reference book, p. 24

Filtering

• Step 1. Select cells A1:F21
• Step 2. Click the Data tab in the Ribbon
• Step 3. Click Filter in the Sort & Filter group
• Step 4. Click on the Filter Arrow in column B, next
to Manufacturer
• Step 5. If all choices are checked, you can easily
deselect all choices by unchecking (Select All).
Then select only the check box for Toyota.
• Step 6. Click OK
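
The filter on Manufacturer can be sketched in Python as a simple selection. The rows and field names are hypothetical and only mimic the layout the steps above describe.

```python
# Hypothetical rows; only the "manufacturer" field matters for the filter.
rows = [
    {"model": "Model X", "manufacturer": "Toyota", "sales": 120},
    {"model": "Model Y", "manufacturer": "Honda", "sales": 340},
    {"model": "Model Z", "manufacturer": "Toyota", "sales": 210},
]

# Keep only the rows whose manufacturer is Toyota,
# like ticking only the Toyota check box in the filter list.
toyota_rows = [row for row in rows if row["manufacturer"] == "Toyota"]

for row in toyota_rows:
    print(row["model"], row["sales"])
```

As in Excel, filtering leaves the original rows untouched; it only changes which rows are shown.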

Creating Distributions from Data
• Distributions help summarize many
characteristics of a data set by describing how
often certain values for a variable appear in
that data set.
• Distributions can be created for both
categorical and quantitative data, and they
assist the analyst in gauging variation.

Frequency Distributions for Categorical Data

• A frequency distribution is a summary of data
that shows the number (frequency) of
observations in each of several nonoverlapping
classes, typically referred to as bins.

Frequency Distribution

Consider a data set of 26 children of ages 1-6 years. Then the frequency
distribution of the variable ‘age’ can be tabulated as follows:

Frequency Distribution of Age

Age        1  2  3  4  5  6
Frequency  5  3  7  5  4  2

Grouped Frequency Distribution of Age

Age Group  1-2  3-4  5-6
Frequency    8   12    6
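
The grouped table can be derived from the ungrouped one by summing the frequencies inside each age group. A minimal sketch, using the frequencies from the slide (the variable names are illustrative):

```python
# Frequencies by age, taken from the slide.
age_frequency = {1: 5, 2: 3, 3: 7, 4: 5, 5: 4, 6: 2}

# Age groups as (lowest age, highest age), both inclusive.
groups = [(1, 2), (3, 4), (5, 6)]

grouped_frequency = {
    f"{low}-{high}": sum(
        count for age, count in age_frequency.items() if low <= age <= high
    )
    for low, high in groups
}

total_children = sum(age_frequency.values())

print(grouped_frequency)
print(total_children)
```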

Example 1

A survey was taken on Maple Avenue. In each of 20 homes, people were
asked how many cars were registered to their households. The results
were recorded as follows:
3, 1, 4, 0, 2, 1, 5, 2, 1, 5, 4, 2, 3, 2, 0, 2, 1, 0, 3, 2.
Present these data in a frequency distribution table.
Also find the maximum number of cars registered to a household.
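
One possible way to work Example 1 is sketched below with Python's `collections.Counter`. This is not necessarily the approach taken in class; it simply tallies the 20 recorded values.

```python
from collections import Counter

# The 20 survey results from Example 1.
cars = [3, 1, 4, 0, 2, 1, 5, 2, 1, 5, 4, 2, 3, 2, 0, 2, 1, 0, 3, 2]

# Tally how many households reported each number of cars.
frequency = Counter(cars)

print("Cars  Frequency")
for value in sorted(frequency):
    print(f"{value:>4}  {frequency[value]:>9}")

# Largest number of cars registered to any one household.
max_cars = max(cars)
print("Maximum:", max_cars)
```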
Example 2

Solution?

• Discussed in class

Relative Frequency and Percent Frequency
Distributions
• A relative frequency distribution is a tabular
summary of data showing the relative
frequency for each bin.

• A percent frequency distribution summarizes
the percent frequency of the data for each
bin.

Relative Frequency and Percent Frequency
Distributions

For example, the relative frequency for Coca-Cola is 19/50 = 0.38,
for Diet Coke it is 8/50 = 0.16, and so on.
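
The calculation is relative frequency = frequency of the bin / n, and percent frequency = relative frequency × 100. A minimal sketch using only the two counts quoted on the slide (the remaining soft drink counts are not given here, so they are left out):

```python
n = 50  # total number of observations, from the slide

# Only the two counts quoted on the slide; the other bins are not listed.
counts = {"Coca-Cola": 19, "Diet Coke": 8}

# Relative frequency: share of all observations that fall in each bin.
relative = {name: count / n for name, count in counts.items()}

# Percent frequency: relative frequency expressed as a percentage.
percent = {name: round(100 * value, 1) for name, value in relative.items()}

print(relative)
print(percent)
```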

Example 3

Frequency Distributions for Quantitative Data

• Consider the quantitative data in Table 2.6

• These data show the time in days required to
complete year-end audits for a sample of 20
clients of Sanderson and Clifford, a small public
accounting firm. The three steps necessary to
define the classes for a frequency distribution
with quantitative data are as follows:

1. Determine the number of nonoverlapping bins.
2. Determine the width of each bin.
3. Determine the bin limits.

• Number of Bins: Bins are formed by specifying
the ranges used to group the data.
• Width of the Bins: Choose a width for the
bins.

Approximate bin width = (largest value − smallest value) / number of bins
= (33 − 12)/5 = 4.2, which is rounded up to 5.

• Bin Limits: Bin limits must be chosen so that
each data item belongs to one and only one
class.

Choose lower and upper bin limits to obtain a total of five classes:
10–14,
15–19,
20–24,
25–29,
30–34.
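
The three steps can be sketched in Python as follows. The extremes 12 and 33 and the five bins come from the slides; the starting lower limit of 10 is taken from the list of classes, and `find_bin` is a hypothetical helper that assumes whole-number data.

```python
import math

smallest, largest = 12, 33   # extremes quoted on the slide
number_of_bins = 5

raw_width = (largest - smallest) / number_of_bins   # 4.2
bin_width = math.ceil(raw_width)                    # rounded up to 5

first_lower_limit = 10  # taken from the first class listed on the slide
bins = [
    (first_lower_limit + i * bin_width,
     first_lower_limit + (i + 1) * bin_width - 1)
    for i in range(number_of_bins)
]

def find_bin(value):
    """Return the single (lower, upper) bin that contains an integer value."""
    matches = [b for b in bins if b[0] <= value <= b[1]]
    assert len(matches) == 1, "each value must fall in exactly one bin"
    return matches[0]

print(bins)
print(find_bin(12), find_bin(33))
```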

Example 4

• Step 1. Select cells B10:B14
• Step 2. Type the formula =FREQUENCY(A2:D6,
A10:A14). The range A2:D6 defines the data set,
and the range A10:A14 defines the bins.
• Step 3. Press CTRL+SHIFT+ENTER after typing
the formula in Step 2.
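
A rough Python analogue of what Excel's FREQUENCY function computes is sketched below: each bin value is treated as an upper limit, a value is counted in the first bin whose limit is greater than or equal to it, and one extra count collects values above the last limit. The data values are hypothetical, and the upper limits 14, 19, 24, 29, 34 are an assumption about what cells A10:A14 contain.

```python
from bisect import bisect_left

def frequency(data, bin_upper_limits):
    """Count values per bin; the final entry counts values above the last limit."""
    limits = sorted(bin_upper_limits)
    counts = [0] * (len(limits) + 1)
    for value in data:
        # Index of the first upper limit that is >= value.
        counts[bisect_left(limits, value)] += 1
    return counts

values = [12, 14, 15, 19, 20, 33, 40]   # hypothetical data
upper_limits = [14, 19, 24, 29, 34]     # assumed contents of A10:A14

counts = frequency(values, upper_limits)
print(counts)
```

Selecting only five output cells in Excel, as in Step 1, shows the first five counts and omits the overflow entry.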

Data Presentation

There are two types of statistical presentation of data: graphical and numerical.

Graphical Presentation: We look for the overall pattern and for striking deviations
from that pattern. The overall pattern is usually described by the shape, center, and spread
of the data. An individual value that falls outside the overall pattern is called an
outlier.

Bar diagrams and pie charts are used for categorical variables.
Histograms, stem-and-leaf plots, and box plots are used for numerical variables.
Histograms
• Step 1. Click the Data tab in the Ribbon
• Step 2. Click Data Analysis in the Analyze group
• Step 3. When the Data Analysis dialog box opens,
choose Histogram from the list of Analysis Tools,
and click OK
• In the Input Range: box, enter A2:D6
• In the Bin Range: box, enter A10:A14
• Under Output Options:, select New Worksheet Ply:
• Select the check box for Chart Output (see Figure 2.13)
• Click OK
A common graphical presentation of
quantitative data is a histogram.

Data Presentation – Categorical Variable
Bar Diagram: Lists the categories and presents the percent or count of individuals who fall
in each category.

[Figure 1: Bar Chart of Subjects in Treatment Groups; x-axis: Treatment Group, y-axis: Number of Subjects]

Treatment Group   Frequency   Proportion        Percent (%)
1                 15          15/60 = 0.250     25.0
2                 25          25/60 = 0.417     41.7
3                 20          20/60 = 0.333     33.3
Total             60          1.00              100
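
The proportion and percent columns follow directly from the frequencies: proportion = frequency / total, and percent = proportion × 100. A minimal sketch using the three treatment group counts from the slide (variable names are illustrative):

```python
# Subjects per treatment group, from the slide.
frequency = {1: 15, 2: 25, 3: 20}
total = sum(frequency.values())

# For each group: (frequency, proportion to 3 places, percent to 1 place).
table = {
    group: (count, round(count / total, 3), round(100 * count / total, 1))
    for group, count in frequency.items()
}

for group, (count, proportion, percent) in table.items():
    print(group, count, proportion, percent)
print("Total", total)
```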
Data Presentation – Categorical Variable
Pie Chart: Lists the categories and presents the percent or count of individuals who fall in
each category.

[Figure 2: Pie Chart of Subjects in Treatment Groups; slices: Group 1 = 25%, Group 2 = 42%, Group 3 = 33%]

Treatment Group   Frequency   Proportion        Percent (%)
1                 15          15/60 = 0.250     25.0
2                 25          25/60 = 0.417     41.7
3                 20          20/60 = 0.333     33.3
Total             60          1.00              100
Graphical Presentation – Numerical Variable

Histogram: The overall pattern can be described by its shape, center, and spread. The
following age distribution is right skewed. The center lies between 80 and 100. There are no
outliers.

[Figure 3: Age Distribution; x-axis: Age in Month (bins 40, 60, 80, 100, 120, 140, More), y-axis: Number of Subjects]

Mean                 90.41666667
Standard Error       3.902649518
Median               84
Mode                 84
Standard Deviation   30.22979318
Sample Variance      913.8403955
Kurtosis             -1.183899591
Skewness             0.389872725
Range                95
Minimum              48
Maximum              143
Sum                  5425
Count                60
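
Several of the summary statistics on this slide are tied together by simple formulas: mean = sum / count, standard error = standard deviation / √count, sample variance = standard deviation², and range = maximum − minimum. The sketch below recomputes them from the reported values only, since the raw ages are not included in the slides.

```python
import math

# Values reported on the slide (the raw ages are not available).
total, count = 5425, 60
standard_deviation = 30.22979318
minimum, maximum = 48, 143

mean = total / count
standard_error = standard_deviation / math.sqrt(count)
sample_variance = standard_deviation ** 2
value_range = maximum - minimum

print(round(mean, 4))
print(round(standard_error, 4))
print(round(sample_variance, 2))
print(value_range)
```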
Thank You!
