Introduction To Analytics

The document provides an overview of key concepts in descriptive statistics including measures of central tendency (mean, median, mode), measures of dispersion (variance, standard deviation, interquartile range, outliers), and probability. It defines these concepts, provides examples to illustrate how to calculate each measure, and discusses how these statistical techniques can be used to analyze business data and make inferences about populations.

IMT-PG Programme in Management

Decision Sciences

Introduction to analytics

Presented by: Dr. Anuja Shukla


Agenda

Topics                                                     Time allotted

Business problem and context setting                       5 mins
Measures of central tendency (Mean, Median & Mode)         15 mins
Measures of dispersion (Variance, SD, Outliers and IQR)    15 mins
Probability                                                40 mins
Q&A                                                        15 mins
Why Business Analytics?

Data
Information
Knowledge
Wisdom
Analysing Data
Statistics
• The practice or science of collecting and analysing numerical data in large quantities,
especially for the purpose of inferring proportions in a whole from those in a representative
sample.

Statistics

Descriptive Statistics
• Presenting, organising and summarizing data
• Understanding data
• Graphs & Charts
• Measure of central tendency: Mean, Median, Mode
• Measure of dispersion: Standard deviation, IQR, Variance

Inferential Statistics
• Drawing conclusions about a population based on data observed in a sample
• Hypothesis testing
Mean

• Mean is the average value.


• It’s the central value of data set.
Median

• Positional average
• Suitable when the data contain outliers
• The median is the value separating the higher half from the lower half of a data
sample.

• Steps
• Arrange the sample data in ascending order of value, from left to right;
the value in the middle is the median.
• For an odd number of values, there is one central value.
• For an even number of values, the median is the average of the two central
values.
Median

Data:
a. N is odd
4,7,9,10,12
Median= 3rd observation =9

b. N is even
4,7,9,10
Median=Average of 2nd and 3rd observation =(7+9)/2=8
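
The two cases above can be checked with a short Python sketch. It assumes only the standard-library `statistics` module; the variable names are illustrative and not part of the original slides.

```python
from statistics import median

odd_sample = [4, 7, 9, 10, 12]   # N is odd: one central value
even_sample = [4, 7, 9, 10]      # N is even: average of the two central values

median_odd = median(odd_sample)
median_even = median(even_sample)

print(median_odd)   # 9
print(median_even)  # 8.0

# median() sorts the data internally, so unsorted input gives the same answer.
median_unsorted = median([12, 4, 10, 7, 9])
```

When working by hand the sorting step still has to be done explicitly, which is why the slides stress it.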
Mode

• The mode is the number that appears most frequently in a data set.
• A set of numbers may have one mode, more than one mode (bimodal if there are two,
multimodal if there are more), or no mode at all.
• For qualitative data, it is not possible to measure the mean or median
values, as there are no numerical values.
• Thus, the value with the highest frequency is considered as the
measure of central tendency in such cases.

Example

Data: 4, 1, 5, 3, 0, 2
Mode: No mode

Data: 4, 2, 4, 3, 2, 2, 1, 5
Mode: 2 (2 appears three times, 4 only twice)

Data: 4, 2, 1, 5, 4, 3, 2, 3
Mode: 2, 3, 4 (multimodal)
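
A minimal sketch of counting modes in Python follows. The helper name `modes` is hypothetical, and it adopts the slide's convention that a data set in which every value appears only once has no mode. The list of shirt colours is made-up illustration data for the qualitative case.

```python
from collections import Counter

def modes(data):
    """Return the most frequent value(s), or [] when no value repeats."""
    counts = Counter(data)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value appears once: treated as "no mode"
    return sorted(value for value, count in counts.items() if count == highest)

no_mode = modes([4, 1, 5, 3, 0, 2])           # all values appear once
one_mode = modes([4, 2, 4, 3, 2, 2, 1, 5])     # 2 appears three times, 4 twice
multi_mode = modes([4, 2, 1, 5, 4, 3, 2, 3])   # 2, 3 and 4 each appear twice

# Qualitative data (hypothetical): the mode is the most frequent category.
colour_mode = modes(["blue", "white", "blue", "black"])

print(no_mode, one_mode, multi_mode, colour_mode)
```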
Question
Q: The following numbers of customers buy ice cream at the Amul ice cream milk
parlour (Monday-Friday). What is the average number of chocolate and
strawberry ice creams demanded per day during the week?

Day                     Monday   Tuesday   Wednesday   Thursday   Friday
Chocolate ice cream     4        7         9           10         12
Strawberry ice cream    2        1         4           1          2

Mean (Chocolate)
= sum/n
=(4+7+9+10+12)/5= 42/5= 8.4

Mean (Strawberry)= 10/5=2
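
The same arithmetic can be reproduced in Python. This is only a sketch using the standard-library `statistics.mean`; the list names are chosen here for illustration.

```python
from statistics import mean

chocolate = [4, 7, 9, 10, 12]   # Monday to Friday
strawberry = [2, 1, 4, 1, 2]    # Monday to Friday

mean_chocolate = mean(chocolate)     # sum / n = 42 / 5
mean_strawberry = mean(strawberry)   # sum / n = 10 / 5

print(mean_chocolate)  # 8.4
print(mean_strawberry)
```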


Question?
• The following are the numbers of customers visiting Spencer during a week.
What is the average number of customers visiting in a day?

• 120, 112, 174, 134, 126, 121, 344
• Mean = (120+112+174+134+126+121+344)/7 = 1131/7 = 161.57
• Median
• Arrange the sample data in ascending order of value
• 112, 120, 121, 126, 134, 174, 344
• Find the middle value: the 4th of 7 observations, so Median = 126

Note: Don't forget to arrange the data in ascending order!
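
To see why the median is preferred when there is an outlier, the visitor counts above can be run through a short Python sketch. The variable names are illustrative; the comparison without the value 344 is added here only to show how far one extreme value pulls the mean.

```python
from statistics import mean, median

visitors = [120, 112, 174, 134, 126, 121, 344]

mean_visitors = mean(visitors)       # 1131 / 7, pulled upward by 344
median_visitors = median(visitors)   # middle value of the sorted data

# Dropping the extreme value 344 shows how sensitive the mean is to it.
without_extreme = [v for v in visitors if v != 344]
mean_without_extreme = mean(without_extreme)

print(round(mean_visitors, 2))         # 161.57
print(median_visitors)                 # 126
print(round(mean_without_extreme, 2))  # 131.17
```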


Question
• Peter England records the number of shirts sold at its store. Which shirt is
most liked by consumers?

Item                  SKU001   SKU002   SKU003   SKU004   SKU005   SKU006   SKU007
No. of shirts sold    10       12       13       21       10       11       12

Answer: SKU004 (highest sales, 21 shirts)
Customer feedback at Decathlon
Standard Deviation
• Shows variation about the mean
• Most commonly used measure of variation
• It serves the purpose of measuring variation without exaggerating its
magnitude.
• It is popularly represented as 𝜎.
Variance
• Variance is defined as the mean of the square of the difference between data
points and the mean value of all data points within a dataset.
• Variance is a measure of variability that utilizes all the data.
• Variance is the square of the standard deviation.
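
As a rough illustration of these definitions, the sketch below computes the variance and standard deviation of the chocolate ice cream figures used earlier (4, 7, 9, 10, 12). It assumes the population form, dividing by n, which is what "mean of the squared differences" describes; Python's `statistics.variance` and `statistics.stdev` divide by n - 1 instead and give slightly larger values.

```python
from math import isclose, sqrt
from statistics import mean, pstdev, pvariance

data = [4, 7, 9, 10, 12]
centre = mean(data)  # 8.4

# Variance by hand: mean of the squared differences from the mean.
squared_differences = [(x - centre) ** 2 for x in data]
variance_by_hand = sum(squared_differences) / len(data)

# The same quantities from the standard library (population versions).
variance = pvariance(data)
standard_deviation = pstdev(data)

print(round(variance, 2))            # 7.44
print(round(standard_deviation, 4))  # 2.7276
```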
Interquartile range

• Suitable in case of outlier


• The interquartile spread is a much better way to communicate the
variation or spread in the data in case of outlier.
• Quartile values are the values in a sample at the 25th, 50th, 75th, and
100th percentiles.

Quartile                               Excel Function

25th percentile or First Quartile      QUARTILE(A1:A20, 1)
50th percentile or Second Quartile     QUARTILE(A1:A20, 2)
75th percentile or Third Quartile      QUARTILE(A1:A20, 3)
100th percentile or Fourth Quartile    QUARTILE(A1:A20, 4)
Quartiles

Interquartile range = Q3 - Q1 = 77 - 64 = 13
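
A possible Python version of the quartile and IQR calculation is sketched below, using the Spencer visitor counts from the earlier question because the data behind Q1 = 64 and Q3 = 77 is not given in the slides. Two assumptions are made: the "inclusive" method of `statistics.quantiles` is used because it should correspond to the interpolation done by Excel's QUARTILE, and the 1.5 × IQR fence for flagging outliers is a common convention that the slides do not state.

```python
from statistics import quantiles

visitors = [120, 112, 174, 134, 126, 121, 344]

# quantiles() sorts the data itself; n=4 asks for the three quartile cut points.
q1, q2, q3 = quantiles(visitors, n=4, method="inclusive")
iqr = q3 - q1

# Assumed convention: values beyond 1.5 * IQR from the quartiles are outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in visitors if v < lower_fence or v > upper_fence]

print(q1, q2, q3)   # 120.5 126.0 154.0
print(iqr)          # 33.5
print(outliers)     # [344]
```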
Outliers
Probability , Sampling and Estimation
Probability
• Probability quantifies the likelihood or belief that an event will occur.
• Probability is the branch of mathematics concerning numerical
descriptions of how likely an event is to occur or how likely it is that a
proposition is true. The probability of an event is a number between 0
and 1, where, roughly speaking, 0 indicates impossibility of the event
and 1 indicates certainty.
• p= probability of occurrence of an event
• q=probability of failure of an event (1-p)

p+q=1
Types of Probability

• There are three basic ways of classifying probability. These three
represent rather different conceptual approaches to the study of
probability theory.

Probability
• Classical or Theoretical Approach
• Empirical or Frequentist Approach
• Subjective Approach
Classical or Theoretical Approach

• Defines probability as the ratio of favorable outcomes to total outcomes.
• Also known as a priori probability.
• It rests on a number of assumptions, hence it is the most restrictive approach and
is least useful in real-life situations.
• The classical/theoretical approach is the one where certain assumptions are made
(for instance, all possible outcomes are equally likely), and the probability values are
then calculated using a formula.
• One does not need to perform any experiment or gather any data when using this
approach to arrive at the probability of an event.
Calculation of Probability: Fair Coin

• Calculate the probability of getting heads on one toss of a fair coin.
• Assume: all the possible outcomes are equally likely to occur.
• Total outcomes = {Head, Tail}
• No. of total outcomes = 2
• Favorable outcomes = {Head}; no. of favorable outcomes = 1
• P(Head) = favorable outcomes / total outcomes = 1/2 = 0.5
Empirical or Frequentist Approach

• Defines probability as the observed relative frequency of an event in a very large
number of trials.
• It makes fewer assumptions but requires the event to be capable of being repeated
a large number of times.
• Probability gains accuracy as we increase the number of observations.
• Probabilities are derived from observations.
• For probability of getting heads on tossing a coin, you would toss the coin several
times and note down the outcome of each trial.
• Let’s say you tossed the coin 10,000 times and got heads 5,052 times and tails 4,948 times.

• P(H)=5052/10,000
• Following the frequentist approach, you would conclude that the probability of getting heads is
0.5052 and that of getting tails is 0.4948 for that particular coin.
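
The frequentist idea can be imitated with a small simulation. This is a sketch only: the function name `relative_frequency_of_heads` and the seed are hypothetical choices, and the simulated coin is assumed to be fair, so the exact frequencies will differ from the 5,052 heads quoted above.

```python
import random

def relative_frequency_of_heads(num_tosses, seed=42):
    """Toss a simulated fair coin num_tosses times; return heads / tosses."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    heads = sum(rng.random() < 0.5 for _ in range(num_tosses))
    return heads / num_tosses

# The estimate tends to settle near 0.5 as the number of tosses grows.
for num_tosses in (10, 100, 10_000):
    print(num_tosses, relative_frequency_of_heads(num_tosses))

# The estimate from the slide's own counts.
p_heads_from_slide = 5052 / 10_000
```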
Example
• Suppose an insurance company knows from past actuarial data that of all males 40
years old, about 60 out of every 100,000 will die within a 1-year period. Using the
frequentist method, estimate the probability of death for that age group.

p = 60/100,000 = 0.0006
Subjective Probability

• Deals with specific or unique situations typical of the business or management
world.
• Based upon some belief or educated guess of the decision maker.
• Subjective assessment of probability, also known as personal probability, permits
the widest flexibility of the three concepts.

• Example: a judge sentencing a criminal, HR selecting a candidate for a job


Statistics
• Statistics provides tools that let you make inferences about data.

• Types: Descriptive and Inferential

• Tools for making statistical inferences are


1) built on top of probability theory, and
2) require an understanding of how samples behave when you take them from
distributions (defined by probability theory…).
Probability vs Statistics

• Probability theory is “the doctrine of chances”.


• It’s a branch of mathematics that tells you how often different kinds of events
will happen.
• For example, all of these questions are things you can answer using probability
theory:
• What are the chances of a fair coin coming up heads 10 times in a row?
• If I roll two six sided dice, how likely is it that I’ll roll two sixes?
• How likely is it that five cards drawn from a perfectly shuffled deck will all be
hearts?
• What are the chances that I’ll win the lottery?
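
Three of these questions can be answered exactly with a few lines of Python. The sketch assumes a fair coin, fair dice and a standard 52-card deck with 13 hearts; the lottery question is left out because it depends on the rules of the particular lottery.

```python
from fractions import Fraction
from math import comb

# A fair coin coming up heads 10 times in a row.
p_ten_heads = Fraction(1, 2) ** 10

# Rolling two sixes with two six-sided dice.
p_two_sixes = Fraction(1, 6) ** 2

# Five cards from a shuffled deck all being hearts:
# ways to choose 5 of the 13 hearts / ways to choose any 5 of the 52 cards.
p_five_hearts = Fraction(comb(13, 5), comb(52, 5))

print(p_ten_heads)                     # 1/1024
print(p_two_sixes)                     # 1/36
print(p_five_hearts)                   # 33/66640
print(round(float(p_five_hearts), 6))  # 0.000495
```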
Probability vs Statistics
• In statistics, we do not know the truth about the world.
• All we have is the data, and it is from the data that we want
to learn the truth about the world.
• Statistical questions tend to look more like these:
• If my friend flips a coin 10 times and gets 10 heads, are they playing a trick on
me?
• If five cards off the top of the deck are all hearts, how likely is it that the deck
was shuffled?
• If the lottery commissioner’s spouse wins the lottery, how likely is it that the
lottery was rigged?
Random Variable

• A Random Variable is a set of possible values from a random experiment.


• A Random Variable has a whole set of values ...... and it could take on any of those
values, randomly.
• A random variable, usually written X, is a variable whose possible values are
numerical outcomes of a random phenomenon

Types of Random Variables

• Discrete (Binomial Distribution)
• Continuous (Normal Distribution)
Random Variable

• Discrete random variable: a discrete random variable is one which may take on only a
countable number of distinct values.
• Examples: number of customers, roll of a die, number of students in a class, number of children in a family,
number of people watching a movie in a theatre, the number of patients in a doctor's
surgery, the number of defective light bulbs in a box of ten.
• For example, the number of students in a class. A class can have 10 students or 11 students,
but it cannot have 10.25 students.

• Continuous random variable: a continuous random variable is one which can take any value
within an interval, so it has an infinite number of possible values.
• Example: height, weight, the amount of sugar in an orange, the amount of caffeine in Coke, the
time required to run a mile.
Binomial Distribution
• The binomial probability distribution is the theoretical probability distribution of all
numbers of possible successes over a certain number of Bernoulli trials.
• A binomial experiment is a type of simple random experiment where only two mutually
exclusive outcomes are possible on any trial and those two outcomes are a success and
failure.
• Such trials where only one of two mutually exclusive outcomes is possible are Bernoulli
trials
• For example, flipping a coin is a Bernoulli trial, because only heads and tails are
possible. Heads could be defined as a “success” and tails could be defined as a
“failure.”
• A person with cancer who is taking a new experimental type of chemotherapy is a
Bernoulli trial, where the patient being cured is a “success” and the patient not
being cured is a “failure.”
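• The binomial probability is the probability of observing a certain number of successes (r)
over a certain number (n) of independent Bernoulli trials, each with probability of success p
and probability of failure q = 1 - p:

P(X = r) = [n! / (r! (n - r)!)] × p^r × q^(n - r)

• where n! = n*(n-1)*(n-2)*(n-3)....1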
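
The binomial probability of r successes in n independent Bernoulli trials, [n! / (r! (n - r)!)] × p^r × q^(n - r) with q = 1 - p, can be sketched in Python as follows. The function name `binomial_probability` is hypothetical; `math.comb(n, r)` computes n! / (r! (n - r)!).

```python
from math import comb, isclose

def binomial_probability(r, n, p):
    """Probability of exactly r successes in n independent Bernoulli trials."""
    q = 1 - p  # probability of failure on a single trial
    return comb(n, r) * p**r * q**(n - r)

# Example: exactly 2 heads in 3 tosses of a fair coin.
p_two_heads = binomial_probability(2, 3, 0.5)
print(p_two_heads)  # 0.375

# The probabilities over all possible r (0 to n) add up to 1.
total = sum(binomial_probability(r, 3, 0.5) for r in range(4))
```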
Binomial Distribution: Probability Distribution of discrete variable
• Calculate the probability distribution of the number of heads when three coins are tossed together.
• The three coins can land in eight possible ways:
HHH, HHT, HTT, HTH, THH, THT, TTH, TTT
• Possible values of X (number of heads) = {0, 1, 2, 3}
• Total outcomes = 8

How many heads when we toss 3 coins?

P(X = 0) = 1/8 (TTT)
P(X = 1) = 3/8 (TTH, THT, HTT)
P(X = 2) = 3/8 (THH, HHT, HTH)
P(X = 3) = 1/8 (HHH)
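
The eight outcomes and the resulting distribution can also be generated by enumeration. The sketch below assumes fair coins, so that all eight outcomes are equally likely; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All ways three coins can land: HHH, HHT, ..., TTT.
outcomes = ["".join(tosses) for tosses in product("HT", repeat=3)]

# Count how many outcomes contain 0, 1, 2 or 3 heads.
heads_count = Counter(outcome.count("H") for outcome in outcomes)

# Each outcome is equally likely, so P(X = k) = (outcomes with k heads) / 8.
distribution = {
    k: Fraction(count, len(outcomes)) for k, count in sorted(heads_count.items())
}

for k, probability in distribution.items():
    print(f"P(X = {k}) = {probability}")
```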
Normal Distribution
• Normal distribution is symmetric about its mean and extends infinitely
on both sides.
• Probability density is higher close to the mean and decreases
exponentially as we move further away from the mean.
• There is a high probability that the value of the random variable is close
to the mean. As we move further away from the mean, the probability
of the occurrence of such values decreases.
Normal Probabilities- Conversion of normal distribution to standard normal distribution

• Suppose X is normal with mean 8.0 and standard deviation 5.0. Find P(X < 8.6)

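Z = (X − μ) / σ = (8.6 − 8.0) / 5.0 = 0.12

[Figure: normal curve of X (μ = 8, σ = 5) with P(X < 8.6) marked, beside the standard normal curve of Z (μ = 0, σ = 1) with P(Z < 0.12) marked]

P(X < 8.6) = P(Z < 0.12)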


1/22/2022 35
Solution: Finding P(Z < 0.12)

Standardized Normal Probability Table (Portion)

Z      .00      .01      .02
0.0    .5000    .5040    .5080
0.1    .5398    .5438    .5478
0.2    .5793    .5832    .5871
0.3    .6179    .6217    .6255

Read the row Z = 0.1 and the column .02:
P(X < 8.6) = P(Z < 0.12) = .5478
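
The conversion to a z score and the table lookup can be cross-checked in Python. This sketch uses `statistics.NormalDist` from the standard library for X with mean 8.0 and standard deviation 5.0; the variable names are illustrative.

```python
from math import isclose
from statistics import NormalDist

mu, sigma, x = 8.0, 5.0, 8.6

# Step 1: convert x to a z score.
z = (x - mu) / sigma

# Step 2: area to the left of z under the standard normal curve.
p_from_z = NormalDist().cdf(z)

# Same probability computed directly on the original scale.
p_direct = NormalDist(mu, sigma).cdf(x)

print(round(z, 2))         # 0.12
print(round(p_from_z, 4))  # 0.5478
```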
Sample vs Population
Population                                       Sample

The measurable quality is called a parameter.    The measurable quality is called a statistic.
The population is the complete set.              The sample is a subset of the population.
Reports are a true representation of opinion.    Reports have a margin of error and a confidence interval.
It contains all members of a specified group.    It is a subset that represents the entire population.
Central Limit Theorem

• The central limit theorem states that if you take sufficiently large random samples
(sample size ‘n’) from any population distribution with a mean μ and standard
deviation σ, the distribution of sample means (or the ‘sampling distribution of
sample means’) will be approximately a normal distribution with a mean μ and standard deviation
σ/√n.
• As a rule of thumb, the central limit theorem is applied for sufficiently large sample sizes (n ≥ 30).
• The standard deviation of the sample means distribution is also referred to as the
‘standard error of the mean’, or simply the ‘standard error’, and is denoted by ‘SE’.
• Standard error of the mean (n ≥ 30): SE = σ/√n.
https://www.youtube.com/watch?v=b5xQmk9veZ4
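
A small simulation can make the central limit theorem concrete. Everything specific in this sketch is an assumption chosen for illustration: the population is exponential with mean 1 and standard deviation 1 (clearly not normal), the sample size is n = 30, and 5,000 samples are drawn with a fixed seed.

```python
import random
from math import sqrt
from statistics import fmean, pstdev

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 30                   # size of each sample
num_samples = 5000       # how many samples are drawn

# Assumed population: exponential with rate 1, so mu = 1 and sigma = 1.
mu, sigma = 1.0, 1.0

sample_means = [
    fmean([rng.expovariate(1.0) for _ in range(n)]) for _ in range(num_samples)
]

mean_of_means = fmean(sample_means)   # should be close to mu
observed_se = pstdev(sample_means)    # should be close to sigma / sqrt(n)
theoretical_se = sigma / sqrt(n)

print(round(mean_of_means, 3), round(observed_se, 3), round(theoretical_se, 3))
```

A histogram of `sample_means` would look roughly bell-shaped even though the population itself is strongly skewed.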
Summary
1. Descriptive: graphs, mean, median, mode, sd, variance, interquartile range (M1)

a. Measures of central tendency: centre point
- Mean: Average = sum of all numbers / total number
- Median: Positional average
* Useful when data has outliers
* Arrange data in ascending order
* Pick central value
- Mode: Most repeated number / number with highest frequency

b. Measures of dispersion: spread of data
- Standard deviation = stdev
Sd = Sq root (sum (xi-xbar)^2/n)
- Variance
Variance = Square of standard deviation
- Interquartile range = Q3-Q1

2. Inferential - Hypothesis testing (M2, M3)


Probability calculation under binomial distribution
• Manual method

• Excel
• BINOM.DIST(x, n, p, cumulative)
Probability calculation under normal distribution
• Manual method:
• Convert data to z score
• Calculate probability using
• p value calculator (http://courses.atlas.illinois.edu/spring2016/STAT/STAT200/pnormal.html)
• standard z score table (https://www.math.arizona.edu/~rsims/ma464/standardnormaltable.pdf)

• Excel formula:
• NORM.DIST(x,mean,standard_dev,cumulative)
• X-The value for which you want the distribution.
• Mean-The arithmetic mean of the distribution.
• Standard_dev- The standard deviation of the distribution.
• Cumulative - True

• NORM.S.DIST(z,cumulative)
• Z- The value for which you want the distribution.
• Cumulative - True
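
For readers working outside Excel, the sketch below shows rough Python counterparts of the three Excel formulas. The function names `norm_dist`, `norm_s_dist` and `binom_dist` are hypothetical helpers written here to mirror the Excel argument order; they are not library functions and are built on `statistics.NormalDist` and `math.comb`.

```python
from math import comb
from statistics import NormalDist

def norm_dist(x, mean, standard_dev, cumulative=True):
    """Mirror of NORM.DIST: P(X <= x) if cumulative, else the density at x."""
    dist = NormalDist(mean, standard_dev)
    return dist.cdf(x) if cumulative else dist.pdf(x)

def norm_s_dist(z, cumulative=True):
    """Mirror of NORM.S.DIST: standard normal (mean 0, standard deviation 1)."""
    return norm_dist(z, 0.0, 1.0, cumulative)

def binom_dist(x, n, p, cumulative=False):
    """Mirror of BINOM.DIST: P(X = x), or P(X <= x) if cumulative."""
    def pmf(k):
        return comb(n, k) * p**k * (1 - p) ** (n - k)
    if cumulative:
        return sum(pmf(k) for k in range(x + 1))
    return pmf(x)

print(round(norm_dist(8.6, 8.0, 5.0), 4))  # 0.5478
print(round(norm_s_dist(0.12), 4))         # 0.5478
print(binom_dist(2, 3, 0.5))               # 0.375
print(binom_dist(2, 3, 0.5, True))         # 0.875
```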
Practice
• A radar unit is used to measure speeds of cars on a motorway. The speeds are normally
distributed with a mean of 90 km/hr and a standard deviation of 10 km/hr. What is the
probability that a car picked at random is travelling at more than 100 km/hr?
• The probability that a car selected at a random has a speed greater than 100 km/hr is equal
to 0.1587

• For a certain type of computer, the length of time between charges of the battery is
normally distributed with a mean of 50 hours and a standard deviation of 15 hours. John
owns one of these computers and wants to know the probability that the length of time will
be between 50 and 70 hours.

• The probability that John's computer has a length of time between 50 and 70 hours is equal
to 0.4082
Practice
• The time taken to assemble a car in a certain plant is a random variable having a normal distribution with a mean of
20 hours and a standard deviation of 2 hours. What is the probability that a car can be assembled at
this plant in a period of time
a) less than 19.5 hours?
b) between 20 and 22 hours?

• a) P(x < 19.5) = P(z < -0.25) = 0.4013
b) P(20 < x < 22) = P(0 < z < 1) = 0.3413
Practice
• Entry to a certain University is determined by a national test. The scores on this test are normally
distributed with a mean of 500 and a standard deviation of 100. Tom wants to be admitted to this
university and he knows that he must score better than at least 70% of the students who took the test.
Tom takes the test and scores 585. Will he be admitted to this university?
• Tom scored better than 80.23% of the students who took the test and he will be admitted to this
University.

• The length of similar components produced by a company are approximated by a normal distribution
model with a mean of 5 cm and a standard deviation of 0.02 cm. If a component is chosen at random
a) what is the probability that the length of this component is between 4.98 and 5.02 cm?
b) what is the probability that the length of this component is between 4.96 and 5.04 cm?
• a) P(4.98 < x < 5.02) = P(-1 < z < 1)
= 0.6826
b) P(4.96 < x < 5.04) = P(-2 < z < 2)
= 0.9544
Practice
• The length of life of an instrument produced by a machine has a normal distribution with a mean of 12
months and standard deviation of 2 months. Find the probability that an instrument produced by this
machine will last
a) less than 7 months.
b) between 7 and 12 months.

• a) P(x < 7) = P(z < -2.5) = 0.0062
b) P(7 < x < 12) = P(-2.5 < z < 0) = 0.4938
Practice
• The annual salaries of employees in a large company are approximately normally distributed with a
mean of $50,000 and a standard deviation of $20,000.
a) What percent of people earn less than $40,000?
b) What percent of people earn between $45,000 and $65,000?
c) What percent of people earn more than $70,000?

• a) For x = 40000, z = -0.5
Area to the left (less than) of z = -0.5 is equal to 0.3085, so 30.85% earn less than $40,000.
b) For x = 45000, z = -0.25 and for x = 65000, z = 0.75
Area between z = -0.25 and z = 0.75 is equal to 0.3721, so 37.21% earn between $45,000 and $65,000.
c) For x = 70000, z = 1
Area to the right (higher) of z = 1 is equal to 0.1587, so 15.87% earn more than $70,000.
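
The salary answers can be verified with `statistics.NormalDist`. This is a sketch with illustrative variable names; because it uses the exact normal curve rather than a printed z table, results can differ from table-based answers in the fourth decimal place.

```python
from statistics import NormalDist

salaries = NormalDist(mu=50_000, sigma=20_000)

# a) Area to the left of $40,000 (z = -0.5).
p_below_40k = salaries.cdf(40_000)

# b) Area between $45,000 (z = -0.25) and $65,000 (z = 0.75).
p_45k_to_65k = salaries.cdf(65_000) - salaries.cdf(45_000)

# c) Area to the right of $70,000 (z = 1).
p_above_70k = 1 - salaries.cdf(70_000)

print(round(p_below_40k, 4))   # 0.3085
print(round(p_45k_to_65k, 4))  # 0.3721
print(round(p_above_70k, 4))   # 0.1587
```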
Practice data set
• https://docs.google.com/spreadsheets/d/1i_530f9cgakQJo5_D6Aph9Wsrjnm2rm9j5wmWEXYkf0/edit#gid=1728205802
• https://drive.google.com/drive/folders/19m6OlEgGsOPCrVNC4MnAuUFF61roDUhB?usp=sharing

• https://iterationinsights.com/article/where-to-start-with-the-4-types-of-analytics/
• https://blog.masterofproject.com/project-integration-management-overview/
• https://studiousguy.com/real-life-examples-normal-distribution/
Doubts?
All the Best!

https://www.youtube.com/watch?v=Z9Gw9dIJGiA&t=86s&ab_channel=upGrad_Gmba
