Introduction To Analytics
Decision Sciences
Introduction to analytics
Data
Information
Knowledge
Wisdom
Analysing Data
Statistics
• The practice or science of collecting and analysing numerical data in large quantities,
especially for the purpose of inferring proportions in a whole from those in a representative
sample.
Statistics
Measure of Dispersion
Standard deviation, IQR, Variance
Median
• Positional average
• Suitable when the data contain outliers
• The median is the value separating the higher half from the lower half of a data
sample
• Steps
• Arrange the sample data in ascending order of value, from left to right;
the value in the middle is called the median.
• For an odd number of values, we have one central value.
• For an even number of values, the median is the average of the two central
values.
Median
Data:
a. N is odd
4,7,9,10,12
Median= 3rd observation =9
b. N is even
4,7,9,10
Median=Average of 2nd and 3rd observation =(7+9)/2=8
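The two cases above can be sketched in Python. This is a minimal illustration, and the function name `median_of` is a hypothetical helper, not something from the slides.

```python
def median_of(values):
    """Return the median: sort by value, then take the middle."""
    if not values:
        raise ValueError("median_of() needs at least one value")
    ordered = sorted(values)          # ascending order of value
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd N: one central value
        return ordered[mid]
    # even N: average of the two central values
    return (ordered[mid - 1] + ordered[mid]) / 2


odd_median = median_of([4, 7, 9, 10, 12])   # 3rd observation
even_median = median_of([4, 7, 9, 10])      # average of 2nd and 3rd
```

Python's standard library offers the same behaviour through `statistics.median`.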
Mode
Mean (Chocolate)
= sum/n
=(4+7+9+10+12)/5= 42/5= 8.4
• 120,112,174, 134,126,121,344
• Mean = (120+112+174+134+126+121+344)/7 = 1131/7 = 161.57
• Median
• Arrange the sample data in ascending order of value
• 112, 120, 121, 126, 134, 174, 344
• Find the middle value: the 4th observation, 126
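As a rough check of this example, the sketch below computes both averages with Python's `statistics` module. It then drops the largest value (344) to show how each average reacts. Treating 344 as the outlier is an assumption made for illustration.

```python
from statistics import mean, median

values = [120, 112, 174, 134, 126, 121, 344]

mean_value = mean(values)       # 1131 / 7
median_value = median(values)   # middle of the sorted values

# Assumption for illustration: treat 344 as the outlier and drop it.
trimmed = sorted(values)[:-1]
mean_without_outlier = mean(trimmed)
median_without_outlier = median(trimmed)
```

The mean moves from about 161.57 to about 131.17 once 344 is removed, while the median only moves from 126 to 123.5. This is why the median suits data with outliers.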
SKU004
Customer feedback at Decathlon
Standard Deviation
• Shows variation about the mean
• Most commonly used measure of variation
• It serves the purpose of measuring variation without exaggerating its
magnitude.
• It is popularly represented as 𝜎.
Variance
• Variance is defined as the mean of the square of the difference between data
points and the mean value of all data points within a dataset.
• Variance is a measure of variability that utilizes all the data.
• Variance is the square of the standard deviation.
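A minimal sketch of this definition, reusing the values 4, 7, 9, 10, 12 from the mean example. It assumes the population form (divide by n), which matches the "mean of the squared differences" wording. Python's `statistics.variance` and `statistics.stdev` divide by n − 1 instead.

```python
from statistics import fmean, pstdev, pvariance

data = [4, 7, 9, 10, 12]
mu = fmean(data)  # arithmetic mean of the data

# Variance written out from the definition: mean of squared deviations.
manual_variance = sum((x - mu) ** 2 for x in data) / len(data)

# The same quantities from the standard library (population versions).
variance = pvariance(data)
std_dev = pstdev(data)
```

Here the variance comes to 7.44 and the standard deviation to about 2.73.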
Interquartile range
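The interquartile range is the difference between the third and first quartiles (Q3 − Q1). The sketch below computes it for the seven values used in the median example. Quartile conventions differ between tools, so the choice of `method="inclusive"` is an assumption, and other conventions can give slightly different numbers.

```python
from statistics import quantiles

values = [120, 112, 174, 134, 126, 121, 344]

# quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
# Assumption: the "inclusive" convention is used for the quartiles.
q1, q2, q3 = quantiles(values, n=4, method="inclusive")
iqr = q3 - q1
```

Under this convention Q1 = 120.5, Q3 = 154 and the IQR is 33.5, so the value 344 has little effect on it.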
p+q=1
Types of Probability
Probability
• Classical or Theoretical Approach
• Empirical or Frequentist Approach
• Subjective Approach
Classical or Theoretical Approach
• P(H)=5052/10,000
• Following the frequentist approach, you would conclude that the probability of getting heads is
0.5052 and that of getting tails is 0.4948 for that particular coin.
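The frequentist estimate is simply a relative frequency. The sketch below simulates 10,000 tosses and applies the same calculation. The fair simulated coin, the fixed seed, and the function name `estimate_heads_probability` are assumptions for illustration. The 5052 heads above came from a real experiment, not from this code.

```python
import random


def estimate_heads_probability(num_tosses, seed=0):
    """Estimate P(H) as (number of heads) / (number of tosses)."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    # Assumption: a fair coin, simulated as "heads when random() < 0.5".
    heads = sum(rng.random() < 0.5 for _ in range(num_tosses))
    return heads / num_tosses


p_heads = estimate_heads_probability(10_000)
p_tails = 1 - p_heads
```

With more tosses the estimate tends to settle closer to the underlying probability.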
Example
• Suppose an insurance company knows from past actuarial data that of all males 40
years old, about 60 out of every 100,000 will die within a 1-year period. Using this
method, estimate the probability of death for that age group.
p = 60/100,000 = 0.0006
Subjective Probability
Types of random variable
• Discrete (Binomial Distribution)
• Continuous (Normal Distribution)
Random Variable
• Discrete random variable: a discrete random variable is one which may take on only a
countable number of distinct values.
• Examples: number of customers, roll of a die, number of students in a class, number of children in a family,
number of people watching a movie in a theatre, the number of patients in a doctor's
surgery, the number of defective light bulbs in a box of ten.
• For example, the number of students in a class. A class can have 10 students or 11 students,
but it cannot have 10.25 students.
• Continuous random variable: a continuous random variable is one which can take any value
within a range, so it has infinitely many possible values.
• Examples: height, weight, the amount of sugar in an orange, the amount of caffeine in Coke, the
time required to run a mile.
Binomial Distribution
• The binomial probability distribution is the theoretical probability distribution of all
numbers of possible successes over a certain number of Bernoulli trials.
• A binomial experiment is a type of simple random experiment where only two mutually
exclusive outcomes are possible on any trial and those two outcomes are a success and
failure.
• Such trials where only one of two mutually exclusive outcomes is possible are Bernoulli
trials
• For example, flipping a coin is a Bernoulli trial, because only heads and tails are
possible. Heads could be defined as a “success” and tails could be defined as a
“failure.”
• Treating a cancer patient with a new experimental type of chemotherapy is a
Bernoulli trial, where the patient being cured is a “success” and the patient not
being cured is a “failure.”
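• The binomial probability is the probability of observing a certain number of successes (r)
over a certain number (n) of independent Bernoulli trials, each with success probability p
and failure probability q = 1 − p:
P(X = r) = [n! / (r! (n − r)!)] × p^r × q^(n − r)
• where n! = n × (n − 1) × (n − 2) × ... × 1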
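The binomial probability is usually written P(X = r) = n!/(r!(n − r)!) × p^r × q^(n − r), with q = 1 − p. A minimal Python sketch of that formula follows. The function name `binomial_pmf` is a hypothetical helper, and `math.comb` supplies the n!/(r!(n − r)!) term.

```python
from math import comb


def binomial_pmf(r, n, p):
    """Probability of exactly r successes in n independent Bernoulli trials."""
    if not 0 <= r <= n:
        raise ValueError("r must be between 0 and n")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    q = 1 - p  # probability of failure, since p + q = 1
    return comb(n, r) * p**r * q ** (n - r)


# Example: exactly 2 heads in 3 tosses of a fair coin.
p_two_heads = binomial_pmf(2, 3, 0.5)
# The probabilities over r = 0..n should add up to 1.
total = sum(binomial_pmf(r, 3, 0.5) for r in range(4))
```

For the example, the result is 3 × 0.25 × 0.5 = 0.375.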
Binomial Distribution: Probability Distribution of discrete variable
• Calculate probability of getting heads on tossing three coins together.
• The three coins can land in eight possible ways:
HHH, HHT, HTT, HTH, THH, THT, TTH, TTT
• Possible values of X (number of heads) = {0, 1, 2, 3}
• Total outcomes= 8
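The eight outcomes can be enumerated directly, which gives the probability of each number of heads. This is a small sketch, and it assumes fair coins so that every outcome is equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# All ways three coins can land: HHH, HHT, ..., TTT.
outcomes = ["".join(toss) for toss in product("HT", repeat=3)]

# Count how many outcomes contain 0, 1, 2 or 3 heads.
head_counts = Counter(outcome.count("H") for outcome in outcomes)

# Assumption: fair coins, so each outcome has probability 1/8.
distribution = {
    heads: Fraction(count, len(outcomes))
    for heads, count in sorted(head_counts.items())
}
```

The resulting distribution is P(0) = 1/8, P(1) = 3/8, P(2) = 3/8, P(3) = 1/8, which matches the binomial distribution with n = 3 and p = 0.5.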
• Suppose X is normal with mean 8.0 and standard deviation 5.0. Find P(X < 8.6)
Z = (X − μ)/σ = (8.6 − 8.0)/5.0 = 0.12
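[Figure: normal curve for X (μ = 8, σ = 5) with 8 and 8.6 marked, alongside the standard normal curve for Z (μ = 0, σ = 1) with 0 and 0.12 marked]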
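The conversion to a z score, and the probability it leads to, can be checked with `statistics.NormalDist` from Python's standard library. This is a sketch of the calculation for the example, with μ = 8.0 and σ = 5.0 as stated in the problem.

```python
from statistics import NormalDist

mu, sigma, x = 8.0, 5.0, 8.6

# Step 1: convert X to a z score.
z = (x - mu) / sigma

# Step 2: P(Z < z) from the standard normal distribution (mean 0, sd 1).
p_from_z = NormalDist().cdf(z)

# Same probability computed directly on the original scale.
p_direct = NormalDist(mu, sigma).cdf(x)
```

Both routes give P(X < 8.6) ≈ 0.5478, which agrees with the value read from a standard z table at z = 0.12.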
• The central limit theorem states that if you take sufficiently large random samples
(sample size ‘n’) from any population distribution with mean μ and standard
deviation σ, the distribution of sample means (the ‘sampling distribution of
sample means’) will be approximately normal, with mean μ and standard deviation
σ/√n.
• The central limit theorem is commonly applied for sufficiently large sample sizes (n ≥ 30).
• The standard deviation of the distribution of sample means is also referred to as the
‘standard error of the mean’, or simply the ‘standard error’, and is denoted by ‘SE’.
• Standard error (n ≥ 30) = σ/√n.
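A small simulation can make the σ/√n result concrete. The sketch below draws many samples of size 30 from a skewed population and compares the spread of the sample means with σ/√n. The exponential population (mean 1, standard deviation 1), the number of samples, and the seed are all assumptions chosen for illustration.

```python
import random
from statistics import fmean, pstdev

rng = random.Random(42)   # fixed seed so the run is repeatable
n = 30                    # size of each sample
num_samples = 5000        # how many samples to draw

# Assumed population: exponential with rate 1, so mu = 1 and sigma = 1.
pop_mean, pop_sd = 1.0, 1.0

sample_means = [
    fmean([rng.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

observed_mean = fmean(sample_means)   # should be close to mu
observed_se = pstdev(sample_means)    # spread of the sample means
theoretical_se = pop_sd / n ** 0.5    # sigma / sqrt(n)
```

The observed spread of the sample means should land near the theoretical standard error of about 0.18, even though the population itself is far from normal.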
https://www.youtube.com/watch?v=b5xQmk9veZ4
Summary
1. Descriptive-graphs, mean, median, mode, sd, variance, inter quartile range (M1)
• Excel
• BINOM.DIST(x, n, p, cumulative)
Probability calculation under normal distribution
• Manual method:
• Convert data to z score
• Calculate probability using
• p value calculator (http://courses.atlas.illinois.edu/spring2016/STAT/STAT200/pnormal.html)
• standard z score table (https://www.math.arizona.edu/~rsims/ma464/standardnormaltable.pdf)
• Excel formula:
• NORM.DIST(x,mean,standard_dev,cumulative)
• X-The value for which you want the distribution.
• Mean-The arithmetic mean of the distribution.
• Standard_dev- The standard deviation of the distribution.
• Cumulative - True
• NORM.S.DIST(z,cumulative)
• Z- The value for which you want the distribution.
• Cumulative - True
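For readers working outside Excel, the two formulas can be mirrored in Python. The functions `norm_dist` and `norm_s_dist` below are hypothetical helpers named after the Excel functions, built on `statistics.NormalDist`. They are a sketch, not an official equivalent.

```python
from statistics import NormalDist


def norm_dist(x, mean, standard_dev, cumulative=True):
    """Rough analogue of Excel's NORM.DIST.

    cumulative=True returns P(X <= x); cumulative=False returns the
    height of the density curve at x.
    """
    dist = NormalDist(mean, standard_dev)
    return dist.cdf(x) if cumulative else dist.pdf(x)


def norm_s_dist(z, cumulative=True):
    """Rough analogue of Excel's NORM.S.DIST (mean 0, standard deviation 1)."""
    return norm_dist(z, 0.0, 1.0, cumulative)


p_below_100 = norm_dist(100, 90, 10)   # P(X <= 100) for mean 90, sd 10
p_below_z1 = norm_s_dist(1.0)          # P(Z <= 1)
```

Both calls return about 0.8413, because x = 100 corresponds to z = 1 when the mean is 90 and the standard deviation is 10.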
Practice
• A radar unit is used to measure speeds of cars on a motorway. The speeds are normally
distributed with a mean of 90 km/hr and a standard deviation of 10 km/hr. What is the
probability that a car picked at random is travelling at more than 100 km/hr?
• The probability that a car selected at a random has a speed greater than 100 km/hr is equal
to 0.1587
• For a certain type of computers, the length of time between charges of the battery is
normally distributed with a mean of 50 hours and a standard deviation of 15 hours. John
owns one of these computers and wants to know the probability that the length of time will
be between 50 and 70 hours.
• The probability that the time between charges for John's computer is between 50 and 70 hours
is approximately 0.4082 (using z ≈ 1.33 from the z table)
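Both answers can be checked with a short sketch using `statistics.NormalDist`. The variable names are illustrative only.

```python
from statistics import NormalDist

# Car speeds: mean 90 km/hr, standard deviation 10 km/hr.
speed = NormalDist(mu=90, sigma=10)
p_over_100 = 1 - speed.cdf(100)   # P(X > 100) = 1 - P(X <= 100)

# Time between battery charges: mean 50 hours, standard deviation 15 hours.
battery = NormalDist(mu=50, sigma=15)
p_50_to_70 = battery.cdf(70) - battery.cdf(50)   # P(50 < X < 70)
```

The first result is about 0.1587. The second comes out at about 0.4088 rather than 0.4082, because the z table rounds z = 20/15 ≈ 1.333 to 1.33 before the lookup.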
Practice
• The time taken to assemble a car in a certain plant is a random variable having a normal distribution with a mean of
20 hours and a standard deviation of 2 hours. What is the probability that a car can be assembled at
this plant in a period of time
a) less than 19.5 hours?
b) between 20 and 22 hours?
• The lengths of similar components produced by a company are approximated by a normal distribution
model with a mean of 5 cm and a standard deviation of 0.02 cm. If a component is chosen at random
a) what is the probability that the length of this component is between 4.98 and 5.02 cm?
b) what is the probability that the length of this component is between 4.96 and 5.04 cm?
• a) P(4.98 < x < 5.02) = P(-1 < z < 1)
= 0.6826
b) P(4.96 < x < 5.04) = P(-2 < z < 2)
= 0.9544
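These two intervals are one and two standard deviations either side of the mean. A hedged sketch of the check follows, using a hypothetical helper `prob_between`.

```python
from statistics import NormalDist


def prob_between(low, high, mean, sd):
    """P(low < X < high) for a normal variable with the given mean and sd."""
    dist = NormalDist(mean, sd)
    return dist.cdf(high) - dist.cdf(low)


# Component lengths: mean 5 cm, standard deviation 0.02 cm.
p_within_1sd = prob_between(4.98, 5.02, 5.0, 0.02)   # P(-1 < z < 1)
p_within_2sd = prob_between(4.96, 5.04, 5.0, 0.02)   # P(-2 < z < 2)
```

The computed values are about 0.6827 and 0.9545. The answers 0.6826 and 0.9544 above come from doubling four-decimal z table entries, which explains the small difference in the last digit.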
Practice
• The length of life of an instrument produced by a machine has a normal distribution with a mean of 12
months and standard deviation of 2 months. Find the probability that an instrument produced by this
machine will last
a) less than 7 months.
b) between 7 and 12 months.
• https://iterationinsights.com/article/where-to-start-with-the-4-types-of-analytics/
• https://blog.masterofproject.com/project-integration-management-overview/
• https://studiousguy.com/real-life-examples-normal-distribution/
Doubts?
All the Best!
https://www.youtube.com/watch?v=Z9Gw9dIJGiA&t=86s&ab_channel=upGrad_Gmba