Descriptive Probability

This document discusses key concepts in probability and statistics including variables and units of measurement, measures of central tendency and dispersion, probability theory, and common probability distributions like binomial, Poisson, and normal distributions. Formulas and R code are provided for calculating statistics and probability values.

Uploaded by

Parul Yadav

• Variables, Units of Measurement and Frequency

• Measures of Central Tendency

• Measures of Dispersion

• Probability Theory

• Binomial Distribution

• Poisson Distribution

• Normal Distribution
Units of Measurement for Variables

Variables

• Categorical / Qualitative: Nominal, Ordinal

• Continuous / Quantitative: Interval, Ratio


Frequency

• Frequency: Number of times a value of the variable occurs in the dataset

• Frequency Distribution: Statistical table that lists each value (or class) of the variable against its frequency

Simple Frequency Distribution

Variable (x)    Frequency (f)
2               8
4               10
7               15

Grouped Frequency Distribution

Variable (x)    Frequency (f)
2-5             8
5-7             10
7-9             15

• Terms used with a grouped frequency distribution:

 Class
 Class Frequency
 Class Width
 Class Limit
 Class Boundary
 Relative Frequency
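
As a minimal sketch, the simple frequency distribution above can be rebuilt in R and turned into relative frequencies. The raw vector `obs` is a hypothetical reconstruction that repeats each value of x by its frequency; it is not data from the slides.

```r
# Hypothetical raw data: each value of x repeated f times (x = 2, 4, 7; f = 8, 10, 15)
obs <- rep(c(2, 4, 7), times = c(8, 10, 15))

freq <- table(obs)            # simple frequency distribution
rel_freq <- prop.table(freq)  # relative frequency = f / sum(f)

print(freq)
print(round(rel_freq, 3))
```

The relative frequencies always sum to 1, which makes distributions of different sizes comparable.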
Measures of Central Tendency
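• Arithmetic Mean: Sum of a collection of numbers divided by the count of numbers in the collection

 Simple A.M: $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$
 Weighted A.M: $\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$, where $f_i$ is the frequency (weight) of $x_i$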

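• Mode: The value of the variable that has the highest frequency

 Simple frequency distribution: the value of x with the largest frequency f

 Grouped frequency distribution: $\text{Mode} = l_1 + c \cdot \frac{f_0 - f_{-1}}{2 f_0 - f_{-1} - f_1}$, where $l_1$ is the lower boundary of the modal class, $c$ its class width, $f_0$ its frequency, and $f_{-1}$, $f_1$ the frequencies of the classes just before and after it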

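• Median: The central value of the ordered data; it divides the dataset into two equal halves

 Simple series: the $\frac{n+1}{2}$-th value of the ordered observations (the mean of the two middle values when $n$ is even)

 Simple frequency distribution: commonly taken as the value of x at which the cumulative frequency first reaches or exceeds $\frac{N+1}{2}$, where $N = \sum f$

 Grouped frequency distribution: $\text{Median} = l_1 + \frac{c}{f_m} \left( \frac{N}{2} - \sum f_1 \right)$, where $l_1$ is the lower boundary of the median class, $c$ its class width, $f_m$ its frequency, $N$ the total frequency, and $\sum f_1$ the total frequency of the classes before it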
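
The measures above can be sketched in R as follows. The weighted mean uses the simple frequency table from the Frequency slide. The grouped data (`lower`, `freq`, `width`) are hypothetical, chosen with equal class widths, and the grouped mode and median use the standard textbook interpolation formulas, which is an assumption about what the slides intended.

```r
# Weighted arithmetic mean for the simple frequency table (x = 2, 4, 7; f = 8, 10, 15)
x <- c(2, 4, 7)
f <- c(8, 10, 15)
weighted_am <- sum(f * x) / sum(f)   # same result as weighted.mean(x, f)

# Hypothetical grouped data with equal class width
lower <- c(0, 10, 20, 30)   # lower class boundaries
freq  <- c(5, 12, 8, 5)     # class frequencies
width <- 10                 # class width c
N     <- sum(freq)

# Grouped mode: l1 + c * (f0 - f_prev) / (2 * f0 - f_prev - f_next)
m      <- which.max(freq)                           # index of the modal class
f_prev <- if (m > 1) freq[m - 1] else 0
f_next <- if (m < length(freq)) freq[m + 1] else 0
grouped_mode <- lower[m] + width * (freq[m] - f_prev) / (2 * freq[m] - f_prev - f_next)

# Grouped median: l1 + (c / f) * (N / 2 - cumulative frequency before the median class)
cum    <- cumsum(freq)
k      <- which(cum >= N / 2)[1]                    # index of the median class
before <- if (k > 1) cum[k - 1] else 0
grouped_median <- lower[k] + (width / freq[k]) * (N / 2 - before)

print(c(mean = weighted_am, mode = grouped_mode, median = grouped_median))
```

Both grouped formulas interpolate inside a single class, so they give estimates rather than exact values of the raw data.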


Measures of Dispersion
• Range: Difference between maximum value of the variable and minimum value of the variable from the dataset

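• Standard Deviation: Root-mean-square deviation from the mean

 Simple A.M: $\sigma = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2}$

 Weighted A.M: $\sigma = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{\sum f_i}}$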

• Quartile Deviation: The quartiles Q1, Q2 and Q3 divide the ordered dataset into four equal parts. The quartile deviation, $\frac{Q_3 - Q_1}{2}$, estimates the spread of the distribution w.r.t. the central measure.

• Inter-Quartile Range: $Q_3 - Q_1$, the range between the quartiles Q1 and Q3; it is used to detect outliers (Box Plot)

• Coefficient of Variation: $CV = \frac{\sigma}{\bar{x}} \times 100\%$
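
A short sketch of these dispersion measures in R, using the same small vector as the R slides. One point worth seeing in code: the root-mean-square definition divides by n, whereas R's built-in `sd()` divides by n - 1, so the two differ slightly. The quartiles come from R's default `quantile()` method; other textbook conventions can give different Q1 and Q3.

```r
x <- c(2, 4, 7, 8, 10)

# Root-mean-square deviation from the mean (divides by n)
sd_pop <- sqrt(mean((x - mean(x))^2))

# R's built-in sd() divides by n - 1 instead, so it is slightly larger
sd_sample <- sd(x)

q   <- quantile(x, probs = c(0.25, 0.75), names = FALSE)  # Q1 and Q3
iqr <- q[2] - q[1]              # inter-quartile range, same as IQR(x)
qd  <- iqr / 2                  # quartile deviation
cv  <- 100 * sd_pop / mean(x)   # coefficient of variation, in percent

print(c(sd_pop = sd_pop, sd_sample = sd_sample, IQR = iqr, QD = qd, CV = cv))
```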
R codes
• Creating a Vector:
> x <- c(2, 4, 7, 8, 10) # Quantitative values #
> y <- c("Yes", "No", "Yes", "No", "Yes") # Qualitative values; same length as x so both fit in one data frame #

• Creating a dataframe:
> df <- data.frame(x, y)

• Creating Frequency Table:


> t <- table(df$variable) # with one variable; replace variable with a column name of df #
> t1 <- table(df$var1, df$var2) # with more than one variable #

• Creating Groups or Cut points:


> cutvariable <- cut(df$variable, breaks = c(10, 20, 30, 40), labels = c("A", "B", "C")) # default right = TRUE: intervals are closed on the right, so 20 will fall in 10-20 #
> cutvariable <- cut(df$variable, breaks = c(10, 20, 30, 40), labels = c("A", "B", "C"), right = FALSE) # intervals are closed on the left, so 20 will fall in 20-30 #

• Creating Charts:
> barplot(t, main = "title", xlab = "x", ylab = "y", legend.text = rownames(t), col = rainbow(length(t))) # one colour per bar #
> pie(t)
> hist(df$variable) # histogram needs the raw numeric variable, not the frequency table #
> boxplot(df$variable)
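
To make the effect of `right = FALSE` in the cut-point step concrete, here is a small sketch with a hypothetical vector `scores` (not from the slides). It shows the value 20 landing in group A with the default and in group B with `right = FALSE`.

```r
scores <- c(12, 20, 25, 30, 38)   # hypothetical values

# Default (right = TRUE): intervals (10,20], (20,30], (30,40]
right_closed <- cut(scores, breaks = c(10, 20, 30, 40), labels = c("A", "B", "C"))

# right = FALSE: intervals [10,20), [20,30), [30,40)
left_closed <- cut(scores, breaks = c(10, 20, 30, 40),
                   labels = c("A", "B", "C"), right = FALSE)

print(data.frame(scores, right_closed, left_closed))
print(table(right_closed))   # grouped frequency distribution of the cut variable
```

With the default setting a value equal to the lowest break (10 here) is not assigned to any group; `include.lowest = TRUE` changes that.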
R codes

• Measures of Central Tendency:


> mean(df$variable)
> median(df$variable)
> t <- table(df$variable)
> t[t == max(t)] # Gives the modal value(s) with their frequency; the mode has to be read from the frequency table #

• Measures of Dispersion:
> sd(df$variable) # Sample standard deviation (divides by n - 1) #
> range(df$variable) # Gives the minimum and maximum; diff(range(df$variable)) gives the range as a single number #
> quantile(df$variable) # Gives five values: minimum, Q1, median (Q2), Q3 and maximum #
Probability

• Important concepts:

 Trial: An experiment which can be conducted repeatedly
 Event: An outcome, or a set of outcomes, of an experiment
 Mutually Exclusive: Events that cannot occur simultaneously
 Exhaustive: At least one of the events must occur in every trial
 Equally Likely: Every event has the same chance of occurrence
 Union (∪): Events A union B means A or B = A + B = A ∪ B
 Intersection (∩): Events A intersection B means A and B = A * B = A ∩ B
 Complement: A′ is the event that A does not occur

• Classical definition:
If there are N mutually exclusive, exhaustive and equally likely events, and N(A) of them are favorable to event A, then:

$P(A) = \frac{N(A)}{N}$
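
A minimal illustration of the classical definition in R, assuming one roll of a fair six-sided die (an example chosen here, not taken from the slides): count the outcomes favorable to the event and divide by the total number of equally likely outcomes.

```r
# One roll of a fair six-sided die: N = 6 equally likely outcomes
sample_space <- 1:6
favourable   <- sample_space[sample_space %% 2 == 0]   # event A: an even number

p_A <- length(favourable) / length(sample_space)       # N(A) / N
print(p_A)
```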
Probability
• Properties:

 Values of probability lie between 0 and 1

 The probabilities of all the elementary events in the sample space sum to 1

 $P(A') = 1 - P(A)$, where A′ is the complement of A

 Addition Rule: A or B = A + B = A ∪ B

a. Mutually Exclusive events: P(A ∪ B) = P(A) + P(B)

b. Not Mutually Exclusive events: P(A ∪ B) = P(A) + P(B) – P(A ∩ B)

 Multiplication Rule: A and B = A * B = A ∩ B

a. Independent Events: P(A ∩ B) = P(A) * P(B)

b. Conditional Events: P(A ∩ B) = P(B) * P(A | B)

 Bayes' Theorem (Thomas Bayes): If event A can occur only together with one of N mutually exclusive and exhaustive events E1, ..., EN, and A has actually occurred, then the probability that it occurred with Ei is
$P(E_i \mid A) = \frac{P(E_i) \, P(A \mid E_i)}{\sum_{j=1}^{N} P(E_j) \, P(A \mid E_j)}$
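
The rules above can be checked numerically. The sketch below assumes a fair die for the addition and multiplication rules, and uses made-up machine shares and defect rates for Bayes' theorem; all names and numbers are hypothetical illustrations.

```r
# Fair die: A = even number, B = number greater than 3
S <- 1:6
A <- c(2, 4, 6)
B <- c(4, 5, 6)
p <- function(event) length(event) / length(S)   # classical probability

# Addition rule for events that are not mutually exclusive
p_union <- p(A) + p(B) - p(intersect(A, B))

# Multiplication rule with a conditional probability: P(A and B) = P(B) * P(A | B)
p_A_given_B <- p(intersect(A, B)) / p(B)
p_inter     <- p(B) * p_A_given_B

# Bayes' theorem: machines E1..E3 make 50%, 30%, 20% of the output with
# defect rates 1%, 2%, 3%; A = "a randomly chosen item is defective"
prior      <- c(E1 = 0.5, E2 = 0.3, E3 = 0.2)       # P(Ei)
likelihood <- c(E1 = 0.01, E2 = 0.02, E3 = 0.03)    # P(A | Ei)
posterior  <- prior * likelihood / sum(prior * likelihood)   # P(Ei | A)

print(c(union = p_union, intersection = p_inter))
print(round(posterior, 4))
```

The denominator `sum(prior * likelihood)` is the total probability of A, so the posterior probabilities sum to 1.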
Binomial Distribution

• Properties: It is a Discrete Probability distribution
(Used when there are repeated trials)

 Every trial results in either a success or a failure
 Every trial is independent of the others
 Probability of success θ is the same for every trial

• pmf: $f(x) = \binom{n}{x} \theta^x (1 - \theta)^{n - x}$ for x = 0, 1, ..., n, where n is the number of trials
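
As a quick check, the binomial pmf $f(x) = \binom{n}{x} \theta^x (1 - \theta)^{n - x}$ can be written out by hand and compared with R's built-in `dbinom()`. The values of `n` and `theta` below are hypothetical.

```r
n     <- 10     # hypothetical number of trials
theta <- 0.3    # hypothetical probability of success
x     <- 0:n

pmf_formula <- choose(n, x) * theta^x * (1 - theta)^(n - x)
pmf_builtin <- dbinom(x, size = n, prob = theta)

print(all.equal(pmf_formula, pmf_builtin))   # TRUE when the two agree up to rounding
print(sum(pmf_builtin))                      # probabilities over x = 0..n add up to 1
```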

Poisson Distribution

• Properties: It is a Discrete Probability distribution
(Used when the number of trials becomes very large and tends towards infinity)

 Limiting form of the Binomial distribution (n → ∞ and θ → 0 with nθ = λ held fixed)
 The average occurrence of the event, λ, is known
 No. of trials is generally very large and so is unknown

• pmf: $f(x) = \frac{e^{-\lambda} \lambda^x}{x!}$ for x = 0, 1, 2, ...
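
The "limiting form" property can be seen numerically. This sketch writes the Poisson pmf $f(x) = e^{-\lambda} \lambda^x / x!$ by hand, compares it with `dpois()`, and then with a binomial that has many trials and a small success probability chosen so that the expected count stays at `lambda`. The values of `lambda` and `n` are hypothetical.

```r
lambda <- 2      # hypothetical average number of occurrences
x      <- 0:5

pmf_formula <- exp(-lambda) * lambda^x / factorial(x)
pmf_builtin <- dpois(x, lambda = lambda)

# Binomial with many trials and a small success probability, keeping n * theta = lambda
n <- 100000
pmf_binom <- dbinom(x, size = n, prob = lambda / n)

print(round(rbind(pmf_formula, pmf_builtin, pmf_binom), 5))
```

Increasing `n` further brings the binomial row even closer to the Poisson rows.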
Normal Distribution

• Properties of Normal Distribution: Continuous Probability Distribution

 Symmetrical curve with Skewness = 0
 Infinite limits: the curve extends from −∞ to +∞
 Mean = Median = Mode

• Standard Normal (Z) Distribution: Continuous Probability Distribution

 Symmetrical curve with Skewness = 0
 Limits are still −∞ to +∞, but about 99.73% of the area lies between z = −3 and z = +3
 Mean = Median = Mode at z = 0
 Standard Normal Density Function: $f(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$
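
A brief sketch of the standard normal curve in R. It writes the density $f(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}$ by hand, compares it with `dnorm()`, and computes the area between z = -3 and z = +3. The mean and standard deviation used for standardising (`mu`, `sigma`) are hypothetical values.

```r
z <- c(-3, -1, 0, 1, 3)

# Standard normal density written out, compared with the built-in dnorm()
density_formula <- exp(-z^2 / 2) / sqrt(2 * pi)
print(all.equal(density_formula, dnorm(z)))

# Area under the curve between z = -3 and z = +3
area_3sd <- pnorm(3) - pnorm(-3)
print(area_3sd)

# Standardising: z = (x - mean) / sd gives the same probability on the Z scale
mu <- 7500; sigma <- 1500; x <- 10000
z_x <- (x - mu) / sigma
print(all.equal(pnorm(x, mean = mu, sd = sigma), pnorm(z_x)))
```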
R codes

• Binomial Distribution
> dbinom(12:24, size = n, prob = theta) # P(X = x) for each x from 12 to 24; n and theta are placeholders #
> sum(dbinom(12:24, size = n, prob = theta)) # P(12 <= X <= 24) #

• Poisson Distribution
> dpois(x = 112:115, lambda = value)
> sum(dpois(x = 112:115, lambda = value)) # P(112 <= X <= 115) #

• Normal Distribution:
> X <- pnorm(5000, mean = value, sd = value)
> Y <- pnorm(10000, mean = value, sd = value)
> Y - X # Probability between 5000 and 10000 #

* The values of x used above have been assumed for illustration

* By default pnorm() returns the lower-tail probability P(X ≤ x); add lower.tail = FALSE to get the upper-tail probability P(X > x)
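
The tail options can be sketched as follows, with a hypothetical mean and standard deviation standing in for the `value` placeholders. The lower tail, the middle interval and the upper tail together cover the whole distribution.

```r
mu <- 7500; sigma <- 1500   # hypothetical mean and standard deviation

lower   <- pnorm(5000,  mean = mu, sd = sigma)                      # P(X <= 5000)
upper   <- pnorm(10000, mean = mu, sd = sigma, lower.tail = FALSE)  # P(X > 10000)
between <- pnorm(10000, mean = mu, sd = sigma) - lower              # P(5000 < X <= 10000)

print(c(lower = lower, between = between, upper = upper))
print(lower + between + upper)   # the three pieces add up to 1
```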
